Hardware IP Cores
of Advanced Encryption Standard
AES-Rijndael

 

The Advanced Encryption Standard (AES-Rijndael) cores for ASICs and FPGAs are available for licensing from George Mason University. The cores are based on fully synthesizable RTL VHDL code, and the GMU developers have optimized them separately for ASICs and for FPGAs, using techniques and figures of merit specific to each technology. The code is fully compatible with the most recent version of the draft Federal Information Processing Standard (FIPS) published by NIST.

 

Application | Features | Versions | Choice and Customization | Performance | Comparison | Testing and tools | GMU is ready to deliver to you | Cores for other ciphers | Further information

 

Application

The GMU-AES design can be used in any application that requires protection of data during transmission over a communication network, including electronic commerce transactions, automated teller machines, wireless communication, Virtual Private Networks (VPN), and many others. Our AES cores can be used as a part of a hardware or hybrid implementation of all major security protocols, including IPSec, SSL, IEEE 802.11a, and the ATM Forum Security Specification.

 

Features

  • Support for three different key sizes specified in the NIST FIPS standard: 128, 192, and 256 bits.

Changing from one key size to another does not incur any speed penalty.

  • Encryption and decryption in the same device.

Switching from encryption to decryption does not incur any speed penalty.

  • CBC and ECB modes of operation implemented by default.

Easily extendable to other modes, such as CFB, OFB, OCB, and counter mode.

  • Key scheduling performed in parallel with encryption and decryption.

Round keys corresponding to the new session keys can be computed in the background, simultaneously with processing of data from previous sessions. No speed penalty for key agility.

  • Very high encryption and decryption throughputs.

In Xilinx Virtex E devices, encryption and decryption throughputs in excess of 16 Gbit/s can be achieved, the highest AES throughputs reported in the literature to date.

  • Efficient use of the circuit area.

The circuit area is minimized using techniques such as resource sharing and inner-round pipelining. The cores achieve the best throughput-to-area ratio reported in the literature to date.

  • Simple and universal external interface (shown in the figure below).

The same interface was used in our implementations of Triple DES and all final AES candidates: Mars, RC6, Twofish, and Serpent.
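The difference between the modes named in the feature list can be illustrated in software. The following is a minimal sketch, not the GMU implementation: `toy_block_encrypt` is a hypothetical stand-in built from SHA-256 (it is not AES and is not invertible), and all function names are assumptions made for illustration. It shows that ECB and counter mode treat blocks independently, while CBC feeds each ciphertext block into the next encryption.

```python
import hashlib

BLOCK = 16  # AES block size in bytes (128 bits)

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Hypothetical stand-in for one block encryption; NOT AES, not invertible."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split(data: bytes):
    """Cut data into 16-byte blocks (length assumed to be a multiple of 16)."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    # Blocks are independent, so any number of them may be processed at once.
    return b"".join(toy_block_encrypt(key, b) for b in split(data))

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Feedback mode: each block needs the previous ciphertext block first.
    out, prev = [], iv
    for b in split(data):
        prev = toy_block_encrypt(key, xor(b, prev))
        out.append(prev)
    return b"".join(out)

def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # Counter blocks are known in advance, so blocks are independent again.
    # The same function encrypts and decrypts.
    out = []
    for i, b in enumerate(split(data)):
        counter = nonce + i.to_bytes(8, "big")  # 8-byte nonce + 8-byte counter
        out.append(xor(b, toy_block_encrypt(key, counter)))
    return b"".join(out)

if __name__ == "__main__":
    key, iv, nonce = b"k" * 16, bytes(16), b"n" * 8
    data = b"A" * 32  # two identical plaintext blocks
    print(ecb_encrypt(key, data).hex())
    print(cbc_encrypt(key, iv, data).hex())
    print(ctr_crypt(key, nonce, data).hex())
```

With two identical plaintext blocks, the ECB output repeats while the CBC output does not, which is the chaining dependency that matters for pipelined hardware.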

 

Versions

Three basic versions of the hardware AES-Rijndael core have been developed.

  1. Basic iterative architecture - ideal for applications with modest speed requirements (< 500 Mbit/s for FPGA implementations) and limitations on circuit area and power consumption. It operates equally well in all cipher modes.

    This architecture executes one cipher round per master clock period. Only one block of data is processed at a time. The number of clock cycles necessary to encrypt or decrypt a block of data is equal to the number of cipher rounds.

  2. Inner-round pipelined architecture - ideal for applications with relatively high speed requirements (~ 1 Gbit/s for FPGA implementations) and limitations on circuit area and cost.

    An extension of the basic iterative architecture, in which additional pipeline registers are introduced inside the cipher round. As a result, two blocks of data can be processed simultaneously at an increased clock frequency. The encryption and decryption throughputs almost double compared to the basic architecture, at the cost of only a small increase in the circuit area. To exploit all the features of this architecture in feedback cipher modes, such as CBC, CFB, or OFB, two independent streams of data (two different messages or packets) must be available for encryption at any point in time.

  3. Fully pipelined architecture - ideal for applications that require very high encryption and decryption throughputs (> 10 Gbit/s) and have relaxed requirements on circuit area, cost, or power consumption.

    In this architecture, all rounds are unrolled, and registers are introduced between any two consecutive rounds and inside the cipher rounds. The characteristic feature of this architecture is that in every clock cycle, the encryption/decryption of one block of data is completed.

    To exploit all the features of this architecture in feedback cipher modes, such as CBC, CFB, or OFB, multiple independent streams of data must be available for encryption at any point in time.
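The requirement for independent data streams in feedback modes can be illustrated with a small cycle-count model. This is a simplified sketch under stated assumptions, not a model of the actual GMU cores: the pipeline is treated as a linear chain of `depth` stages that accepts at most one block per clock cycle, the depth of 20 stages in the usage note is a hypothetical value, and the function name is invented for illustration.

```python
def pipeline_cycles(streams: int, blocks_per_stream: int,
                    depth: int, feedback: bool) -> int:
    """Clock cycles until the last block leaves a linear pipeline.

    In a feedback mode (e.g. CBC encryption), the next block of a stream
    cannot enter the pipeline until the previous block of the same stream
    has left it, `depth` cycles after it entered.
    """
    remaining = [blocks_per_stream] * streams
    ready_at = [0] * streams   # earliest cycle each stream may issue a block
    cycle = 0
    finished_at = 0
    while any(remaining):
        ready = [s for s in range(streams)
                 if remaining[s] and ready_at[s] <= cycle]
        if ready:
            # Issue from the ready stream with the most work left.
            s = max(ready, key=lambda i: remaining[i])
            remaining[s] -= 1
            finished_at = cycle + depth
            if feedback:
                ready_at[s] = finished_at
        cycle += 1
    return finished_at

if __name__ == "__main__":
    DEPTH = 20  # hypothetical pipeline depth, for illustration only
    print("non-feedback, 1 stream  :", pipeline_cycles(1, 100, DEPTH, False))
    print("feedback, 1 stream      :", pipeline_cycles(1, 100, DEPTH, True))
    print("feedback, 20 streams    :", pipeline_cycles(20, 5, DEPTH, True))
```

In this model, a single stream in a feedback mode leaves the pipeline mostly empty, while interleaving as many independent streams as there are pipeline stages restores the one-block-per-cycle rate of the non-feedback case.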

 

Choice and Customization

Based on the features of the three preselected versions of the AES-Rijndael architecture, the user can choose the architecture that best matches the requirements of the intended application. These versions can be further customized to meet specific requirements of a given application.

In particular, the following features can be added or modified:

  • modes of operation of a block cipher (e.g., counter mode, OCB mode),

  • number of internal pipeline stages,

  • separation between encryption and decryption (e.g., an encryption only or decryption only unit).

If none of the existing architectures offers a good match with the  throughput/area/power/functionality requirements of the end application, a fully specialized architecture can be developed based on the user's specification.

 

Performance

Performance characteristics of our three architectures of AES-Rijndael, implemented using Xilinx Virtex XCV-1000-6 are given below:

 

| | Basic iterative architecture | Inner-round pipelined architecture | Fully pipelined architecture* |
|---|---|---|---|
| Maximum master clock frequency | 47 MHz | 80 MHz | 95 MHz |
| Encryption/decryption throughput (128-bit key) | 521 Mbit/s | 888 Mbit/s | 11.3 Gbit/s |
| Area [CLB slices + Block RAMs] | 1,228 CLB slices, 18 BlockRAMs | 2,398 CLB slices, 18 BlockRAMs | 12,600 CLB slices, 80 BlockRAMs |
| Area [percentage of the target device resources] | 10% of CLBs, 56% of BlockRAMs | 19% of CLBs, 56% of BlockRAMs | 103% of CLBs, 250% of BlockRAMs** |

* No key scheduling, 128-bit key, 10-round version of the design.
** Three XCV-1000 devices are necessary to implement this circuit.

 

Performance characteristics of our two architectures of AES-Rijndael, implemented using Virtex E family of Xilinx FPGA devices are given below:

 

 

| | Basic iterative architecture | Fully pipelined architecture* |
|---|---|---|
| Target FPGA device | Virtex 300E-8 | Virtex 1000E-8 |
| Maximum master clock frequency | 67 MHz | 134.5 MHz |
| Encryption/decryption throughput (128-bit key) | 743 Mbit/s | 16.0 Gbit/s |
| Area [CLB slices + Block RAMs] | 986 CLB slices, 18 BlockRAMs | 9,199 CLB slices, 80 BlockRAMs |
| Area [percentage of the target device resources] | 32% of CLB slices, 56% of BlockRAMs | 74% of CLB slices, 83% of BlockRAMs |

* No key scheduling, 128-bit key, 10-round version of the design.
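The reported throughputs can be approximately reproduced from the reported clock frequencies. The sketch below is a plausibility check only, and it rests on two assumptions that are inferred from the numbers rather than stated in the text: that the iterative and inner-round pipelined designs take on average 11 clock cycles per 128-bit block (one more than the 10 rounds of the 128-bit-key cipher), and that Mbit and Gbit denote 2^20 and 2^30 bits. The function names are invented for illustration.

```python
BLOCK_BITS = 128        # AES block size in bits
CYCLES_PER_BLOCK = 11   # ASSUMPTION inferred from the figures, not from the text
MBIT = 2 ** 20          # ASSUMPTION: binary prefixes
GBIT = 2 ** 30

def iterative_throughput_mbit(clock_mhz: float,
                              cycles_per_block: int = CYCLES_PER_BLOCK) -> float:
    """Iterative or inner-round pipelined core: on average one block
    every `cycles_per_block` clock cycles."""
    return clock_mhz * 1e6 * BLOCK_BITS / cycles_per_block / MBIT

def pipelined_throughput_gbit(clock_mhz: float) -> float:
    """Fully pipelined core: one block completed in every clock cycle."""
    return clock_mhz * 1e6 * BLOCK_BITS / GBIT

if __name__ == "__main__":
    # (label, clock in MHz, reported throughput, unit), taken from the tables
    reported = [
        ("Virtex 1000-6, basic iterative", 47.0, 521.0, "Mbit/s"),
        ("Virtex 1000-6, inner-round pipelined", 80.0, 888.0, "Mbit/s"),
        ("Virtex 300E-8, basic iterative", 67.0, 743.0, "Mbit/s"),
        ("Virtex 1000-6, fully pipelined", 95.0, 11.3, "Gbit/s"),
        ("Virtex 1000E-8, fully pipelined", 134.5, 16.0, "Gbit/s"),
    ]
    for label, mhz, value, unit in reported:
        if unit == "Mbit/s":
            estimate = iterative_throughput_mbit(mhz)
        else:
            estimate = pipelined_throughput_gbit(mhz)
        print(f"{label}: estimated {estimate:.1f} {unit}, reported {value} {unit}")
```

Small differences between the estimates and the reported values are expected, since the published clock frequencies are rounded.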

Comparison with results of other groups

Comparing hardware designs (quality of VHDL codes) makes sense only if all compared designs are implemented using the same family of FPGA devices or the same ASIC technology.

 The GMU design implementations using Xilinx families of FPGA devices outperform designs of any other group reported in the literature to date.

In particular,

  • for the basic iterative architecture implemented using Xilinx Virtex 1000-6, compared to the next best result reported in the literature by a group from the University of Southern California, the GMU design encryption/decryption throughput is better by over 60%.

  • for the fully pipelined architecture implemented using Xilinx Virtex E family of devices, compared to the next best result reported in the literature by a group from the Queen's University of Belfast, the GMU design throughput is better by a factor of 2.3 for encryption and 5.0 for decryption.

 

Testing and tools

The GMU AES cores have been thoroughly verified using a combination of simulation and experimental testing. First, functional verification was performed using Aldec Active-HDL and the Monte Carlo test. Second, the designs were processed using Xilinx tools for logic synthesis, mapping, placing and routing. These tools generated reports describing the area of implementations, a netlist used for timing simulations, and a bitstream used to configure actual FPGA devices. The maximum clock frequency was obtained using static timing analysis and confirmed using timing simulation. Finally, selected designs were tested experimentally using the SLAAC-1V FPGA accelerator board developed by the University of Southern California Information Sciences Institute (shown in the photograph below).

 

 

 

GMU is ready to deliver to you

    • Fully synthesizable RTL VHDL code or FPGA target specific netlist
       

    • VHDL testbench
       

    • Description of tests, and results of static timing analysis and experimental testing for selected target FPGA devices
       

    • User documentation with a full description of the external interface.

 

Cores for other ciphers

Hardware cores for the following other symmetric-key block ciphers have been developed at George Mason University and can be prepared for licensing to interested parties:

    • DES

    • Triple DES

    • Mars

    • RC6

    • Twofish

    • Serpent.

All cores offer a similar interface and have been optimized for use with both FPGA and ASIC devices.

 

Related Publications

  1. P. Chodowiec, K. Gaj, P. Bellows, and B. Schott, "Experimental Testing of the Gigabit IPSec-Compliant Implementations of Rijndael and Triple DES Using SLAAC-1V FPGA Accelerator Board," Proc. Information Security Conference, Malaga, Spain, October 1-3, 2001 (in print) © Springer-Verlag

  2. K. Gaj and P. Chodowiec, "Fast implementation and fair comparison of the final candidates for Advanced Encryption Standard using Field Programmable Gate Arrays," Proc. RSA Security Conf. - Cryptographer's Track, San Francisco, CA, April 8-12, 2001, © Springer-Verlag

  3. P. Chodowiec, P. Khuon, and K. Gaj, "Fast Implementations of Secret-Key Block Ciphers Using Mixed Inner- and Outer-Round Pipelining," ACM/SIGDA Ninth International Symposium on Field Programmable Gate Arrays, Monterey, CA, February 11-13, 2001

  4. K. Gaj and P. Chodowiec, "Hardware performance of the AES finalists - survey and analysis of results," Technical Report, George Mason University, September 2000

  5. K. Gaj and P. Chodowiec, "Comparison of the hardware performance of the AES candidates using reconfigurable hardware," Third Advanced Encryption Standard (AES) Candidate Conference, New York, April 13-14, 2000

  6. P. Chodowiec and K. Gaj, "Implementations of the Twofish Cipher Using FPGA Devices," Technical Report, George Mason University, July 1999

All these papers and additional viewgraph presentations are available at http://ece.gmu.edu/crypto/publications.htm

     

    For further information
    regarding licensing, please contact

    Jennifer Murphy
    Director of Intellectual Property and Technology Transfer @ George Mason University
    e-mail: jmurphy@gmu.edu
    phone: +1 703 993 2985
    George Mason University
    4400 University Drive
    Fairfax, VA 22030
    U.S.A.

    regarding technical specification and customization of the code, please contact

    Dr. Kris Gaj
    Cryptography and Network Security Implementations Lab
    e-mail: kgaj@gmu.edu
    phone: +1 703 993 1575
    fax: +1 703 993 1601
    Electrical and Computer Engineering
    George Mason University
    4400 University Drive
    Fairfax, VA 22030
    U.S.A.