Difference between revisions of "SHA-3 Hardware Implementations"

From The ECRYPT Hash Function Website
m (References: Added reference for Mikami et al.)
m (High-Speed Implementations (FPGA): Added results from Mabrouk and Benadjila)
Revision as of 14:58, 27 August 2010

1 Call for Contributions

Implementers (both submitters and non-submitters): Do you have results that complement this site? Let us know at sha3zoo-hardware@iaik.tugraz.at. If you are making your HDL code available, please also send us the corresponding information.

2 Important Information

This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST. This is work in progress. If you know of any implementations that should be mentioned on this page, refer to our call for contributions.

A list of hardware implementations of the round 1 candidates can be found here. Please note that the page for round 1 candidates is provided for reference and will not be updated.

The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.

The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
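The comparability criteria above can be written down as a small check. This is only a sketch under stated assumptions: the `Target` record, its field names, and the three labels `"good"`, `"acceptable"`, and `"poor"` are hypothetical and do not appear on this page; they merely encode "same device, else same family" for FPGAs and "same gate length, ideally same library" for ASICs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical description of a target technology (not from this page)."""
    kind: str                               # "FPGA" or "ASIC"
    device: Optional[str] = None            # FPGA: concrete device, if known
    family: Optional[str] = None            # FPGA: device family
    gate_length_um: Optional[float] = None  # ASIC: minimal gate length in µm
    library: Optional[str] = None           # ASIC: standard-cell library, if known


def comparability(a: Target, b: Target) -> str:
    """Rate how reasonable a direct comparison is (labels are illustrative)."""
    if a.kind != b.kind:
        return "poor"
    if a.kind == "FPGA":
        # Same device is the desirable case; same family is the fallback.
        if a.device is not None and a.device == b.device:
            return "good"
        if a.family is not None and a.family == b.family:
            return "acceptable"
        return "poor"
    # ASIC: the minimal gate length must agree at the very least.
    if a.gate_length_um is None or a.gate_length_um != b.gate_length_um:
        return "poor"
    # The same standard-cell library implies the same process technology.
    if a.library is not None and a.library == b.library:
        return "good"
    return "acceptable"
```

For example, two ASIC results that are both listed as 0.18 µm without a named library would be rated "acceptable" by this sketch, while a Spartan 3 result and a Virtex 5 result would be rated "poor".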

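In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories: fully autonomous implementations, implementations with external memory, and implementations of core functionality.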

For suggestions regarding the structure of this site, let us know at sha3zoo-hardware@iaik.tugraz.at

2.1 Fully Autonomous Implementation


Such hardware implementations include the complete functionality of a SHA-3 candidate (or a specific version thereof). That means the input message can be loaded piecewise into the hardware module and it delivers the message digest as output. All hash calculations happen exclusively within the hardware module. If integrated in a system, the achievable throughput of a fully autonomous implementation depends on the speed of the hardware module itself and the speed of the (system dependent) data interface delivering the input message.
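The dependence on both the module and the data interface can be illustrated with a simple bottleneck model. This is a minimal sketch, assuming the slower of the two limits the system; the function names and the 32-bit, 50 MHz bus are hypothetical, while the module figure is the BLAKE-32 result of Kobayashi et al. listed in Section 3.1.

```python
def interface_throughput_mbit(bus_width_bits: int, bus_clock_mhz: float) -> float:
    """Peak rate of a simple parallel bus that moves one word per cycle."""
    return bus_width_bits * bus_clock_mhz


def system_throughput_mbit(module_mbit: float, interface_mbit: float) -> float:
    """Simplified model: the slower of module and interface limits the system."""
    return min(module_mbit, interface_mbit)


# Module figure taken from the table in Section 3.1 (BLAKE-32, Xilinx Virtex 5).
module = 2676.0
# Hypothetical interface: 32-bit bus clocked at 50 MHz.
interface = interface_throughput_mbit(32, 50.0)

print(system_throughput_mbit(module, interface))  # prints 1600.0
```

In this hypothetical setting the interface, not the hash module, determines the achievable throughput.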


2.2 Implementation with External Memory


These implementations use external memory to hold intermediate values during the hashing of a message. The implemented hardware itself normally consists of the core logic functionality of the hash function, some registers for short-lived temporary values, and possibly a memory controller for access to the external memory. Such implementations can load the input message either over a dedicated interface (similar to a fully autonomous implementation) or from the external memory. In order to reach the maximal throughput of the hardware module, the external memory must be sufficiently fast.


2.3 Implementation of Core Functionality


Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of a full implementation.

3 Summary of All Results

This section covers four categories of implementations (high-speed and low-area, each for FPGA and ASIC) and lists the known published results. If the HDL source code is available, a link is provided as well.

3.1 High-Speed Implementations (FPGA)

Important note: The size and functionality of slices vary between FPGA families. A direct comparison of the slice counts of implementations on different FPGA families is therefore problematic.

{|
! Hash Function Name !! Reference / HDL !! Impl. Scope !! Impl. Details !! Technology !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex-II Pro || 3091 slices || 1724 Mbit/s || 37.0 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex 4 || 3087 slices || 2235 Mbit/s || 48.0 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex 5 || 1694 slices || 3103 Mbit/s || 67.0 MHz
|-
| BLAKE-32 || Namin and Hasan [2] / N/A || Core functionality || Compression function with 8 G function units and I/O registers || Altera Stratix III || 5435 ALUTs || 2186.2 Mbit/s || 46.97 MHz
|-
| BLAKE-32 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 1660 slices || 2676 Mbit/s || 115 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex-II Pro || 11122 slices || 1177 Mbit/s || 17.0 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex 4 || 11483 slices || 1707 Mbit/s || 25.0 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || Xilinx Virtex 5 || 4329 slices || 2389 Mbit/s || 35.0 MHz
|-
| Blue Midnight Wish-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers || Altera Stratix III || 12917 ALUTs || 4889.6 Mbit/s || 9.55 MHz
|-
| CubeHash8/1-256(***) || Baldwin et al. [4] / N/A || Core functionality || 2 compression functions unrolled || Xilinx Spartan 3 || 3268 slices || 70 Mbit/s || 37.9 MHz
|-
| CubeHash8/1-256(***) || Baldwin et al. [4] / N/A || Core functionality || 1 iterated compression function || Xilinx Virtex 5 || 1178 slices || 160 Mbit/s || 166.8 MHz
|-
| CubeHash16/32-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 590 slices || 2960 Mbit/s || 185 MHz
|-
| ECHO-224/256 || Lu et al. [5] / N/A || Fully autonomous || || Xilinx Virtex 5 || 9333 slices || 14860 Mbit/s || 87.1 MHz
|-
| ECHO-224/256 || Kinsy and Uhler [21] / N/A || Fully autonomous || 273 cycles per block || Altera Cyclone II || 39091 LEs || 397 Mbit/s(*) || 70.6 MHz
|-
| ECHO-256 || Ramakers and Narinx [25] / Hosted by SHA-3 zoo || Core functionality || 4 x 2 AES round instances with pipeline register in BigSubWords || Xilinx Virtex 5 || 15006 slices || 23860 Mbit/s || 139 MHz
|-
| ECHO-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 3556 slices || 1614 Mbit/s || 104 MHz
|-
| ECHO-256 || Mabrouk and Benadjila [28] / Implementer's webpage || Fully autonomous || Fully parallel iterations of Compress512 || Xilinx Virtex 5 || 10407 slices || 26390 Mbit/s || 154.6 MHz
|-
| ECHO-256 || Mabrouk and Benadjila [28] / Implementer's webpage || Fully autonomous || Fully parallel iterations of Compress512 || Xilinx Virtex 6 || 8071 slices || 29457 Mbit/s || 172.6 MHz
|-
| ECHO-384/512 || Lu et al. [5] / N/A || Fully autonomous || || Xilinx Virtex 5 || 9097 slices || 7810 Mbit/s || 83.9 MHz
|-
| ECHO-384/512 || Kinsy and Uhler [21] / N/A || Fully autonomous || 341 cycles per block || Altera Cyclone II || 39091 LEs || 212 Mbit/s(**) || 70.6 MHz
|-
| Grøstl-224/256 || Jungk et al. [6] / N/A || Fully autonomous || P & Q permutation in parallel || Xilinx Spartan 3 || 6136 slices || 4520 Mbit/s || 88.3 MHz
|-
| Grøstl-224/256 || Submission doc. [7] / N/A || Fully autonomous || P & Q permutation in parallel || Xilinx Virtex 5 || 1722 slices || 10276 Mbit/s || 200.7 MHz
|-
| Grøstl-224/256 || Baldwin et al. [4] / N/A || Core functionality || P & Q permutation in parallel, S-box in BRAM || Xilinx Spartan 3 || 4827 slices || 3660 Mbit/s || 71.53 MHz
|-
| Grøstl-224/256 || Baldwin et al. [4] / N/A || Core functionality || P & Q permutation in parallel, S-box in BRAM || Xilinx Virtex 5 || 4516 slices || 7310 Mbit/s || 142.87 MHz
|-
| Grøstl-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 4057 slices || 5171 Mbit/s || 101 MHz
|-
| Grøstl-384/512 || Submission doc. [7] / N/A || Fully autonomous || P & Q permutation in parallel || Xilinx Spartan 3 || 20233 slices || 5901 Mbit/s || 80.7 MHz
|-
| Grøstl-384/512 || Baldwin et al. [4] / N/A || Core functionality || P & Q permutation parallel, S-box in LUTs || Xilinx Spartan 3 || 17452 slices || 3180 Mbit/s || 79.61 MHz
|-
| Grøstl-384/512 || Baldwin et al. [4] / N/A || Core functionality || P & Q permutation parallel, S-box in LUTs || Xilinx Virtex 5 || 19161 slices || 6090 Mbit/s || 83.33 MHz
|-
| Grøstl-384/512 || Submission doc. [7] / N/A || Fully autonomous || P & Q permutation in parallel || Xilinx Virtex 5 || 5419 slices || 15395 Mbit/s || 210.5 MHz
|-
| Grøstl-384/512 || Jungk and Reith [22] / N/A || Fully autonomous || Shared P & Q permutation || Xilinx Spartan 3 || 8308 slices || 3474 Mbit/s || 95 MHz
|-
| Hamsi-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 718 slices || 1680 Mbit/s || 210 MHz
|-
| Hamsi-256 || Ramakers and Narinx [25] / Hosted by SHA-3 zoo || Core functionality || Three P and Pf instances each in pipeline || Xilinx Virtex 5 || 4664 slices || 6620 Mbit/s || 207 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Fully autonomous || Core (round function, state register) & IO buffer || Altera Cyclone III || 5776 LEs || 7500 Mbit/s || 133 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Fully autonomous || Core (round function, state register) & IO buffer || Altera Stratix III || 4713 ALUTs || 12400 Mbit/s || 218 MHz
|-
| Keccak || J. Strömbergson [9] / Submission webpage || Fully autonomous || Core (round function, state register) only || Xilinx Spartan 3A || 3393 slices || 4800 Mbit/s || 85 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Fully autonomous || Core (round function, state register) & IO buffer || Xilinx Virtex 5 || 1412 slices || 6900 Mbit/s || 122 MHz
|-
| Luffa-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function (1 cycle latency) and I/O registers || Altera Stratix III || 16552 ALUTs || 12042.2 Mbit/s || 47.04 MHz
|-
| Luffa-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 1048 slices || 6343 Mbit/s || 223 MHz
|-
| Luffa-256 || Ramakers and Narinx [25] / Hosted by SHA-3 zoo || Core functionality || || Xilinx Virtex 5 || 9611 slices || 12290 Mbit/s || 48.2 MHz
|-
| Shabal || Feron and Francq [10] / N/A || Fully autonomous || 36 adders in permutation || Xilinx Virtex 5 || 1171 slices || 2588 Mbit/s || 126 MHz
|-
| Shabal || Francq and Thuillet [26] / Shabal webpage || Fully autonomous || 4 iterations of the permutation unrolled || Xilinx Virtex 5 || 1715 slices || 3242 Mbit/s || 76 MHz
|-
| Shabal || Baldwin et al. [4] / N/A || Core functionality || 36 adders in permutation || Xilinx Spartan 3 || 2223 slices || 740 Mbit/s || 71.48 MHz
|-
| Shabal || Baldwin et al. [4] / N/A || Core functionality || 36 adders in permutation || Xilinx Virtex 5 || 2768 slices || 1450 Mbit/s || 138.87 MHz
|-
| Shabal-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function with I/O registers (latency of 16 clock cycles) || Altera Stratix III || 1440 ALUTs || 3125.6 Mbit/s || 195.35 MHz
|-
| Shabal-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 1251 slices || 1739 Mbit/s || 214 MHz
|-
| Shabal-512 || Detrey et al. [23] / INRIA webpage (see SCM tree) || Fully autonomous || Exploiting SRL16 primitive || Xilinx Virtex 5 || 153 slices || 2051 Mbit/s || 256 MHz
|-
| Shabal-512 || Detrey et al. [23] / INRIA webpage (see SCM tree) || Fully autonomous || Exploiting SRL16 primitive || Xilinx Spartan 3 || 499 slices || 800 Mbit/s || 100 MHz
|-
| Skein-256-h || Men Long [11] / N/A || Core functionality || UBI component || Xilinx Virtex 5 || 1001 slices || 408.7 Mbit/s || 114.9 MHz
|-
| Skein-256-256 || Stefan Tillich [12] / On request || Fully autonomous || 8 Threefish rounds unrolled || Xilinx Virtex 5 || 937 slices || 1751 Mbit/s || 68.4 MHz
|-
| Skein-256-256 || Stefan Tillich [12] / On request || Fully autonomous || 8 Threefish rounds unrolled || Xilinx Spartan 3 || 2421 slices || 669 Mbit/s || 26.14 MHz
|-
| Skein-256-256 || Kobayashi et al. [3] / RCIS webpage || Fully autonomous || || Xilinx Virtex 5 || 854 slices || 1482 Mbit/s || 115 MHz
|-
| Skein-512-h || Men Long [11] / N/A || Core functionality || UBI component || Xilinx Virtex 5 || 1877 slices || 817.4 Mbit/s || 114.9 MHz
|-
| Skein-512-512 || Stefan Tillich [12] / On request || Fully autonomous || 8 Threefish rounds unrolled || Xilinx Virtex 5 || 1632 slices || 3535 Mbit/s || 69.04 MHz
|-
| Skein-512-512 || Stefan Tillich [12] / On request || Fully autonomous || 8 Threefish rounds unrolled || Xilinx Spartan 3 || 4273 slices || 1365 Mbit/s || 26.66 MHz
|}

(*) Estimated peak throughput ignoring I/O bottleneck resulting from specific interface: (1536 bits/block) * (70.6 * 10^6 cycles/s) / (273 cycles/block) = 397.22 * 10^6 bits/s.
(**) Estimated peak throughput ignoring I/O bottleneck resulting from specific interface: (1024 bits/block) * (70.6 * 10^6 cycles/s) / (341 cycles/block) = 212.01 * 10^6 bits/s.
(***) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
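
The estimates in footnotes (*) and (**) follow one formula: bits per block times clock frequency divided by cycles per block. The sketch below reproduces both figures from the numbers given in the footnotes; the function name is hypothetical, and the result is a peak value that ignores any I/O bottleneck, as the footnotes state.

```python
def peak_throughput_mbit(block_bits: int, clock_mhz: float, cycles_per_block: int) -> float:
    """Peak throughput in Mbit/s: (bits/block) * (10^6 cycles/s) / (cycles/block)."""
    return block_bits * clock_mhz / cycles_per_block


# Footnote (*): ECHO-224/256, 1536-bit block, 273 cycles per block at 70.6 MHz.
echo_256 = peak_throughput_mbit(1536, 70.6, 273)
# Footnote (**): ECHO-384/512, 1024-bit block, 341 cycles per block at 70.6 MHz.
echo_512 = peak_throughput_mbit(1024, 70.6, 341)

print(round(echo_256, 2))  # prints 397.22
print(round(echo_512, 2))  # prints 212.01
```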



3.2 Low-Area Implementations (FPGA)

{|
! Hash Function Name !! Reference / HDL !! Impl. Scope !! Implementation Details !! Technology !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Spartan-3 || 124 slices || 115 Mbit/s || 190.0 MHz
|-
| BLAKE-32 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Virtex-4 || 124 slices || 216 Mbit/s || 357.0 MHz
|-
| BLAKE-32 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Virtex-5 || 56 slices || 225 Mbit/s || 372.0 MHz
|-
| BLAKE-32 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Altera Cyclone III || 285 LEs || 116 Mbit/s || 192.0 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex-II Pro || 958 slices || 371 Mbit/s || 59.0 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex 4 || 960 slices || 430 Mbit/s || 68.0 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex 5 || 390 slices || 575 Mbit/s || 91.0 MHz
|-
| BLAKE-64 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Spartan-3 || 229 slices || 138 Mbit/s || 158.0 MHz
|-
| BLAKE-64 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Virtex-4 || 230 slices || 219 Mbit/s || 250.0 MHz
|-
| BLAKE-64 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Xilinx Virtex-5 || 108 slices || 314 Mbit/s || 358.0 MHz
|-
| BLAKE-64 || Beuchat et al. [13] / N/A || Fully autonomous || Rescheduled G function || Altera Cyclone III || 542 LEs || 123 Mbit/s || 140.0 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex-II Pro || 1802 slices || 326 Mbit/s || 36.0 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex 4 || 1856 slices || 381 Mbit/s || 42.0 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 1 G function unit || Xilinx Virtex 5 || 939 slices || 533 Mbit/s || 59.0 MHz
|-
| ECHO || Beuchat et al. [24] / N/A || Fully autonomous || Adapted towards FPGA implementation (127 slices and 1 memory block) || Xilinx Virtex 5 || 127 slices || 72 Mbit/s || 352.0 MHz
|-
| Grøstl-224/256 || Jungk et al. [6] / N/A || Fully autonomous || 64-bit datapath, P & Q permutation in parallel || Xilinx Spartan 3 || 2486 slices || 404 Mbit/s || 63.2 MHz
|-
| Grøstl-224/256 || Jungk et al. [6] / N/A || Fully autonomous || 64-bit datapath, P & Q permutation in parallel || Xilinx Virtex 2 Pro || 2754 slices || 512 Mbit/s || 81.5 MHz
|-
| Grøstl-224/256 || Jungk and Reith [22] / N/A || Fully autonomous || Shared P & Q permutation, S-Box based on composite field arithmetic || Xilinx Spartan 3 || 1276 slices || 192 Mbit/s || 60 MHz
|-
| Grøstl-384/512 || Jungk and Reith [22] / N/A || Fully autonomous || Shared P & Q permutation, S-Box based on composite field arithmetic || Xilinx Spartan 3 || 2110 slices || 144 Mbit/s || 63 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Using external memory || Small core using system memory || Altera Stratix III || 855 ALUTs || 96.8 Mbit/s || 366 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Using external memory || Small core using system memory || Altera Cyclone III || 1559 LEs || 47.8 Mbit/s || 181 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Using external memory || Small core using system memory || Xilinx Virtex 5 || 444 slices || 70.1 Mbit/s || 265 MHz
|-
| Shabal || Feron and Francq [10] / N/A || Fully autonomous || 36 adders in permutation || Xilinx Virtex 5 || 596 slices (+ 40 DSP blocks) || 1142 Mbit/s || 109 MHz
|-
| Shabal || Baldwin et al. [4] / N/A || Core functionality || 1 adder in permutation || Xilinx Spartan 3 || 1933 slices || 540 Mbit/s || 89.71 MHz
|-
| Shabal || Baldwin et al. [4] / N/A || Core functionality || 1 adder in permutation || Xilinx Virtex 5 || 2307 slices || 1330 Mbit/s || 222.22 MHz
|-
| Shabal-512 || Detrey et al. [23] / INRIA webpage (see SCM tree) || Fully autonomous || Exploiting SRL16 primitive || Xilinx Virtex 5 || 153 slices || 2051 Mbit/s || 256 MHz
|-
| Shabal-512 || Detrey et al. [23] / INRIA webpage (see SCM tree) || Fully autonomous || Exploiting SRL16 primitive || Xilinx Spartan 3 || 499 slices || 800 Mbit/s || 100 MHz
|-
| Skein-256-256 || Namin and Hasan [2] / N/A || Core functionality || One round of Threefish iterated || Altera Stratix III || 1385 ALUTs || 573.9 Mbit/s || 161.42 MHz
|}



3.3 High-Speed Implementations (ASIC)

A comparison of implementations of all 14 round 2 candidates was presented informally at IAIK (Graz University of Technology) on Sept. 16, 2009. The updated presentation slides can be found here.


{|
! Hash Function Name !! Reference / HDL !! Impl. Scope !! Implementation Details !! Technology !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || UMC 0.18 µm || 58.30 kGates || 5295 Mbit/s || 114 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 4 G function units || UMC 0.18 µm || 41.31 kGates || 4153 Mbit/s || 170 MHz
|-
| BLAKE-32 || Namin and Hasan [2] / N/A || Core functionality || Compression function with 8 G function units and I/O registers || STM 90 nm || 53 kGates || 4475 Mbit/s(*) || 96.15 MHz
|-
| BLAKE-32 || Tillich et al. [14] / On request || Fully autonomous || Compression function with 4 G function units with CSAs || UMC 0.18 µm || 45.64 kGates || 3971 Mbit/s || 170.64 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 8 G function units || UMC 0.18 µm || 132.47 kGates || 5910 Mbit/s || 87 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with 4 G function units || UMC 0.18 µm || 82.73 kGates || 4810 Mbit/s || 136 MHz
|-
| Blue Midnight Wish-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers || STM 90 nm || 164 kGates || 26665 Mbit/s(*) || 52.08 MHz
|-
| Blue Midnight Wish-256 || Tillich et al. [14] / On request || Fully autonomous || Compression function with f0, f1, and f2 unrolled || UMC 0.18 µm || 169.74 kGates || 5358 Mbit/s || 10.46 MHz
|-
| CubeHash16/32-h || Tillich et al. [14] / On request || Fully autonomous || Dynamically reconfigurable r and b parameters, two rounds unrolled || UMC 0.18 µm || 58.87 kGates || 4665 Mbit/s || 145.77 MHz
|-
| CubeHash16/32-h || Bernet et al. [20] / N/A || Fully autonomous || One round per cycle || 0.13 µm || 34.33 kGates || 9248 Mbit/s(***) || 578 MHz
|-
| CubeHash16/32-h || Bernet et al. [20] / N/A || Fully autonomous || Half a round per cycle || 0.13 µm || 21.54 kGates || 8000 Mbit/s(***) || 1000 MHz
|-
| ECHO-224/256 || Lu et al. [5] / N/A || Fully autonomous || || 0.13 µm || 521.1 kGates || 14850 Mbit/s || 87.1 MHz
|-
| ECHO-256 || Tillich et al. [14] / On request || Fully autonomous || Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers || UMC 0.18 µm || 141.49 kGates || 2246 Mbit/s || 141.84 MHz
|-
| ECHO-384/512 || Lu et al. [5] / N/A || Fully autonomous || || 0.13 µm || 516.8 kGates || 7750 Mbit/s || 83.3 MHz
|-
| Fugue-256 || Submission doc. [15] / N/A || Fully autonomous || Four columns of SMIX transformation in parallel (SUPER4_P) || IBM 90 nm || 109.85 kGates || 13913 Mbit/s || 869.5 MHz
|-
| Fugue-256 || Tillich et al. [14] / On request || Fully autonomous || Four columns of SMIX transformation in parallel || UMC 0.18 µm || 46.26 kGates || 4092 Mbit/s || 255.75 MHz
|-
| Grøstl-256 || Tillich et al. [14] / On request || Fully autonomous || One shared permutation for P & Q, one pipeline stage || UMC 0.18 µm || 58.40 kGates || 6290 Mbit/s || 270.27 MHz
|-
| Grøstl-384/512 || Submission doc. [7] / N/A || Fully autonomous || P & Q permutation in parallel || UMC 0.18 µm || 341 kGates || 6225 Mbit/s || 85.1 MHz
|-
| Hamsi-256 || Junfeng Fan (Hamsi website) [16] / N/A || Fully autonomous || || 0.13 µm || 22 kGates || 4940 Mbit/s || 1080 MHz
|-
| Hamsi-256 || Tillich et al. [14] / On request || Fully autonomous || Three instances of P/Pf function unrolled || UMC 0.18 µm || 58.66 kGates || 5565 Mbit/s || 173.91 MHz
|-
| Hamsi-512 || Junfeng Fan (Hamsi website) [16] / N/A || Fully autonomous || || 0.13 µm || 50 kGates || 3970 Mbit/s || 820 MHz
|-
| JH-256 || Tillich et al. [14] / On request || Fully autonomous || 320 S-boxes, one round of R8 per cycle || UMC 0.18 µm || 58.83 kGates || 4991 Mbit/s || 380.22 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Fully autonomous || Core (round function, state register) & IO buffer || ST 0.13 µm || 48 kGates || 29900 Mbit/s || 526 MHz
|-
| Keccak || Submission doc. [8] / Submission webpage || Fully autonomous || Core (round function, state register) only || ST 0.13 µm || 40 kGates || 15000 Mbit/s || 500 MHz
|-
| Keccak(-256) || Tillich et al. [14] / On request || Fully autonomous || One instance of Keccak-f round || UMC 0.18 µm || 56.32 kGates || 21229 Mbit/s || 487.80 MHz
|-
| Luffa-224/256 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || 30.83 kGates || 31960 Mbit/s || 1124 MHz
|-
| Luffa-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function (1 cycle latency) and I/O registers || STM 90 nm || 122 kGates || 25702 Mbit/s(*) || 100.4 MHz
|-
| Luffa-224/256 || Tillich et al. [14] / On request || Fully autonomous || Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.18 µm || 44.97 kGates || 13741 Mbit/s || 483.09 MHz
|-
| Luffa-384 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || Four permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || 50.07 kGates || 23126 Mbit/s || 813 MHz
|-
| Luffa-512 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || Five permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || 65.1 kGates || 19617 Mbit/s || 690 MHz
|-
| Shabal-256 || Namin and Hasan [2] / N/A || Core functionality || Compression function with I/O registers (latency of 16 clock cycles) || STM 90 nm || 20 kGates || 4408 Mbit/s(*) || 413.22 MHz
|-
| Shabal-256 || Tillich et al. [14] / On request || Fully autonomous || One word rotation per cycle, 50 cycles per block || UMC 0.18 µm || 54.19 kGates || 3282 Mbit/s || 320.51 MHz
|-
| Shabal || Bernet et al. [20] / N/A || Fully autonomous || One word rotation per cycle, 52 cycles per block || 0.13 µm || 41.32 kGates || 6351 Mbit/s(***) || 645 MHz
|-
| SHAvite-3-256 || Tillich et al. [14] / On request || Fully autonomous || Four AES rounds (two for compression, two for message expansion) || UMC 0.18 µm || 57.39 kGates || 3152 Mbit/s || 227.79 MHz
|-
| SIMD-256(**) || Tillich et al. [14] / On request || Fully autonomous || Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each || UMC 0.18 µm || 104.17 kGates || 924 Mbit/s || 64.93 MHz
|-
| Skein-256-256 || Stefan Tillich [12] / On request || Fully autonomous || 8 Threefish rounds unrolled || UMC 0.18 µm || 53.87 kGates || 1762 Mbit/s || 68.8 MHz
|-
| Skein-256-256 || Namin and Hasan [2] / N/A || Core functionality || All 72 Threefish rounds unrolled || STM 90 nm || 369 kGates || 3126 Mbit/s(*) || 12.21 MHz
|-
| Skein-256-256 || Tillich et al. [14] / On request || Fully autonomous || 8 Threefish rounds unrolled || UMC 0.18 µm || 58.61 kGates || 1882 Mbit/s || 73.52 MHz
|-
| Skein-512-512 || Tillich et al. [14] / On request || Fully autonomous || 8 Threefish rounds unrolled || UMC 0.18 µm || 102.04 kGates || 2502 Mbit/s || 48.87 MHz
|}

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
(**) Implementation of round-one variant.
(***) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.
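
The factor of 16 in footnote (***) can be derived from the CubeHash parameters. The sketch below assumes the usual reading of CubeHashr/b as r rounds per b-byte message block; that definition comes from the CubeHash specification, not from this page, and the function names are hypothetical. With it, the two CubeHash16/32-h figures of Bernet et al. in the table above are reproduced from their clock frequencies.

```python
def cubehash_bits_per_round(r: int, b: int) -> float:
    """Message bits absorbed per round, assuming r rounds per b-byte block."""
    return 8 * b / r


def cubehash_throughput_mbit(r: int, b: int, clock_mhz: float, rounds_per_cycle: float) -> float:
    """Peak throughput in Mbit/s for a design computing rounds_per_cycle rounds per clock."""
    return cubehash_bits_per_round(r, b) * rounds_per_cycle * clock_mhz


# CubeHash16/32 absorbs 16 bits per round, CubeHash8/1 only 1 bit per round.
factor = cubehash_bits_per_round(16, 32) / cubehash_bits_per_round(8, 1)
print(factor)  # prints 16.0

# One round per cycle at 578 MHz, and half a round per cycle at 1000 MHz.
print(cubehash_throughput_mbit(16, 32, 578, 1.0))   # prints 9248.0
print(cubehash_throughput_mbit(16, 32, 1000, 0.5))  # prints 8000.0
```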



3.4 Low-Area Implementations (ASIC)

{|
! Hash Function Name !! Reference / HDL !! Impl. Scope !! Implementation Details !! Technology !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Tillich et al. [18] / N/A || Fully autonomous || One G function in 11 cycles || AMS 0.35 µm || 25.57 kGates || 15.4 Mbit/s || 31.25 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with a single G function unit || UMC 0.18 µm || 10.54 kGates || 253 Mbit/s || 40 MHz
|-
| BLAKE-32 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with a half G function unit || UMC 0.18 µm || 9.89 kGates || 127 Mbit/s || 40 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with a single G function unit || UMC 0.18 µm || 20.61 kGates || 181 Mbit/s || 20 MHz
|-
| BLAKE-64 || Submission doc. [1] / Submission webpage || Core functionality || Compression function with a half G function unit || UMC 0.18 µm || 19.46 kGates || 91 Mbit/s || 20 MHz
|-
| CubeHash16/32-h || Bernet et al. [20] / N/A || Fully autonomous || Process two 32-bit words per cycle, 64 cycles per round || 0.13 µm || 7.63 kGates || 32 Mbit/s(****) || 100 MHz
|-
| ECHO-224/256 || Lu et al. [5] / N/A || Fully autonomous || || 0.13 µm || 82.8 kGates || 373 Mbit/s || 66.6 MHz
|-
| Fugue-256 || Submission doc. [15] / N/A || Fully autonomous || One SMIX transformation (SUPER1_L) || IBM 90 nm || 59.22 kGates || 2000 Mbit/s || 500 MHz
|-
| Grøstl-224/256 || Tillich et al. [18] / N/A || Fully autonomous || 64-bit datapath, P & Q permutation shared || AMS 0.35 µm || 14.62 kGates || 145.9 Mbit/s || 55.87 MHz
|-
| Grøstl-224/256 || Grøstl website [19] / N/A || Fully autonomous || 64-bit datapath, P & Q permutation shared || UMC 0.18 µm || 17 kGates || 645 Mbit/s || 246.9 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Using external memory || Small core using system memory || ST 0.13 µm || 6.5 kGates || 176.4 Mbit/s(*) || 666.7 MHz
|-
| Keccak || Updated spec. (v1.2) [8] / Submission webpage || Using external memory || Small core using system memory, clock freq. limited to 200 MHz || ST 0.13 µm || 5 kGates || 52.9 Mbit/s(**) || 200 MHz
|-
| Luffa-224/256 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || 18.26 kGates || 2461 Mbit/s || 250 MHz
|-
| Luffa-256 || Mikami et al. [27] / N/A || Fully autonomous || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || 10.34 kGates || 538 Mbit/s || 806 MHz
|-
| Luffa-384 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || 6 S-boxes, 1 MixWord || TSMC 90 nm || 27.13 kGates || 1882 Mbit/s || 250 MHz
|-
| Luffa-512 || Knežević and Verbauwhede [17] / Author's webpage || Fully autonomous || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || 37.35 kGates || 1524 Mbit/s || 250 MHz
|-
| Shabal || Bernet et al. [20] / N/A || Fully autonomous || One adder, one subtractor, one incrementer. 165 cycles per block || 0.13 µm || 23.32 kGates || 310 Mbit/s || 100 MHz
|-
| Skein-256-256 || Tillich et al. [18] / N/A || Fully autonomous || 64-bit datapath || AMS 0.35 µm || 12.89 kGates || 19.8 Mbit/s || 80 MHz
|-
| Skein-256-256 || Namin and Hasan [2] / N/A || Core functionality || One round of Threefish iterated || STM 90 nm || 21 kGates || 1018.8 Mbit/s(***) || 286.53 MHz
|}

(*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s
(**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s
(***) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s
(****) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.



4 Comparative Studies

This section summarizes the reported results of publications which examined more than one round-two candidate in a similar setup.

4.1 Blake, BMW, Luffa, Shabal, Skein

{|
! Reference !! HDL !! Category !! Impl. Scope !! Technology
|-
| Namin and Hasan [2] || N/A || High-speed FPGA || Core functionality || Altera Stratix III
|}


{|
! Hash Function Name !! Impl. Details !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Compression function with 8 G function units and I/O registers || 5435 ALUTs || 2186.2 Mbit/s || 46.97 MHz
|-
| Blue Midnight Wish-256 || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers || 12917 ALUTs || 4889.6 Mbit/s || 9.55 MHz
|-
| Luffa-256 || Compression function (1 cycle latency) and I/O registers || 16552 ALUTs || 12042.2 Mbit/s || 47.04 MHz
|-
| Shabal-256 || Compression function with I/O registers (latency of 16 clock cycles) || 1440 ALUTs || 3125.6 Mbit/s || 195.35 MHz
|-
| Skein-256-256 || All 72 Threefish rounds unrolled (device too small) || N/A || N/A || N/A
|}




{|
! Reference !! HDL !! Category !! Impl. Scope !! Technology
|-
| Namin and Hasan [2] || N/A || High-speed ASIC || Core functionality || STM 90 nm
|}


{|
! Hash Function Name !! Impl. Details !! Size !! Throughput !! Clock Frequency
|-
| BLAKE-32 || Compression function with 8 G function units and I/O registers || 53 kGates || 4475 Mbit/s(*) || 96.15 MHz
|-
| Blue Midnight Wish-256 || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers || 164 kGates || 26665 Mbit/s(*) || 52.08 MHz
|-
| Luffa-256 || Compression function (1 cycle latency) and I/O registers || 122 kGates || 25702 Mbit/s(*) || 100.4 MHz
|-
| Shabal-256 || Compression function with I/O registers (latency of 16 clock cycles) || 20 kGates || 4408 Mbit/s(*) || 413.22 MHz
|-
| Skein-256-256 || All 72 Threefish rounds unrolled || 369 kGates || 3126 Mbit/s(*) || 12.21 MHz
|}

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
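
The peak-throughput formula in footnote (*) can be checked against some of the ASIC rows. The sketch below is illustrative only and rests on assumptions not stated in the table: that the compression function delay equals one clock period (1000 / f in ns for f in MHz), that the fully unrolled designs finish in a single cycle, and that the message block sizes are 256 bits for Luffa-256, 512 bits for Blue Midnight Wish-256, and 256 bits for Skein-256-256. The function names are hypothetical.

```python
def peak_throughput_mbps(input_bits, delay_ns, cycles):
    """Footnote formula: 1000 * input size / (delay in ns * number of cycles)."""
    return 1000 * input_bits / (delay_ns * cycles)

def period_ns(clock_mhz):
    """Assumption: compression function delay equals one clock period."""
    return 1000 / clock_mhz

# (block size in bits, clock in MHz, cycles); block sizes and cycle counts are assumptions.
luffa = peak_throughput_mbps(256, period_ns(100.4), 1)
bmw = peak_throughput_mbps(512, period_ns(52.08), 1)
skein = peak_throughput_mbps(256, period_ns(12.21), 1)

print(round(luffa), round(bmw), round(skein))  # 25702 26665 3126
```

Under these assumptions the rounded results match the Luffa-256, Blue Midnight Wish-256, and Skein-256-256 rows of the table.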



4.2 BLAKE, CubeHash, ECHO, Grøstl, Hamsi, Luffa, Shabal, Skein

Reference | HDL | Category | Impl. Scope | Technology
Kobayashi et al. [3] | RCIS webpage | High-speed FPGA | Fully autonomous | Xilinx Virtex 5

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
BLAKE-32 | | 1660 slices | 2676 Mbit/s | 115 MHz
CubeHash16/32-256 | | 590 slices | 2960 Mbit/s | 185 MHz
ECHO-256 | | 3556 slices | 1614 Mbit/s | 104 MHz
Grøstl-256 | | 4057 slices | 5171 Mbit/s | 101 MHz
Hamsi-256 | | 718 slices | 1680 Mbit/s | 210 MHz
Luffa-256 | | 1048 slices | 6343 Mbit/s | 223 MHz
Shabal-256 | | 1251 slices | 1739 Mbit/s | 214 MHz
Skein-256 | | 854 slices | 1482 Mbit/s | 115 MHz



4.3 CubeHash, Grøstl, Shabal

Reference | HDL | Category | Impl. Scope | Technology
Baldwin et al. [4] | N/A | High-speed FPGA | Core functionality | Xilinx Spartan 3

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
CubeHash8/1-256(*) | 2 compression functions unrolled | 3268 slices | 70 Mbit/s | 37.9 MHz
Grøstl-224/256 | P & Q permutation in parallel, S-box in BRAM | 4827 slices | 3660 Mbit/s | 71.53 MHz
Grøstl-384/512 | P & Q permutation in parallel, S-box in LUTs | 17452 slices | 3180 Mbit/s | 79.61 MHz
Shabal | 36 adders in permutation | 2223 slices | 740 Mbit/s | 71.48 MHz

(*) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
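
The factor of about 16 follows from the CubeHash parameters: CubeHashr/b processes b message bytes per r rounds, so a design limited by its round function absorbs b/r bytes per round. A short sketch of that arithmetic follows; the helper name is hypothetical, and the scaled figure is a rough estimate, not a measured result.

```python
from fractions import Fraction

def bytes_per_round(r, b):
    """Message bytes absorbed per round for CubeHash with r rounds per b-byte block."""
    return Fraction(b, r)

# CubeHash16/32 absorbs 2 bytes per round, CubeHash8/1 only 1/8 byte per round.
speedup = bytes_per_round(16, 32) / bytes_per_round(8, 1)
print(speedup)  # 16

# Rough scaling of the Spartan 3 figure for CubeHash8/1-256 (70 Mbit/s).
estimate_mbps = 70 * speedup
print(estimate_mbps)  # 1120
```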




Reference | HDL | Category | Impl. Scope | Technology
Baldwin et al. [4] | N/A | High-speed FPGA | Core functionality | Xilinx Virtex 5

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
CubeHash8/1-256(*) | 1 iterated compression function | 1178 slices | 160 Mbit/s | 166.8 MHz
Grøstl-224/256 | P & Q permutation in parallel, S-box in BRAM | 4516 slices | 7310 Mbit/s | 142.87 MHz
Grøstl-384/512 | P & Q permutation in parallel, S-box in LUTs | 19161 slices | 6090 Mbit/s | 83.33 MHz
Shabal | 36 adders in permutation | 2768 slices | 1450 Mbit/s | 138.87 MHz

(*) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.



4.4 All 14 Round-Two Candidates

An interactive graphical comparison of various area-performance tradeoffs of this study is available online.


Reference | HDL | Category | Impl. Scope | Technology
Tillich et al. [14] | On request | High-speed ASIC | Fully autonomous | UMC 0.18 µm

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
BLAKE-32 | Compression function with 4 G function units with CSAs | 45.64 kGates | 3971 Mbit/s | 170.64 MHz
Blue Midnight Wish-256 | Compression function with f0, f1, and f2 unrolled | 169.74 kGates | 5358 Mbit/s | 10.46 MHz
CubeHash16/32-h | Dynamically reconfigurable r and b parameters, two rounds unrolled | 58.87 kGates | 4665 Mbit/s | 145.77 MHz
ECHO-256 | Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers | 141.49 kGates | 2246 Mbit/s | 141.84 MHz
Fugue-256 | Four columns of SMIX transformation in parallel | 46.26 kGates | 4092 Mbit/s | 255.75 MHz
Grøstl-256 | One shared permutation for P & Q, one pipeline stage | 58.40 kGates | 6290 Mbit/s | 270.27 MHz
Hamsi-256 | Three instances of P/Pf function unrolled | 58.66 kGates | 5565 Mbit/s | 173.91 MHz
JH-256 | 320 S-boxes, one round of R8 per cycle | 58.83 kGates | 4991 Mbit/s | 380.22 MHz
Keccak(-256) | One instance of Keccak-f round | 56.32 kGates | 21229 Mbit/s | 487.80 MHz
Luffa-224/256 | Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) | 44.97 kGates | 13741 Mbit/s | 483.09 MHz
Shabal-256 | One word rotation per cycle, 50 cycles per block | 54.19 kGates | 3282 Mbit/s | 320.51 MHz
SHAvite-3-256 | Four AES rounds (two for compression, two for message expansion) | 57.39 kGates | 3152 Mbit/s | 227.79 MHz
SIMD-256(*) | Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each | 104.17 kGates | 924 Mbit/s | 64.93 MHz
Skein-256-256 | 8 Threefish rounds unrolled | 58.61 kGates | 1882 Mbit/s | 73.52 MHz
Skein-512-512 | 8 Threefish rounds unrolled | 102.04 kGates | 2502 Mbit/s | 48.87 MHz

(*) Implementation of round-one variant.
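
One simple way to read the area-performance tradeoff in this table is throughput per unit area. The sketch below ranks the rows by Mbit/s per kGate; this metric is chosen here for illustration and is not necessarily the one used in the study. The area and throughput values are copied from the table, while the variable and function names are hypothetical.

```python
# (name, area in kGates, throughput in Mbit/s), copied from the table above.
RESULTS = [
    ("BLAKE-32", 45.64, 3971),
    ("Blue Midnight Wish-256", 169.74, 5358),
    ("CubeHash16/32-h", 58.87, 4665),
    ("ECHO-256", 141.49, 2246),
    ("Fugue-256", 46.26, 4092),
    ("Grøstl-256", 58.40, 6290),
    ("Hamsi-256", 58.66, 5565),
    ("JH-256", 58.83, 4991),
    ("Keccak(-256)", 56.32, 21229),
    ("Luffa-224/256", 44.97, 13741),
    ("Shabal-256", 54.19, 3282),
    ("SHAvite-3-256", 57.39, 3152),
    ("SIMD-256", 104.17, 924),  # round-one variant, see footnote (*)
    ("Skein-256-256", 58.61, 1882),
    ("Skein-512-512", 102.04, 2502),
]

def efficiency(area_kgates, throughput_mbps):
    """Throughput per area in Mbit/s per kGate (illustrative metric)."""
    return throughput_mbps / area_kgates

# Sort from most to least throughput per kGate.
ranking = sorted(RESULTS, key=lambda row: efficiency(row[1], row[2]), reverse=True)
for name, area, tput in ranking:
    print(f"{name:24s} {efficiency(area, tput):7.1f} Mbit/s per kGate")
```

With these figures, Keccak(-256) and Luffa-224/256 come out on top, and SIMD-256 comes out last. Such a ranking applies only to this technology and design point.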



4.5 BLAKE, Grøstl, Skein

Reference | HDL | Category | Impl. Scope | Technology
Tillich et al. [18] | N/A | Low-area ASIC | Fully autonomous | AMS 0.35 µm

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
BLAKE-32 | One G function in 11 cycles | 25.57 kGates | 15.4 Mbit/s | 31.25 MHz
Grøstl-224/256 | 64-bit datapath, P & Q permutation shared | 14.62 kGates | 145.9 Mbit/s | 55.87 MHz
Skein-256-256 | 64-bit datapath | 12.89 kGates | 19.8 Mbit/s | 80 MHz


4.6 ECHO, Hamsi, Luffa

Reference | HDL | Category | Impl. Scope | Technology
Ramakers and Narinx [25] | Hosted by SHA-3 zoo | High-speed FPGA | Core functionality | Xilinx Virtex 5

Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency
ECHO-256 | 4 x 2 AES round instances with pipeline register in BigSubWords | 15006 slices | 23860 Mbit/s | 139 MHz
Hamsi-256 | Three P and Pf instances each in pipeline | 4664 slices | 6620 Mbit/s | 207 MHz
Luffa-256 | | 9611 slices | 12290 Mbit/s | 48.2 MHz

5 References

[1] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE (version 1.3). Available online at http://131002.net/blake/blake.pdf.

[2] A. H. Namin and M. A. Hasan. Hardware Implementation of the Compression Function for Selected SHA-3 Candidates. Available online at http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html.

[3] Kazuyuki Kobayashi, Jun Ikegami, Shin'ichiro Matsuo, Kazuo Sakiyama, and Kazuo Ohta. Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII. IACR Eprint report 2010/010. Available online at http://eprint.iacr.org/2010/010.pdf.

[4] Brian Baldwin, Andrew Byrne, Mark Hamilton, Neil Hanley, Robert P. McEvoy, Weibo Pan, and William P. Marnane. FPGA Implementations of SHA-3 Candidates: CubeHash, Grøstl, LANE, Shabal and Spectral Hash. IACR Eprint report 2009/342. Available online at http://eprint.iacr.org/2009/342.pdf.

[5] Liang Lu, Máire O'Neill, and Earl Swartzlander. Hardware Evaluation of SHA-3 Hash Function Candidate ECHO. Presentation at the Claude Shannon Institute Workshop on Coding and Cryptography 2009. Slides available online at http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf.

[6] Bernhard Jungk, Steffen Reith, and Jürgen Apfelbeck. On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl. IACR Eprint report 2009/206. Available online at http://eprint.iacr.org/2009/206.pdf.

[7] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl - a SHA-3 candidate (October 31, 2008). Available online at http://www.groestl.info/Groestl.pdf.

[8] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles van Assche. KECCAK sponge function family main document (Version 1.2, April 23, 2009). Available online at http://keccak.noekeon.org/Keccak-main-1.2.pdf.

[9] Joachim Strömbergson. Implementation of the Keccak Hash Function in FPGA Devices. Available online at http://www.strombergson.com/files/Keccak_in_FPGAs.pdf.

[10] Romain Feron and Julien Francq. FPGA Implementation of Shabal: Our First Results (Version 2.0, February 19, 2010). Available online at http://www.shabal.com/wp-content/uploads/2010/03/FPGA-Implementation-of-Shabal-First-ResultsV2.0.pdf.

[11] Men Long. Implementing Skein Hash Function on Xilinx Virtex-5 FPGA Platform (Version 0.7, February 2, 2009). Available online at http://www.skein-hash.info/sites/default/files/skein_fpga.pdf.

[12] Stefan Tillich. Hardware Implementation of the SHA-3 Candidate Skein. IACR Eprint report 2009/159. Available online at http://eprint.iacr.org/2009/159.pdf.

[13] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA. IACR Eprint report 2010/173. Available online at http://eprint.iacr.org/2010/173.pdf.

[14] Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely. High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein. IACR Eprint report 2009/510. Available online at http://eprint.iacr.org/2009/510.pdf.

[15] Shai Halevi, William E. Hall, and Charanjit S. Jutla. The Hash Function Fugue (October 30, 2008). Available online at http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf.

[16] Junfeng Fan. Hardware Evaluation of The Hash Function Hamsi. Available online at http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html.

[17] Miroslav Knezevic and Ingrid Verbauwhede. Hardware Evaluation of the Luffa Hash Family. 4th Workshop on Embedded Systems Security 2009. Available online at http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf.

[18] Stefan Tillich, Martin Feldhofer, Wolfgang Issovits, Thomas Kern, Hermann Kureck, Michael Mühlberghuber, Georg Neubauer, Andreas Reiter, Armin Köfler, and Mathias Mayrhofer. Compact Hardware Implementations of the SHA-3 Candidates ARIRANG, BLAKE, Grøstl, and Skein. IACR Eprint report 2009/349. Available online at http://eprint.iacr.org/2009/349.pdf.

[19] Grøstl website. http://www.groestl.info/.

[20] Markus Bernet, Luca Henzen, Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Hardware Implementations of the SHA-3 Candidates Shabal and CubeHash. 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009. Available online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043.

[21] Michel Kinsy and Richard Uhler. SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec. Available online at http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf.

[22] Bernhard Jungk and Steffen Reith. On FPGA-based implementations of Grøstl. IACR Eprint report 2010/260. Available online at http://eprint.iacr.org/2010/260.pdf.

[23] Jérémie Detrey, Pierre Gaudry, and Karim Khalfallah. A Low-Area yet Performant FPGA Implementation of Shabal. IACR Eprint report 2010/292. Available online at http://eprint.iacr.org/2010/292.pdf.

[24] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. A Compact FPGA Implementation of the SHA-3 Candidate ECHO. IACR Eprint report 2010/364. Available online at http://eprint.iacr.org/2010/364.pdf.

[25] Wim Ramakers and Hans Narinx. Implementation and evaluation of SHA-3 candidates on FPGA. Extended abstract of Master Thesis "Implementatie en Evaluatie van SHA-3-Kandidaten op FPGA" (Dutch). Extended abstract available online at http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf. Full thesis available online at http://ehash.iaik.tugraz.at/uploads/6/62/Ramakers_Narinx2010ECHO-Hamsi-Luffa_Thesis_DUTCH.pdf.

[26] Julien Francq and Céline Thuillet. Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results. IACR Eprint report 2010/406. Available online at http://eprint.iacr.org/2010/406.pdf.

[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa. Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20100810.pdf.