SHA-3 Hardware Implementations Round Two

From The ECRYPT Hash Function Website


1 Call for Contributions

Implementers (both submitters and non-submitters): Do you have results that complement this site? Let us know at sha3zoo-hardware@iaik.tugraz.at. If you are making your HDL code available, please also provide us with the corresponding information.

2 Important Information

This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST. This is a work in progress. If you know of any implementations that should be mentioned on this page, please refer to our call for contributions.

A list of hardware implementations of the round 1 candidates can be found here. Please note that the page for round 1 candidates is provided for reference and will not be updated.

The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.

The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimum gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
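
The comparability criteria above can be expressed as a small check. The following is a minimal sketch and not part of the original site: the record fields (`kind`, `family`, `device`, `process`, `library`) and the three level names are hypothetical labels chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical description of an implementation's target technology."""
    kind: str                      # "FPGA" or "ASIC"
    family: Optional[str] = None   # FPGA family, e.g. "Xilinx Virtex 5"
    device: Optional[str] = None   # specific FPGA device, if reported
    process: Optional[str] = None  # ASIC minimum gate length, e.g. "0.13 um"
    library: Optional[str] = None  # ASIC standard-cell library, if reported


def comparability(a: Target, b: Target) -> str:
    """Return a rough comparability level: "good", "fair", or "poor"."""
    if a.kind != b.kind:
        return "poor"
    if a.kind == "FPGA":
        if a.device is not None and a.device == b.device:
            return "good"   # same target device
        if a.family is not None and a.family == b.family:
            return "fair"   # at least the same FPGA family
        return "poor"
    # ASIC: the process gate length should agree at a minimum.
    if a.process is None or a.process != b.process:
        return "poor"
    if a.library is not None and a.library == b.library:
        return "good"       # same standard-cell library
    return "fair"
```

Under these assumptions, two Virtex 5 results without a reported device would be rated "fair", while a Virtex 5 result and a Spartan 3 result would be rated "poor". Synthesis tools and implementation scope, which the text also names as sources of variation, are deliberately left out of this sketch.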

In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories:

For suggestions regarding the structure of this site, let us know at sha3zoo-hardware@iaik.tugraz.at

2.1 Fully Autonomous Implementation

Image:HW_type_self-cont.jpg

Such hardware implementations include the complete functionality of a SHA-3 candidate (or a specific version thereof). That means the input message can be loaded piecewise into the hardware module and it delivers the message digest as output. All hash calculations happen exclusively within the hardware module. If integrated in a system, the achievable throughput of a fully autonomous implementation depends on the speed of the hardware module itself and the speed of the (system dependent) data interface delivering the input message.


2.2 Implementation with External Memory

Image:HW_type_ext-mem.jpg

These implementations use external memory to hold intermediate values during the hashing of a message. The implemented hardware itself normally consists of the core logic functionality of the hash function, some registers for short-lived temporary values, and possibly a memory controller for access to the external memory. Such implementations can load the input message either over a dedicated interface (similar to a fully autonomous implementation) or from the external memory. In order to reach the maximal throughput of the hardware module, the external memory must be sufficiently fast.


2.3 Implementation of Core Functionality

Image:HW_type_core-funct.jpg

Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally yields a first-order estimate of the performance figures of full implementations.

3 Ongoing Hardware Benchmarking Efforts

To describe it in the words of the initiators and maintainers: "ATHENa: Automated Tool for Hardware EvaluatioN is a project started at George Mason University, aimed at fair, comprehensive, and automated evaluation of cryptographic cores developed using hardware description languages, such as VHDL and Verilog." More information about the project and the current results can be found on the ATHENa webpage. Note: As each hash module submitted to ATHENa is implemented on several FPGA platforms, the SHA-3 zoo pages will not replicate all results produced by the ATHENa project. Instead, please refer directly to the ATHENa webpage.

4 Summary of All Results

This section covers four categories of implementations (high-speed and low-area, each for FPGA and ASIC) and lists known published results. If the HDL source code is available, a link is provided as well.

4.1 High-Speed Implementations (FPGA)

Important note: The size and functionality of slices vary between FPGA families. A direct comparison of the slice count of implementations on different FPGA families is therefore problematic.
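
One way to respect this caveat when reading the table is to compare area-related figures only within one FPGA family and size unit. The sketch below is an illustration, not a method prescribed by this page: the throughput-to-area ratio is an assumed metric, the function name is hypothetical, and the four sample rows are copied from the Homsirikamol et al. [30] entries of the table in this section.

```python
from collections import defaultdict

# (name, technology, size, size unit, throughput in Mbit/s)
ROWS = [
    ("Keccak(-256)", "Xilinx Virtex 5", 1272, "slices", 12817.0),
    ("Luffa-256", "Xilinx Virtex 5", 949, "slices", 9692.0),
    ("JH-256", "Xilinx Virtex 5", 1018, "slices", 5416.0),
    ("Keccak(-256)", "Altera Stratix III", 4213, "ALUTs", 12393.0),
]


def throughput_per_area(rows):
    """Group rows by (technology, size unit) and rank each group by Mbit/s per unit."""
    groups = defaultdict(list)
    for name, tech, size, unit, mbps in rows:
        groups[(tech, unit)].append((name, mbps / size))
    for entries in groups.values():
        # Highest throughput per area first; ranking never crosses families.
        entries.sort(key=lambda entry: entry[1], reverse=True)
    return dict(groups)


if __name__ == "__main__":
    for (tech, unit), entries in throughput_per_area(ROWS).items():
        for name, ratio in entries:
            print(f"{tech}: {name} {ratio:.2f} Mbit/s per {unit[:-1]}")
```

The Stratix III row ends up in its own group, so its ALUT-based ratio is never ranked against the slice-based Virtex 5 ratios.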

Hash Function Name | Reference / HDL | Impl. Scope | Impl. Details | Technology | Size | Throughput | Clock Frequency
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 3091 slices | 1724 Mbit/s | 37.0 MHz
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 3087 slices | 2235 Mbit/s | 48.0 MHz
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 1694 slices | 3103 Mbit/s | 67.0 MHz
BLAKE-32 | Namin and Hasan [2] / N/A | Core functionality | Compression function with 8 G function units and I/O registers | Altera Stratix III | 5435 ALUTs | 2186.2 Mbit/s | 46.97 MHz
BLAKE-32 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 2676 Mbit/s | 115 MHz
BLAKE-32 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 1523 slices | 3143 Mbit/s | 128.9 MHz
BLAKE-32 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 3635 ALUTs | 2901 Mbit/s | 119.0 MHz
BLAKE-32 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1118 slices | 1169 Mbit/s | 118.06 MHz
BLAKE-32 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 2676 Mbit/s | 115 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 11122 slices | 1177 Mbit/s | 17.0 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 11483 slices | 1707 Mbit/s | 25.0 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 4329 slices | 2389 Mbit/s | 35.0 MHz
BLAKE-64 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1718 slices | 1299 Mbit/s | 90.91 MHz
BLAKE-64 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 3064 slices | 3520 Mbit/s | 99.7 MHz
BLAKE-64 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 7086 ALUTs | 3161 Mbit/s | 89.5 MHz
Blue Midnight Wish-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with f0, f1, and f2 unrolled in sequence and I/O registers | Altera Stratix III | 12917 ALUTs | 4889.6 Mbit/s | 9.55 MHz
Blue Midnight Wish-256 | Homsirikamol et al. [30] / On request | Fully autonomous | Fully unrolled | Xilinx Virtex 5 | 4353 slices | 6141 Mbit/s | 12.0 MHz
Blue Midnight Wish-256 | Homsirikamol et al. [30] / On request | Fully autonomous | Fully unrolled | Altera Stratix III | 12619 ALUTs | 6339 Mbit/s | 12.4 MHz
Blue Midnight Wish-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 4997 slices | 457 Mbit/s | 14.02 MHz
Blue Midnight Wish-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 4350 slices | 8704 Mbit/s | 34 MHz
Blue Midnight Wish-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 9810 slices | 287 Mbit/s | 10 MHz
Blue Midnight Wish-512 | Homsirikamol et al. [30] / On request | Fully autonomous | Fully unrolled | Altera Stratix III | 25192 ALUTs | 9820 Mbit/s | 9.6 MHz
Blue Midnight Wish | Akin et al. [34] / N/A | Core functionality | Compression function with f0, f1, and f2 unrolled in sequence | Xilinx Spartan 3 | 10531 slices | 2110 Mbit/s | 4.22 MHz
Blue Midnight Wish | Akin et al. [34] / N/A | Core functionality | Compression function with f0, f1, and f2 unrolled in sequence | Xilinx Virtex-II | 10432 slices | 3360 Mbit/s | 6.71 MHz
Blue Midnight Wish | Akin et al. [34] / N/A | Core functionality | Compression function with f0, f1, and f2 unrolled in sequence | Xilinx Virtex 4 | 10486 slices | 4510 Mbit/s | 9.01 MHz
CubeHash8/1-256(***) | Baldwin et al. [4] / N/A | Core functionality | 2 compression functions unrolled | Xilinx Spartan 3 | 3268 slices | 70 Mbit/s | 37.9 MHz
CubeHash8/1-256(***) | Baldwin et al. [4] / N/A | Core functionality | 1 iterated compression function | Xilinx Virtex 5 | 1178 slices | 160 Mbit/s | 166.8 MHz
CubeHash16/32-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 590 slices | 2960 Mbit/s | 185 MHz
CubeHash16/32-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 684 slices | 4385 Mbit/s | 274.1 MHz
CubeHash16/32-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 1922 ALUTs | 3726 Mbit/s | 232.9 MHz
CubeHash16/32-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 590 slices | 2960 Mbit/s | 185 MHz
CubeHash8/32 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 695 slices | 2509 Mbit/s | 166.83 MHz
CubeHash16/32-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 734 slices | 4315 Mbit/s | 269.7 MHz
CubeHash16/32-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 1930 ALUTs | 3267 Mbit/s | 204.2 MHz
ECHO-224/256 | Lu et al. [5] / N/A | Fully autonomous | | Xilinx Virtex 5 | 9333 slices | 14860 Mbit/s | 87.1 MHz
ECHO-224/256 | Kinsy and Uhler [21] / N/A | Fully autonomous | 273 cycles per block | Altera Cyclone II | 39091 LEs | 397 Mbit/s(*) | 70.6 MHz
ECHO-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | Straight-forward instantiation of complete compression function | Xilinx Virtex 5 | 15006 slices | 23860 Mbit/s | 139 MHz
ECHO-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | Optimized: 4 x 2 AES round instances with pipeline register in BigSubWords | Xilinx Virtex 5 | 12061 slices | 3560 Mbit/s | 187 MHz
ECHO-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 3556 slices | 1614 Mbit/s | 104 MHz
ECHO-256 | Mabrouk and Benadjila [28] / Implementer's webpage | Fully autonomous | Fully parallel iterations of Compress512 | Xilinx Virtex 5 | 10407 slices | 26390 Mbit/s | 154.6 MHz
ECHO-256 | Mabrouk and Benadjila [28] / Implementer's webpage | Fully autonomous | Fully parallel iterations of Compress512 | Xilinx Virtex 6 | 8071 slices | 29457 Mbit/s | 172.6 MHz
ECHO-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Xilinx Virtex 5 | 4982 slices | 11323 Mbit/s | 184.3 MHz
ECHO-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Altera Stratix III | 20723 ALUTs | 14335 Mbit/s | 233.3 MHz
ECHO-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 7372 slices | 5373 Mbit/s | 198.93 MHz
ECHO-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2827 slices | 2312 Mbit/s | 149 MHz
ECHO-384/512 | Lu et al. [5] / N/A | Fully autonomous | | Xilinx Virtex 5 | 9097 slices | 7810 Mbit/s | 83.9 MHz
ECHO-384/512 | Kinsy and Uhler [21] / N/A | Fully autonomous | 341 cycles per block | Altera Cyclone II | 39091 LEs | 212 Mbit/s(**) | 70.6 MHz
ECHO-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 8633 slices | 18133 Mbit/s | 166.69 MHz
ECHO-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Xilinx Virtex 5 | 5044 slices | 7779 Mbit/s | 235.5 MHz
ECHO-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Altera Stratix III | 21187 ALUTs | 8172 Mbit/s | 247.4 MHz
Fugue-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 2 clk cycles per round | Xilinx Virtex 5 | 708 slices | 3495 Mbit/s | 218.4 MHz
Fugue-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 2 clk cycles per round | Altera Stratix III | 2397 ALUTs | 3319 Mbit/s | 207.4 MHz
Fugue-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1689 slices | 914 Mbit/s | 200.04 MHz
Fugue-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 4013 slices | 1248 Mbit/s | 78 MHz
Fugue-384 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2380 slices | 640 Mbit/s | 200.08 MHz
Fugue-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2596 slices | 481 Mbit/s | 200.16 MHz
Fugue-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 clk cycles per round | Xilinx Virtex 5 | 979 slices | 1773 Mbit/s | 221.6 MHz
Fugue-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 clk cycles per round | Altera Stratix III | 2783 ALUTs | 1598 Mbit/s | 199.8 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 6136 slices | 4520 Mbit/s | 88.3 MHz
Grøstl-224/256 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 1722 slices | 10276 Mbit/s | 200.7 MHz
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Spartan 3 | 4827 slices | 3660 Mbit/s | 71.53 MHz
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Virtex 5 | 4516 slices | 7310 Mbit/s | 142.87 MHz
Grøstl-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 4057 slices | 5171 Mbit/s | 101 MHz
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 1597 slices | 7885 Mbit/s | 323.4 MHz
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 6350 ALUTs | 5380 Mbit/s | 220.7 MHz
Grøstl-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2391 slices | 3242 Mbit/s | 101.32 MHz
Grøstl-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2616 slices | 7885 Mbit/s | 154 MHz
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 20233 slices | 5901 Mbit/s | 80.7 MHz
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Spartan 3 | 17452 slices | 3180 Mbit/s | 79.61 MHz
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Virtex 5 | 19161 slices | 6090 Mbit/s | 83.33 MHz
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 5419 slices | 15395 Mbit/s | 210.5 MHz
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation | Xilinx Spartan 3 | 8308 slices | 3474 Mbit/s | 95 MHz
Grøstl-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 4845 slices | 3619 Mbit/s | 123.4 MHz
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 3138 slices | 10314 Mbit/s | 292.1 MHz
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 12355 ALUTs | 7142 Mbit/s | 202.3 MHz
Hamsi-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 718 slices | 1680 Mbit/s | 210 MHz
Hamsi-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | Straight-forward instantiation of complete compression function | Xilinx Virtex 5 | 4664 slices | 6620 Mbit/s | 207 MHz
Hamsi-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | Non-linear permutation block reused | Xilinx Virtex 5 | 2113 slices | 1970 Mbit/s | 308 MHz
Hamsi-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 720 slices | 3049 Mbit/s | 285.9 MHz
Hamsi-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 2308 ALUTs | 2997 Mbit/s | 281.0 MHz
Hamsi-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1518 slices | 358 Mbit/s | 72.41 MHz
Hamsi-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 718 slices | 1680 Mbit/s | 210 MHz
Hamsi-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 6229 slices | 79 Mbit/s | 16.51 MHz
Hamsi-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1900 slices | 1942 Mbit/s | 182.1 MHz
Hamsi-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 6401 ALUTs | 2001 Mbit/s | 187.6 MHz
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1018 slices | 5416 Mbit/s | 380.8 MHz
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3525 ALUTs | 5515 Mbit/s | 387.8 MHz
JH-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2661 slices | 2639 Mbit/s | 201 MHz
JH | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1291 slices | 1941 Mbit/s | 250.13 MHz
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1104 slices | 5610 Mbit/s | 394.5 MHz
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3709 ALUTs | 5556 Mbit/s | 390.6 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Cyclone III | 5776 LEs | 7500 Mbit/s | 133 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Stratix III | 4713 ALUTs | 12400 Mbit/s | 218 MHz
Keccak | J. Strömbergson [9] / Submission webpage | Fully autonomous | Core (round function, state register) only | Xilinx Spartan 3A | 3393 slices | 4800 Mbit/s | 85 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Xilinx Virtex 5 | 1412 slices | 6900 Mbit/s | 122 MHz
Keccak(-224) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 5915 Mbit/s | 189 MHz
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1272 slices | 12817 Mbit/s | 282.7 MHz
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 4213 ALUTs | 12393 Mbit/s | 273.4 MHz
Keccak(-256) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 6263 Mbit/s | 189 MHz
Keccak(-256) | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1433 slices | 8397 Mbit/s | 205 MHz
Keccak(-384) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8190 Mbit/s | 189 MHz
Keccak(-512) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8518 Mbit/s | 189 MHz
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1257 slices | 6845 Mbit/s | 285.2 MHz
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3979 ALUTs | 7310 Mbit/s | 304.6 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Spartan 3 | 2024 slices | 3460 Mbit/s | 81.4 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex-II | 2024 slices | 5810 Mbit/s | 136.6 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex 4 | 2024 slices | 6070 Mbit/s | 142.9 MHz
Luffa-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function (1 cycle latency) and I/O registers | Altera Stratix III | 16552 ALUTs | 12042.2 Mbit/s | 47.04 MHz
Luffa-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 1048 slices | 6343 Mbit/s | 223 MHz
Luffa-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | One step block reused for 8 rounds | Xilinx Virtex 5 | 9611 slices | 2303 Mbit/s | 179 MHz
Luffa-256 | Ramakers and Narinx [25] / Hosted by SHA-3 zoo | Core functionality | Straight-forward instantiation of complete compression function | Xilinx Virtex 5 | 9611 slices | 12290 Mbit/s | 48.2 MHz
Luffa-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 949 slices | 9692 Mbit/s | 340.7 MHz
Luffa-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3032 ALUTs | 8570 Mbit/s | 301.3 MHz
Luffa-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2221 slices | 5333 Mbit/s | 166.67 MHz
Luffa-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1048 slices | 7424 Mbit/s | 261 MHz
Luffa-384 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 3740 slices | 5336 Mbit/s | 166.75 MHz
Luffa-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 3700 slices | 5336 Mbit/s | 166.75 MHz
Luffa-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1960 slices | 7691 Mbit/s | 240.3 MHz
Luffa-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 6891 ALUTs | 8579 Mbit/s | 268.1 MHz
Luffa | Akin et al. [34] / N/A | Core functionality | Three step modules | Xilinx Spartan 3 | 2956 slices | 1480 Mbit/s | 157.3 MHz
Luffa | Akin et al. [34] / N/A | Core functionality | Three step modules | Xilinx Virtex-II | 2952 slices | 8370 Mbit/s | 301.4 MHz
Luffa | Akin et al. [34] / N/A | Core functionality | Three step modules | Xilinx Virtex 4 | 2989 slices | 8560 Mbit/s | 308.2 MHz
Shabal | Feron and Francq [10] / N/A | Fully autonomous | 36 adders in permutation | Xilinx Virtex 5 | 1171 slices | 2588 Mbit/s | 126 MHz
Shabal | Francq and Thuillet [26] / Shabal webpage | Fully autonomous | 4 iterations of the permutation unrolled | Xilinx Virtex 5 | 1715 slices | 3242 Mbit/s | 76 MHz
Shabal | Baldwin et al. [4] / N/A | Core functionality | 36 adders in permutation | Xilinx Spartan 3 | 2223 slices | 740 Mbit/s | 71.48 MHz
Shabal | Baldwin et al. [4] / N/A | Core functionality | 36 adders in permutation | Xilinx Virtex 5 | 2768 slices | 1450 Mbit/s | 138.87 MHz
Shabal | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1583 slices | 1469 Mbit/s | 148.04 MHz
Shabal-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with I/O registers (latency of 16 clock cycles) | Altera Stratix III | 1440 ALUTs | 3125.6 Mbit/s | 195.35 MHz
Shabal-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 1251 slices | 1739 Mbit/s | 214 MHz
Shabal-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 32-bit datapath | Xilinx Virtex 5 | 283 slices | 1719 Mbit/s | 214.9 MHz
Shabal-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 32-bit datapath | Altera Stratix III | 1744 ALUTs | 877 Mbit/s | 109.7 MHz
Shabal-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1251 slices | 2335 Mbit/s | 228 MHz
Shabal-512 | Detrey et al. [23] / INRIA webpage (see SCM tree) | Fully autonomous | Exploiting SRL16 primitive | Xilinx Virtex 5 | 153 slices | 2051 Mbit/s | 256 MHz
Shabal-512 | Detrey et al. [23] / INRIA webpage (see SCM tree) | Fully autonomous | Exploiting SRL16 primitive | Xilinx Spartan 3 | 499 slices | 800 Mbit/s | 100 MHz
Shabal-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 32-bit datapath | Xilinx Virtex 5 | 283 slices | 1719 Mbit/s | 214.9 MHz
Shabal-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 32-bit datapath | Altera Stratix III | 1744 ALUTs | 877 Mbit/s | 109.7 MHz
SHAvite-3-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Xilinx Virtex 5 | 1076 slices | 3253 Mbit/s | 235.1 MHz
SHAvite-3-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 3 clk cycles per round | Altera Stratix III | 3042 ALUTs | 3397 Mbit/s | 245.5 MHz
SHAvite-3-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 3125 slices | 1170 Mbit/s | 109.17 MHz
SHAvite-3-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1063 slices | 3382 Mbit/s | 251 MHz
SHAvite-3-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 9775 slices | 931 Mbit/s | 59.4 MHz
SHAvite-3-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 clk cycles per round | Xilinx Virtex 5 | 2090 slices | 3841 Mbit/s | 213.8 MHz
SHAvite-3-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 clk cycles per round | Altera Stratix III | 5619 ALUTs | 4071 Mbit/s | 226.6 MHz
SIMD-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 SIMD steps unrolled | Xilinx Virtex 5 | 8922 slices | 3123 Mbit/s | 54.9 MHz
SIMD-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 SIMD steps unrolled | Altera Stratix III | 25728 ALUTs | 3123 Mbit/s | 54.9 MHz
SIMD-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 22704 slices | 1338 Mbit/s | 107.2 MHz
SIMD-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 3987 slices | 835 Mbit/s | 75 MHz
SIMD-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 43729 slices | 2677 Mbit/s | 107.2 MHz
SIMD-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 SIMD steps unrolled | Xilinx Virtex 5 | 19639 slices | 4938 Mbit/s | 43.4 MHz
SIMD-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 SIMD steps unrolled | Altera Stratix III | 53623 ALUTs | 5668 Mbit/s | 49.8 MHz
Skein-256-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1001 slices | 408.7 Mbit/s | 114.9 MHz
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 937 slices | 1751 Mbit/s | 68.4 MHz
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 2421 slices | 669 Mbit/s | 26.14 MHz
Skein-256-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1482 Mbit/s | 115 MHz
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1621 slices | 3178 Mbit/s | 118.0 MHz
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4645 ALUTs | 2503 Mbit/s | 92.9 MHz
Skein-256-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1402 Mbit/s | 115 MHz
Skein-512-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1877 slices | 817.4 Mbit/s | 114.9 MHz
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 1632 slices | 3535 Mbit/s | 69.04 MHz
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 4273 slices | 1365 Mbit/s | 26.66 MHz
Skein-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1786 slices | 1945 Mbit/s | 83.65 MHz
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1716 slices | 3209 Mbit/s | 119.1 MHz
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4794 ALUTs | 2434 Mbit/s | 90.3 MHz

(*) Estimated peak throughput ignoring I/O bottleneck resulting from specific interface: (1536 bits/block) * (70.6 * 10^6 cycles/s) / (273 cycles/block) = 397.22 * 10^6 bits/s.
(**) Estimated peak throughput ignoring I/O bottleneck resulting from specific interface: (1024 bits/block) * (70.6 * 10^6 cycles/s) / (341 cycles/block) = 212.01 * 10^6 bits/s.
(***) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
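
The estimates in footnotes (*) and (**) follow from block size, clock frequency, and cycles per block, and the factor in (***) follows from the CubeHash r/b parameters (r rounds per b-byte block). The sketch below reproduces this arithmetic; the function names are hypothetical, and the CubeHash ratio assumes that the time per block scales linearly with the number of rounds.

```python
def peak_throughput_mbps(block_bits: int, clock_mhz: float, cycles_per_block: int) -> float:
    """Peak throughput in Mbit/s, ignoring any I/O bottleneck."""
    return block_bits * clock_mhz / cycles_per_block


def cubehash_speedup(r_old: int, b_old: int, r_new: int, b_new: int) -> float:
    """Relative throughput of CubeHash r_new/b_new versus r_old/b_old.

    Assumes throughput is proportional to bytes per block divided by rounds per block.
    """
    return (b_new / r_new) / (b_old / r_old)


if __name__ == "__main__":
    # Footnote (*): ECHO-224/256, 1536-bit block, 273 cycles per block at 70.6 MHz
    print(round(peak_throughput_mbps(1536, 70.6, 273), 2))  # 397.22
    # Footnote (**): ECHO-384/512, 1024-bit block, 341 cycles per block at 70.6 MHz
    print(round(peak_throughput_mbps(1024, 70.6, 341), 2))  # 212.01
    # Footnote (***): CubeHash8/1 versus CubeHash16/32
    print(cubehash_speedup(8, 1, 16, 32))  # 16.0
```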



4.2 Low-Area Implementations (FPGA)

Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency
BLAKE-32 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 124 slices | 115 Mbit/s | 190.0 MHz
BLAKE-32 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 124 slices | 216 Mbit/s | 357.0 MHz
BLAKE-32 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 56 slices | 225 Mbit/s | 372.0 MHz
BLAKE-32 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 285 LEs | 116 Mbit/s | 192.0 MHz
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 958 slices | 371 Mbit/s | 59.0 MHz
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 960 slices | 430 Mbit/s | 68.0 MHz
BLAKE-32 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 390 slices | 575 Mbit/s | 91.0 MHz
BLAKE-64 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 229 slices | 138 Mbit/s | 158.0 MHz
BLAKE-64 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 230 slices | 219 Mbit/s | 250.0 MHz
BLAKE-64 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 108 slices | 314 Mbit/s | 358.0 MHz
BLAKE-64 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 542 LEs | 123 Mbit/s | 140.0 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 1802 slices | 326 Mbit/s | 36.0 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 1856 slices | 381 Mbit/s | 42.0 MHz
BLAKE-64 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 939 slices | 533 Mbit/s | 59.0 MHz
Blue Midnight Wish-256 | El Hadedy et al. [32] / N/A | Fully autonomous | 32-bit datapath, 1 memory block | Xilinx Virtex | 895 slices | 9 Mbit/s | 38 MHz
Blue Midnight Wish-256 | El Hadedy et al. [32] / N/A | Fully autonomous | 32-bit datapath, 2 memory blocks | Xilinx Virtex 5 | 84 slices | 28 Mbit/s | 116 MHz
Blue Midnight Wish-256 | El Hadedy et al. [41] / Submission webpage | Fully autonomous | 32-bit datapath, 3 memory blocks | Xilinx Virtex 5 | 51 slices | 68.71 Mbit/s | 141 MHz
Blue Midnight Wish-512 | El Hadedy et al. [41] / Submission webpage | Fully autonomous | 64-bit datapath, 3 memory blocks | Xilinx Virtex 5 | 105 slices | 112.18 Mbit/s | 115 MHz
ECHO | Beuchat et al. [24] / On request from author | Fully autonomous | Adapted towards FPGA implementation (127 slices and 1 memory block) | Xilinx Virtex 5 | 127 slices | 72 Mbit/s | 352.0 MHz
ECHO | Announced 19-08-2010 on hash-forum@nist.gov / On request from author | Fully autonomous | All ECHO + all AES variants | Xilinx Virtex 5 | 231 slices | 81.7 Mbit/s (ECHO-224/256), 41.9 Mbit/s (ECHO-384/512) | 351.0 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Spartan 3 | 2486 slices | 404 Mbit/s | 63.2 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Virtex 2 Pro | 2754 slices | 512 Mbit/s | 81.5 MHz
Grøstl-224/256 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 1276 slices | 192 Mbit/s | 60 MHz
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 2110 slices | 144 Mbit/s | 63 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Stratix III | 855 ALUTs | 96.8 Mbit/s | 366 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Cyclone III | 1559 LEs | 47.8 Mbit/s | 181 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Xilinx Virtex 5 | 444 slices | 70.1 Mbit/s | 265 MHz
Luffa-256 | Mikami et al. [27] / N/A | Fully autonomous | One permutation block (64 S-boxes, 4 MixWord blocks) | Xilinx Virtex 5 | 355 slices | 33 Mbit/s | 50 MHz
Shabal | Feron and Francq [10] / N/A | Fully autonomous | 36 adders in permutation | Xilinx Virtex 5 | 596 slices (+ 40 DSP blocks) | 1142 Mbit/s | 109 MHz
Shabal | Baldwin et al. [4] / N/A | Core functionality | 1 adder in permutation | Xilinx Spartan 3 | 1933 slices | 540 Mbit/s | 89.71 MHz
Shabal | Baldwin et al. [4] / N/A | Core functionality | 1 adder in permutation | Xilinx Virtex 5 | 2307 slices | 1330 Mbit/s | 222.22 MHz
Shabal-512 | Detrey et al. [23] / INRIA webpage (see SCM tree) | Fully autonomous | Exploiting SRL16 primitive | Xilinx Virtex 5 | 153 slices | 2051 Mbit/s | 256 MHz
Shabal-512 | Detrey et al. [23] / INRIA webpage (see SCM tree) | Fully autonomous | Exploiting SRL16 primitive | Xilinx Spartan 3 | 499 slices | 800 Mbit/s | 100 MHz
Skein-256-256 | Namin and Hasan [2] / N/A | Core functionality | One round of Threefish iterated | Altera Stratix III | 1385 ALUTs | 573.9 Mbit/s | 161.42 MHz



4.3 High-Speed Implementations (ASIC)

A comparison of implementations of all 14 round 2 candidates was presented informally at IAIK (Graz University of Technology) on Sept. 16, 2009. The updated presentation slides can be found here.


Hash Function Name Reference / HDL Impl. Scope Implementation Details Technology Size Throughput Clock Frequency
BLAKE-32 Submission doc. [1] / Submission webpage Core functionality Compression function with 8 G function units UMC 0.18 µm 58.30 kGates 5295 Mbit/s 114 MHz
BLAKE-32 Submission doc. [1] / Submission webpage Core functionality Compression function with 4 G function units UMC 0.18 µm 41.31 kGates 4153 Mbit/s 170 MHz
BLAKE-32 Namin and Hasan [2] / N/A Core functionality Compression function with 8 G function units and I/O registers STM 90 nm 53 kGates 4475 Mbit/s(*) 96.15 MHz
BLAKE-32 Tillich et al. [14] / On request Fully autonomous Compression function with 4 G function units with CSAs UMC 0.18 µm 45.64 kGates 3971 Mbit/s 170.64 MHz
BLAKE-32 Henzen et al. [29] / ETH webpage Fully autonomous Four parallel G function modules UMC 90 nm 47.5 kGates 9752 Mbit/s 400 MHz
BLAKE-32 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 43.52 kGates 4645 Mbit/s 200 MHz
BLAKE-32 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 37 kGates 6668 Mbit/s 286.5 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 0.18 µm 79 kGates 6376 Mbit/s 137 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 0.18 µm 48 kGates 5847 Mbit/s 240 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 0.13 µm 67 kGates 9365 Mbit/s 201 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 0.13 µm 43 kGates 8047 Mbit/s 330 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 90 nm 65 kGates 17498 Mbit/s 376 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 90 nm 38 kGates 15143 Mbit/s 621 MHz
BLAKE-64 Submission doc. [1] / Submission webpage Core functionality Compression function with 8 G function units UMC 0.18 µm 132.47 kGates 5910 Mbit/s 87 MHz
BLAKE-64 Submission doc. [1] / Submission webpage Core functionality Compression function with 4 G function units UMC 0.18 µm 82.73 kGates 4810 Mbit/s 136 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 0.18 µm 147 kGates 7216 Mbit/s 106 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 0.18 µm 98 kGates 7192 Mbit/s 204 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 0.13 µm 139 kGates 10802 Mbit/s 158 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 0.13 µm 92 kGates 10265 Mbit/s 291 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 8 G function units UMC 90 nm 128 kGates 20317 Mbit/s 298 MHz
BLAKE-64 Henzen et al. [40] / Submission webpage Fully autonomous Compression function with 4 G function units UMC 90 nm 79 kGates 18782 Mbit/s 532 MHz
Blue Midnight Wish-256 Namin and Hasan [2] / N/A Core functionality Compression function with f0, f1, and f2 unrolled in sequence and I/O registers STM 90 nm 164 kGates 26665 Mbit/s(*) 52.08 MHz
Blue Midnight Wish-256 Tillich et al. [14] / On request Fully autonomous Compression function with f0, f1, and f2 unrolled UMC 0.18 µm 169.74 kGates 5358 Mbit/s 10.46 MHz
Blue Midnight Wish-256 Henzen et al. [29] / ETH webpage Fully autonomous Single-cycle f0 and f2, iterative f1 UMC 90 nm 150 kGates 8486 Mbit/s 298 MHz
Blue Midnight Wish-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 198.17 kGates 12220 Mbit/s 48 MHz
Blue Midnight Wish Akin et al. [34] / N/A Core functionality Compression function with f0, f1, and f2 unrolled in sequence Synopsys 90 nm 55.9 kGates 26320 Mbit/s 52.63 MHz
Blue Midnight Wish-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 128.7 kGates 25937 Mbit/s 101.3 MHz
CubeHash16/32-h Tillich et al. [14] / On request Fully autonomous Dynamically reconfigurable r and b parameters, two rounds unrolled UMC 0.18 µm 58.87 kGates 4665 Mbit/s 145.77 MHz
CubeHash16/32-h Bernet et al. [20] / N/A Fully autonomous One round per cycle 0.13 µm 34.33 kGates 9248 Mbit/s(***) 578 MHz
CubeHash16/32-h Bernet et al. [20] / N/A Fully autonomous Half a round per cycle 0.13 µm 21.54 kGates 8000 Mbit/s(***) 1000 MHz
CubeHash16/32-256 Henzen et al. [29] / ETH webpage Fully autonomous One round per cycle, IV fixed UMC 90 nm 42.5 kGates 10667 Mbit/s 667 MHz
CubeHash16/32-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 38.18 kGates 4624 Mbit/s 289 MHz
CubeHash16/32-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 35.5 kGates 8247 Mbit/s 515.5 MHz
ECHO-224/256 Lu et al. [5] / N/A Fully autonomous 0.13 µm 521.1 kGates 14850 Mbit/s 87.1 MHz
ECHO-256 Tillich et al. [14] / On request Fully autonomous Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers UMC 0.18 µm 141.49 kGates 2246 Mbit/s 141.84 MHz
ECHO-256 Henzen et al. [29] / ETH webpage Fully autonomous 8 AES rounds per cycle UMC 90 nm 260 kGates 13966 Mbit/s 291 MHz
ECHO-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 92.73 kGates 3366 Mbit/s 217 MHz
ECHO-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 101.1 kGates 5621 Mbit/s 362.3 MHz
ECHO-384/512 Lu et al. [5] / N/A Fully autonomous 0.13 µm 516.8 kGates 7750 Mbit/s 83.3 MHz
Fugue-256 Submission doc. [15] / N/A Fully autonomous Four columns of SMIX transformation in parallel (SUPER4_P) IBM 90 nm 109.85 kGates 13913 Mbit/s 869.5 MHz
Fugue-256 Tillich et al. [14] / On request Fully autonomous Four columns of SMIX transformation in parallel UMC 0.18 µm 46.26 kGates 4092 Mbit/s 255.75 MHz
Fugue-256 Henzen et al. [29] / ETH webpage Fully autonomous S-box as LUT UMC 90 nm 55 kGates 8815 Mbit/s 551 MHz
Fugue-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 91.09 kGates 2385 Mbit/s 149 MHz
Fugue-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 56.7 kGates 2721 Mbit/s 170.1 MHz
Grøstl-256 Tillich et al. [14] / On request Fully autonomous One shared permutation for P & Q, one pipeline stage UMC 0.18 µm 58.40 kGates 6290 Mbit/s 270.27 MHz
Grøstl-256 Henzen et al. [29] / ETH webpage Fully autonomous P and Q permutation interleaved with one pipeline stage, S-box as LUT UMC 90 nm 135 kGates 16254 Mbit/s 667 MHz
Grøstl-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 110.11 kGates 9606 Mbit/s 188 MHz
Grøstl-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 139.1 kGates 17297 Mbit/s 337.8 MHz
Grøstl-256 RCIS webpage [39] / RCIS webpage Fully autonomous STM 90 nm 120.8 kGates 16275 Mbit/s 349.7 MHz
Grøstl-384/512 Submission doc. [7] / N/A Fully autonomous P & Q permutation in parallel UMC 0.18 µm 341 kGates 6225 Mbit/s 85.1 MHz
Hamsi-256 Junfeng Fan (Hamsi website) [16] / N/A Fully autonomous 0.13 µm 22 kGates 4940 Mbit/s 1080 MHz
Hamsi-256 Tillich et al. [14] / On request Fully autonomous Three instances of P/Pf function unrolled UMC 0.18 µm 58.66 kGates 5565 Mbit/s 173.91 MHz
Hamsi-256 Henzen et al. [29] / ETH webpage Fully autonomous Message expansions in LUTs, one round per cycle UMC 90 nm 45 kGates 8686 Mbit/s 814 MHz
Hamsi-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 29.94 kGates 3571 Mbit/s 446 MHz
Hamsi-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 67.6 kGates 7767 Mbit/s 970.9 MHz
Hamsi-512 Junfeng Fan (Hamsi website) [16] / N/A Fully autonomous 0.13 µm 50 kGates 3970 Mbit/s 820 MHz
JH-256 Tillich et al. [14] / On request Fully autonomous 320 S-boxes, one round of R8 per cycle UMC 0.18 µm 58.83 kGates 4991 Mbit/s 380.22 MHz
JH-256 Henzen et al. [29] / ETH webpage Fully autonomous S-boxes as LUTs, stored constants UMC 90 nm 80 kGates 10807 Mbit/s 760 MHz
JH-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 62.42 kGates 5128 Mbit/s 391 MHz
JH-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 54.6 kGates 10022 Mbit/s 763.4 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Fully autonomous Core (round function, state register) & IO buffer ST 0.13 µm 48 kGates 29900 Mbit/s 526 MHz
Keccak Submission doc. [8] / Submission webpage Fully autonomous Core (round function, state register) only ST 0.13 µm 40 kGates 15000 Mbit/s 500 MHz
Keccak(-256) Tillich et al. [14] / On request Fully autonomous One instance of Keccak-f round UMC 0.18 µm 56.32 kGates 21229 Mbit/s 487.80 MHz
Keccak(-256) Henzen et al. [29] / ETH webpage Fully autonomous One round per cycle UMC 90 nm 50 kGates 43011 Mbit/s 949 MHz
Keccak(-256) Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 47.43 kGates 15457 Mbit/s 377 MHz
Keccak Akin et al. [34] / N/A Core functionality One Keccak-f round per cycle Synopsys 90 nm 10.5 kGates 19320 Mbit/s 454.5 MHz
Keccak(-256) RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 50.7 kGates 33333 Mbit/s 781.3 MHz
Keccak(-256) RCIS webpage [39] / RCIS webpage Fully autonomous STM 90 nm 55.9 kGates 43986 Mbit/s 1030.9 MHz
Luffa-224/256 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) UMC 0.13 µm 30.83 kGates 31960 Mbit/s 1124 MHz
Luffa-256 Namin and Hasan [2] / N/A Core functionality Compression function (1 cycle latency) and I/O registers STM 90 nm 122 kGates 25702 Mbit/s(*) 100.4 MHz
Luffa-224/256 Tillich et al. [14] / On request Fully autonomous Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) UMC 0.18 µm 44.97 kGates 13741 Mbit/s 483.09 MHz
Luffa-256 Henzen et al. [29] / ETH webpage Fully autonomous Three parallel step modules, SubCrumb as logic UMC 90 nm 55 kGates 23256 Mbit/s 727 MHz
Luffa-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 37.94 kGates 13943 Mbit/s 490 MHz
Luffa-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 39.6 kGates 28732 Mbit/s 1010.1 MHz
Luffa-256 Satoh et al. [38] / RCIS webpage Fully autonomous Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each), two rounds unrolled STM 90 nm 62.8 kGates 35068.5 Mbit/s 684.9 MHz
Luffa-384 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous Four permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) UMC 0.13 µm 50.07 kGates 23126 Mbit/s 813 MHz
Luffa-512 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous Five permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) UMC 0.13 µm 65.1 kGates 19617 Mbit/s 690 MHz
Luffa Akin et al. [34] / N/A Core functionality Three step modules Synopsys 90 nm 11.5 kGates 21370 Mbit/s 769.2 MHz
Shabal-256 Namin and Hasan [2] / N/A Core functionality Compression function with I/O registers (latency of 16 clock cycles) STM 90 nm 20 kGates 4408 Mbit/s(*) 413.22 MHz
Shabal-256 Tillich et al. [14] / On request Fully autonomous One word rotation per cycle, 50 cycles per block UMC 0.18 µm 54.19 kGates 3282 Mbit/s 320.51 MHz
Shabal Bernet et al. [20] / N/A Fully autonomous One word rotation per cycle, 52 cycles per block 0.13 µm 41.32 kGates 6351 Mbit/s(***) 645 MHz
Shabal-256 Henzen et al. [29] / ETH webpage Fully autonomous 30 adders, 16 subtractors UMC 90 nm 45 kGates 6819 Mbit/s 693 MHz
Shabal-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 49.44 kGates 2945 Mbit/s 362 MHz
Shabal-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 34.6 kGates 6059 Mbit/s 591.7 MHz
SHAvite-3-256 Tillich et al. [14] / On request Fully autonomous Four AES rounds (two for compression, two for message expansion) UMC 0.18 µm 57.39 kGates 3152 Mbit/s 227.79 MHz
SHAvite-3-256 Henzen et al. [29] / ETH webpage Fully autonomous One AES round each for message expansion and F3 round UMC 90 nm 75 kGates 7999 Mbit/s 562 MHz
SHAvite-3-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 55.25 kGates 4599 Mbit/s 341 MHz
SHAvite-3-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 59.4 kGates 8421 Mbit/s 625 MHz
SIMD-256(**) Tillich et al. [14] / On request Fully autonomous Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each UMC 0.18 µm 104.17 kGates 924 Mbit/s 64.93 MHz
SIMD-256 Henzen et al. [29] / ETH webpage Fully autonomous Four parallel Feistel modules, message expansion based on NNT8 and eight multipliers UMC 90 nm 135 kGates 5177 Mbit/s 364 MHz
SIMD-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 139.55 kGates 2157 Mbit/s 194 MHz
SIMD-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 139 kGates 3171 Mbit/s 284.9 MHz
Skein-256-256 Stefan Tillich [12] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 53.87 kGates 1762 Mbit/s 68.8 MHz
Skein-256-256 Namin and Hasan [2] / N/A Core functionality All 72 Threefish rounds unrolled STM 90 nm 369 kGates 3126 Mbit/s(*) 12.21 MHz
Skein-256-256 Tillich et al. [14] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 58.61 kGates 1882 Mbit/s 73.52 MHz
Skein-256-256 Henzen et al. [29] / ETH webpage Fully autonomous Four unrolled Threefish rounds UMC 90 nm 50 kGates 3558 Mbit/s 264 MHz
Skein-256-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 40.9 kGates 1941 Mbit/s 159 MHz
Skein-256-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 43.1 kGates 3295 Mbit/s 270.3 MHz
Skein-512-512 Tillich et al. [14] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 102.04 kGates 2502 Mbit/s 48.87 MHz
Skein-512 Walker et al. [36] / N/A Fully autonomous 8 Threefish rounds unrolled Intel 32 nm 57.93 kGates 32320 Mbit/s 631.31 MHz

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
(**) Implementation of round-one variant.
(***) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.
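
Footnote (*) is equivalent to the cycle-based relation throughput = (block bits) * (clock frequency in MHz) / (cycles per block), because 1000 / (delay in ns) is the clock frequency in MHz. The following minimal sketch illustrates this and checks it against the Shabal-256 row of Tillich et al. [14] above, which states 50 cycles per block. The function names are illustrative, and the 512-bit Shabal message block size is an assumption taken from the Shabal specification, not from this page.

```python
def throughput_from_delay(input_bits, delay_ns, cycles):
    """Footnote (*): 1000 * bits / (delay in ns * cycles), in Mbit/s."""
    return 1000.0 * input_bits / (delay_ns * cycles)


def throughput_from_clock(input_bits, clock_mhz, cycles):
    """The same estimate, expressed with the clock frequency in MHz."""
    return input_bits * clock_mhz / cycles


# Assumption: Shabal processes 512-bit message blocks (Shabal specification).
SHABAL_BLOCK_BITS = 512

# Shabal-256, Tillich et al. [14]: 320.51 MHz, 50 cycles per block,
# reported throughput 3282 Mbit/s.
clock_mhz = 320.51
cycles_per_block = 50

by_clock = throughput_from_clock(SHABAL_BLOCK_BITS, clock_mhz, cycles_per_block)
by_delay = throughput_from_delay(SHABAL_BLOCK_BITS, 1000.0 / clock_mhz, cycles_per_block)

print(f"from clock: {by_clock:.1f} Mbit/s")
print(f"from delay: {by_delay:.1f} Mbit/s")
```

Both forms give the same value, which rounds to the 3282 Mbit/s listed in the table.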



4.4 Low-Area Implementations (ASIC)

Hash Function Name Reference / HDL Impl. Scope Implementation Details Technology Size Throughput Clock Frequency
BLAKE-32 Tillich et al. [18] / N/A Fully autonomous One G function in 11 cycles AMS 0.35 µm 25.57 kGates 15.4 Mbit/s 31.25 MHz
BLAKE-32 Submission doc. [1] / Submission webpage Core functionality Compression function with a single G function unit UMC 0.18 µm 10.54 kGates 253 Mbit/s 40 MHz
BLAKE-32 Submission doc. [1] / Submission webpage Core functionality Compression function with a half G function unit UMC 0.18 µm 9.89 kGates 127 Mbit/s 40 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Fully autonomous 1 adder and 4-word latch array UMC 0.18 µm 13.56 kGates 135 Mbit/s 215 MHz
BLAKE-32 Henzen et al. [40] / Submission webpage Using external memory 1 adder and 4-word latch array UMC 0.18 µm 8.60 kGates 62 Mbit/s 100 MHz
BLAKE-64 Submission doc. [1] / Submission webpage Core functionality Compression function with a single G function unit UMC 0.18 µm 20.61 kGates 181 Mbit/s 20 MHz
BLAKE-64 Submission doc. [1] / Submission webpage Core functionality Compression function with a half G function unit UMC 0.18 µm 19.46 kGates 91 Mbit/s 20 MHz
CubeHash16/32-h Bernet et al. [20] / N/A Fully autonomous Process two 32-bit words per cycle, 64 cycles per round 0.13 µm 7.63 kGates 32 Mbit/s(****) 100 MHz
ECHO-224/256 Lu et al. [5] / N/A Fully autonomous 0.13 µm 82.8 kGates 373 Mbit/s 66.6 MHz
Fugue-256 Submission doc. [15] / N/A Fully autonomous One SMIX transformation (SUPER1_L) IBM 90 nm 59.22 kGates 2000 Mbit/s 500 MHz
Grøstl-224/256 Tillich et al. [18] / N/A Fully autonomous 64-bit datapath, P & Q permutation shared AMS 0.35 µm 14.62 kGates 145.9 Mbit/s 55.87 MHz
Grøstl-224/256 Grøstl website [19] / N/A Fully autonomous 64-bit datapath, P & Q permutation shared UMC 0.18 µm 17 kGates 645 Mbit/s 246.9 MHz
Grøstl-256 RCIS webpage [39] / RCIS webpage Fully autonomous STM 90 nm 34.8 kGates 2478 Mbit/s 101.6 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Using external memory Small core using system memory ST 0.13 µm 6.5 kGates 176.4 Mbit/s(*) 666.7 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Using external memory Small core using system memory, clock freq. limited to 200 MHz ST 0.13 µm 5 kGates 52.9 Mbit/s(**) 200 MHz
Luffa-224/256 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous One permutation block (64 S-boxes, 4 MixWord blocks) UMC 0.13 µm 18.26 kGates 2461 Mbit/s 250 MHz
Luffa-256 Mikami et al. [27] / N/A Fully autonomous One permutation block (64 S-boxes, 4 MixWord blocks) UMC 0.13 µm 10.34 kGates 538 Mbit/s 806 MHz
Luffa-256 Satoh et al. [38] / RCIS webpage Fully autonomous One permutation block (64 S-boxes, 4 MixWord blocks) STM 90 nm 14.7 kGates 3641.1 Mbit/s 355.9 MHz
Luffa-384 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous 6 S-boxes, 1 MixWord TSMC 90 nm 27.13 kGates 1882 Mbit/s 250 MHz
Luffa-512 Knežević and Verbauwhede [17] / Author's webpage Fully autonomous One permutation block (64 S-boxes, 4 MixWord blocks) UMC 0.13 µm 37.35 kGates 1524 Mbit/s 250 MHz
Shabal Bernet et al. [20] / N/A Fully autonomous One adder, one subtractor, one incrementer. 165 cycles per block 0.13 µm 23.32 kGates 310 Mbit/s 100 MHz
Skein-256-256 Tillich et al. [18] / N/A Fully autonomous 64-bit datapath AMS 0.35 µm 12.89 kGates 19.8 Mbit/s 80 MHz
Skein-256-256 Namin and Hasan [2] / N/A Core functionality One round of Threefish iterated STM 90 nm 21 kGates 1018.8 Mbit/s(***) 286.53 MHz

(*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s
(**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s
(***) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s
(****) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.
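
The estimates in footnotes (*) and (**) can be reproduced with a few lines of arithmetic. This is a minimal sketch; the function name is illustrative, and the figures (1024 bits and 3870 cycles per permutation, 64-bit memory interface) are simply those quoted in the footnotes.

```python
def external_memory_throughput_mbit(bits_per_permutation, clock_mhz, cycles_per_permutation):
    """Throughput in Mbit/s: bits per permutation * clock (MHz) / cycles per permutation."""
    return bits_per_permutation * clock_mhz / cycles_per_permutation


BITS_PER_PERMUTATION = 1024      # from footnotes (*) and (**)
CYCLES_PER_PERMUTATION = 3870    # from footnotes (*) and (**)

for clock_mhz in (666.7, 200.0):
    estimate = external_memory_throughput_mbit(
        BITS_PER_PERMUTATION, clock_mhz, CYCLES_PER_PERMUTATION
    )
    print(f"{clock_mhz} MHz: {estimate:.2f} Mbit/s")
```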



5 Comparative Studies

This section summarizes the reported results of publications which examined more than one round-two candidate in a similar setup.

5.1 BLAKE, BMW, Luffa, Shabal, Skein

Reference HDL Category Impl. Scope Technology
Namin and Hasan [2] N/A High-speed FPGA Core functionality Altera Stratix III


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 Compression function with 8 G function units and I/O registers 5435 ALUTs 2186.2 Mbit/s 46.97 MHz
Blue Midnight Wish-256 Compression function with f0, f1, and f2 unrolled in sequence and I/O registers 12917 ALUTs 4889.6 Mbit/s 9.55 MHz
Luffa-256 Compression function (1 cycle latency) and I/O registers 16552 ALUTs 12042.2 Mbit/s 47.04 MHz
Shabal-256 Compression function with I/O registers (latency of 16 clock cycles) 1440 ALUTs 3125.6 Mbit/s 195.35 MHz
Skein-256-256 All 72 Threefish rounds unrolled (device too small) N/A N/A N/A




Reference HDL Category Impl. Scope Technology
Namin and Hasan [2] N/A High-speed ASIC Core functionality STM 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 Compression function with 8 G function units and I/O registers 53 kGates 4475 Mbit/s(*) 96.15 MHz
Blue Midnight Wish-256 Compression function with f0, f1, and f2 unrolled in sequence and I/O registers 164 kGates 26665 Mbit/s(*) 52.08 MHz
Luffa-256 Compression function (1 cycle latency) and I/O registers 122 kGates 25702 Mbit/s(*) 100.4 MHz
Shabal-256 Compression function with I/O registers (latency of 16 clock cycles) 20 kGates 4408 Mbit/s(*) 413.22 MHz
Skein-256-256 All 72 Threefish rounds unrolled 369 kGates 3126 Mbit/s(*) 12.21 MHz

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.



5.2 BLAKE, CubeHash, ECHO, Grøstl, Hamsi, Luffa, Shabal, Skein

Reference HDL Category Impl. Scope Technology
Kobayashi et al. [3] RCIS webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 1660 slices 2676 Mbit/s 115 MHz
CubeHash16/32-256 590 slices 2960 Mbit/s 185 MHz
ECHO-256 3556 slices 1614 Mbit/s 104 MHz
Grøstl-256 4057 slices 5171 Mbit/s 101 MHz
Hamsi-256 718 slices 1680 Mbit/s 210 MHz
Luffa-256 1048 slices 6343 Mbit/s 223 MHz
Shabal-256 1251 slices 1739 Mbit/s 214 MHz
Skein-256 854 slices 1482 Mbit/s 115 MHz



5.3 CubeHash, Grøstl, Shabal

Reference HDL Category Impl. Scope Technology
Baldwin et al. [4] N/A High-speed FPGA Core functionality Xilinx Spartan 3


Hash Function Name Impl. Details Size Throughput Clock Frequency
CubeHash8/1-256(*) 2 compression functions unrolled 3268 slices 70 Mbit/s 37.9 MHz
Grøstl-224/256 P & Q permutation in parallel, S-box in BRAM 4827 slices 3660 Mbit/s 71.53 MHz
Grøstl-384/512 P & Q permutation in parallel, S-box in LUTs 17452 slices 3180 Mbit/s 79.61 MHz
Shabal 36 adders in permutation 2223 slices 740 Mbit/s 71.48 MHz

(*) A CubeHash16/32-h implementation built in a similar fashion can be expected to achieve about 16 times this throughput.




Reference HDL Category Impl. Scope Technology
Baldwin et al. [4] N/A High-speed FPGA Core functionality Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
CubeHash8/1-256(*) 1 iterated compression function 1178 slices 160 Mbit/s 166.8 MHz
Grøstl-224/256 P & Q permutation in parallel, S-box in BRAM 4516 slices 7310 Mbit/s 142.87 MHz
Grøstl-384/512 P & Q permutation in parallel, S-box in LUTs 19161 slices 6090 Mbit/s 83.33 MHz
Shabal 36 adders in permutation 2768 slices 1450 Mbit/s 138.87 MHz

(*) A CubeHash16/32-h implementation built in a similar fashion can be expected to achieve about 16 times this throughput.
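
The factor of about 16 follows from the CubeHash parameters: CubeHashr/b applies r rounds per b-byte message block, so it absorbs 8 * b / r message bits per round. The sketch below is a rough estimate under the assumption that the time per round stays the same when the parameters change; the function names are illustrative, and the parameter semantics are taken from the CubeHash specification rather than from this page.

```python
def bits_per_round(rounds_per_block, block_bytes):
    """Message bits absorbed per round for CubeHash r/b."""
    return 8 * block_bytes / rounds_per_block


def scaled_throughput(reported_mbit, old_params, new_params):
    """Scale a reported throughput, assuming the time per round is unchanged."""
    factor = bits_per_round(*new_params) / bits_per_round(*old_params)
    return reported_mbit * factor


CUBEHASH_8_1 = (8, 1)      # r = 8 rounds, b = 1 byte per block
CUBEHASH_16_32 = (16, 32)  # r = 16 rounds, b = 32 bytes per block

# Reported CubeHash8/1-256 throughputs of Baldwin et al. [4] from the tables above.
for device, reported in (("Xilinx Spartan 3", 70), ("Xilinx Virtex 5", 160)):
    estimate = scaled_throughput(reported, CUBEHASH_8_1, CUBEHASH_16_32)
    print(f"{device}: {reported} Mbit/s -> about {estimate:.0f} Mbit/s")
```

An actual CubeHash16/32-h design may reach a different clock frequency, so these figures are only indicative.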



5.4 All 14 Round-Two Candidates

Reported results are post-synthesis. An interactive graphical comparison of various area-performance tradeoffs of this study can be found here.


Reference HDL Category Impl. Scope Technology
Tillich et al. [14] On request High-speed ASIC Fully autonomous UMC 0.18 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 Compression function with 4 G function units with CSAs 45.64 kGates 3971 Mbit/s 170.64 MHz
Blue Midnight Wish-256 Compression function with f0, f1, and f2 unrolled 169.74 kGates 5358 Mbit/s 10.46 MHz
CubeHash16/32-h Dynamically reconfigurable r and b parameters, two rounds unrolled 58.87 kGates 4665 Mbit/s 145.77 MHz
ECHO-256 Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers 141.49 kGates 2246 Mbit/s 141.84 MHz
Fugue-256 Four columns of SMIX transformation in parallel 46.26 kGates 4092 Mbit/s 255.75 MHz
Grøstl-256 One shared permutation for P & Q, one pipeline stage 58.40 kGates 6290 Mbit/s 270.27 MHz
Hamsi-256 Three instances of P/Pf function unrolled 58.66 kGates 5565 Mbit/s 173.91 MHz
JH-256 320 S-boxes, one round of R8 per cycle 58.83 kGates 4991 Mbit/s 380.22 MHz
Keccak(-256) One instance of Keccak-f round 56.32 kGates 21229 Mbit/s 487.80 MHz
Luffa-224/256 Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) 44.97 kGates 13741 Mbit/s 483.09 MHz
Shabal-256 One word rotation per cycle, 50 cycles per block 54.19 kGates 3282 Mbit/s 320.51 MHz
SHAvite-3-256 Four AES rounds (two for compression, two for message expansion) 57.39 kGates 3152 Mbit/s 227.79 MHz
SIMD-256(*) Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each 104.17 kGates 924 Mbit/s 64.93 MHz
Skein-256-256 8 Threefish rounds unrolled 58.61 kGates 1882 Mbit/s 73.52 MHz
Skein-512-512 8 Threefish rounds unrolled 102.04 kGates 2502 Mbit/s 48.87 MHz

(*) Implementation of round-one variant.



5.5 BLAKE, Grøstl, Skein

Reference HDL Category Impl. Scope Technology
Tillich et al. [18] N/A Low-area ASIC Fully autonomous AMS 0.35 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 One G function in 11 cycles 25.57 kGates 15.4 Mbit/s 31.25 MHz
Grøstl-224/256 64-bit datapath, P & Q permutation shared 14.62 kGates 145.9 Mbit/s 55.87 MHz
Skein-256-256 64-bit datapath 12.89 kGates 19.8 Mbit/s 80 MHz


5.6 ECHO, Hamsi, Luffa

Reference HDL Category Impl. Scope Technology
Ramakers and Narinx [25] Hosted by SHA-3 zoo High-speed FPGA Core functionality Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
ECHO-256 Straightforward instantiation of complete compression function 15006 slices 23860 Mbit/s 139 MHz
ECHO-256 Optimized: 4 x 2 AES round instances with pipeline register in BigSubWords 12061 slices 3560 Mbit/s 187 MHz
Hamsi-256 Straightforward instantiation of complete compression function 4664 slices 6620 Mbit/s 207 MHz
Hamsi-256 Non-linear permutation block reused 2113 slices 1970 Mbit/s 308 MHz
Luffa-256 Straightforward instantiation of complete compression function 9611 slices 12290 Mbit/s 48.2 MHz
Luffa-256 One step block reused for 8 rounds 2303 slices 5090 Mbit/s 179 MHz


5.7 All 14 Round-Two Candidates

Reported results of this study are post-P&R performances of designs targeting high throughput.


Reference HDL Category Impl. Scope Technology
Henzen et al. [29] ETH webpage High-speed ASIC Fully autonomous UMC 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 Four parallel G function modules 47.5 kGates 9752 Mbit/s 400 MHz
Blue Midnight Wish-256 Single-cycle f0 and f2, iterative f1 150 kGates 8486 Mbit/s 298 MHz
CubeHash16/32-256 One round per cycle, IV fixed 42.5 kGates 10667 Mbit/s 667 MHz
ECHO-256 8 AES rounds per cycle 260 kGates 13966 Mbit/s 291 MHz
Fugue-256 S-box as LUT 55 kGates 8815 Mbit/s 551 MHz
Grøstl-256 P and Q permutation interleaved with one pipeline stage, S-box as LUT 135 kGates 16254 Mbit/s 667 MHz
Hamsi-256 Message expansions in LUTs, one round per cycle 45 kGates 8686 Mbit/s 814 MHz
JH-256 S-boxes as LUTs, stored constants 80 kGates 10807 Mbit/s 760 MHz
Keccak(-256) One round per cycle 50 kGates 43011 Mbit/s 949 MHz
Luffa-256 Three parallel step modules, SubCrumb as logic 55 kGates 23256 Mbit/s 727 MHz
Shabal-256 30 adders, 16 subtractors 45 kGates 6819 Mbit/s 693 MHz
SHAvite-3-256 One AES round each for message expansion and F3 round 75 kGates 7999 Mbit/s 562 MHz
SIMD-256 Four parallel Feistel modules, message expansion based on NNT8 and eight multipliers 135 kGates 5177 Mbit/s 364 MHz
Skein-256-256 Four unrolled Threefish rounds 50 kGates 3558 Mbit/s 264 MHz




5.8 All 14 Round-Two Candidates

The designs are optimized for throughput-to-area ratio. The cited results are those for the Xilinx Virtex 5 and Altera Stratix III platforms (for both the 256-bit and the 512-bit versions of the candidates). Results marked with N/A did not fit into the largest device of the device family. For a full listing of all ATHENa results, refer to the ATHENa webpage.


Reference HDL Category Impl. Scope Technology
Homsirikamol et al. [30] On request High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 4 G function units per iteration 1523 slices 3143 Mbit/s 128.9 MHz
BLAKE-64 4 G function units per iteration 3064 slices 3520 Mbit/s 99.7 MHz
Blue Midnight Wish-256 Fully unrolled 4353 slices 6141 Mbit/s 12.0 MHz
Blue Midnight Wish-512 Fully unrolled N/A N/A N/A
CubeHash16/32-256 684 slices 4385 Mbit/s 274.1 MHz
CubeHash16/32-512 734 slices 4315 Mbit/s 269.7 MHz
ECHO-256 3 clk cycles per round 4982 slices 11323 Mbit/s 184.3 MHz
ECHO-512 3 clk cycles per round 5044 slices 7779 Mbit/s 235.5 MHz
Fugue-256 2 clk cycles per round 708 slices 3495 Mbit/s 218.4 MHz
Fugue-512 4 clk cycles per round 979 slices 1773 Mbit/s 221.6 MHz
Grøstl-256 P & Q permutations interleaved 1597 slices 7885 Mbit/s 323.4 MHz
Grøstl-512 P & Q permutations interleaved 3138 slices 10314 Mbit/s 292.1 MHz
Hamsi-256 720 slices 3049 Mbit/s 285.9 MHz
Hamsi-512 1900 slices 1942 Mbit/s 182.1 MHz
JH-256 1018 slices 5416 Mbit/s 380.8 MHz
JH-512 1104 slices 5610 Mbit/s 394.5 MHz
Keccak(-256) 1272 slices 12817 Mbit/s 282.7 MHz
Keccak(-512) 1257 slices 6845 Mbit/s 285.2 MHz
Luffa-256 949 slices 9692 Mbit/s 340.7 MHz
Luffa-512 1960 slices 7691 Mbit/s 240.3 MHz
Shabal-256 32-bit datapath 283 slices 1719 Mbit/s 214.9 MHz
Shabal-512 32-bit datapath 283 slices 1719 Mbit/s 214.9 MHz
SHAvite-3-256 3 clk cycles per round 1076 slices 3253 Mbit/s 235.1 MHz
SHAvite-3-512 4 clk cycles per round 2090 slices 3841 Mbit/s 213.8 MHz
SIMD-256 4 SIMD steps unrolled 8922 slices 3123 Mbit/s 54.9 MHz
SIMD-512 4 SIMD steps unrolled 19639 slices 4938 Mbit/s 43.4 MHz
Skein-512-256 4 Threefish rounds unrolled 1621 slices 3178 Mbit/s 118.0 MHz
Skein-512-512 4 Threefish rounds unrolled 1716 slices 3209 Mbit/s 119.1 MHz
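
As an illustration of the throughput-to-area metric these designs target, the following sketch computes Mbit/s per slice for a few of the Virtex 5 rows above and ranks them. The selection of rows and the function name are arbitrary choices for this example, and the slice count alone is assumed to represent area, which ignores any other FPGA resources a design may use.

```python
# (hash function, slices, throughput in Mbit/s), copied from the Virtex 5 table above.
VIRTEX5_ROWS = [
    ("CubeHash16/32-256", 684, 4385),
    ("Grøstl-256", 1597, 7885),
    ("Keccak(-256)", 1272, 12817),
    ("Luffa-256", 949, 9692),
    ("Shabal-256", 283, 1719),
]


def throughput_per_slice(rows):
    """Return (name, Mbit/s per slice) pairs sorted from highest to lowest ratio."""
    ratios = [(name, throughput / slices) for name, slices, throughput in rows]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


for name, ratio in throughput_per_slice(VIRTEX5_ROWS):
    print(f"{name}: {ratio:.2f} Mbit/s per slice")
```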




Reference HDL Category Impl. Scope Technology
Homsirikamol et al. [30] N/A High-speed FPGA Fully autonomous Altera Stratix III


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 4 G function units per iteration 3635 ALUTs 2901 Mbit/s 119.0 MHz
BLAKE-64 4 G function units per iteration 7086 ALUTs 3161 Mbit/s 89.5 MHz
Blue Midnight Wish-256 Fully unrolled 12619 ALUTs 6339 Mbit/s 12.4 MHz
Blue Midnight Wish-512 Fully unrolled 25192 ALUTs 9820 Mbit/s 9.6 MHz
CubeHash16/32-256 1922 ALUTs 3726 Mbit/s 232.9 MHz
CubeHash16/32-512 1930 ALUTs 3267 Mbit/s 204.2 MHz
ECHO-256 3 clk cycles per round 20723 ALUTs 14335 Mbit/s 233.3 MHz
ECHO-512 3 clk cycles per round 21187 ALUTs 8172 Mbit/s 247.4 MHz
Fugue-256 2 clk cycles per round 2397 ALUTs 3319 Mbit/s 207.4 MHz
Fugue-512 4 clk cycles per round 2783 ALUTs 1598 Mbit/s 199.8 MHz
Grøstl-256 P & Q permutations interleaved 6350 ALUTs 5380 Mbit/s 220.7 MHz
Grøstl-512 P & Q permutations interleaved 12355 ALUTs 7142 Mbit/s 202.3 MHz
Hamsi-256 2308 ALUTs 2997 Mbit/s 281.0 MHz
Hamsi-512 6401 ALUTs 2001 Mbit/s 187.6 MHz
JH-256 3525 ALUTs 5515 Mbit/s 387.8 MHz
JH-512 3709 ALUTs 5556 Mbit/s 390.6 MHz
Keccak(-256) 4213 ALUTs 12393 Mbit/s 273.4 MHz
Keccak(-512) 3979 ALUTs 7310 Mbit/s 304.6 MHz
Luffa-256 3032 ALUTs 8570 Mbit/s 301.3 MHz
Luffa-512 6891 ALUTs 8579 Mbit/s 268.1 MHz
Shabal-256 32-bit datapath 1744 ALUTs 877 Mbit/s 109.7 MHz
Shabal-512 32-bit datapath 1744 ALUTs 877 Mbit/s 109.7 MHz
SHAvite-3-256 3 clk cycles per round 3042 ALUTs 3397 Mbit/s 245.5 MHz
SHAvite-3-512 4 clk cycles per round 5619 ALUTs 4071 Mbit/s 226.6 MHz
SIMD-256 4 SIMD steps unrolled 25728 ALUTs 3123 Mbit/s 54.9 MHz
SIMD-512 4 SIMD steps unrolled 53623 ALUTs 5668 Mbit/s 49.8 MHz
Skein-512-256 4 Threefish rounds unrolled 4645 ALUTs 2503 Mbit/s 92.9 MHz
Skein-512-512 4 Threefish rounds unrolled 4794 ALUTs 2434 Mbit/s 90.3 MHz



5.9 All 14 Round-Two Candidates

Results are reported without a wrapper for long messages.


Reference HDL Category Impl. Scope Technology
Baldwin et al. [31] UCC webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 1118 slices 1169 Mbit/s 118.06 MHz
BLAKE-64 1718 slices 1299 Mbit/s 90.91 MHz
Blue Midnight Wish-256 4997 slices 457 Mbit/s 14.02 MHz
Blue Midnight Wish-512 9810 slices 287 Mbit/s 10 MHz
CubeHash8/32 695 slices 2509 Mbit/s 166.83 MHz
ECHO-256 7372 slices 5373 Mbit/s 198.93 MHz
ECHO-512 8633 slices 18133 Mbit/s 166.69 MHz
Fugue-256 1689 slices 914 Mbit/s 200.04 MHz
Fugue-384 2380 slices 640 Mbit/s 200.08 MHz
Fugue-512 2596 slices 481 Mbit/s 200.16 MHz
Grøstl-256 2391 slices 3242 Mbit/s 101.32 MHz
Grøstl-512 4845 slices 3619 Mbit/s 123.4 MHz
Hamsi-256 1518 slices 358 Mbit/s 72.41 MHz
Hamsi-512 6229 slices 79 Mbit/s 16.51 MHz
JH 1291 slices 1941 Mbit/s 250.13 MHz
Keccak(-224) 1117 slices 5915 Mbit/s 189 MHz
Keccak(-256) 1117 slices 6263 Mbit/s 189 MHz
Keccak(-384) 1117 slices 8190 Mbit/s 189 MHz
Keccak(-512) 1117 slices 8518 Mbit/s 189 MHz
Luffa-256 2221 slices 5333 Mbit/s 166.67 MHz
Luffa-384 3740 slices 5336 Mbit/s 166.75 MHz
Luffa-512 3700 slices 5336 Mbit/s 166.75 MHz
Shabal 1583 slices 1469 Mbit/s 148.04 MHz
SHAvite-3-256 3125 slices 1170 Mbit/s 109.17 MHz
SHAvite-3-512 9775 slices 931 Mbit/s 59.4 MHz
SIMD-256 22704 slices 1338 Mbit/s 107.2 MHz
SIMD-512 43729 slices 2677 Mbit/s 107.2 MHz
Skein-512 1786 slices 1945 Mbit/s 83.65 MHz




5.10 All 14 Round-Two Candidates

Results include throughputs without interface overhead.


Reference HDL Category Impl. Scope Technology
Matsuo et al. [33] RCIS webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 1660 slices 2676 Mbit/s 115 MHz
Blue Midnight Wish-256 4350 slices 8704 Mbit/s 34 MHz
CubeHash16/32-256 590 slices 2960 Mbit/s 185 MHz
ECHO-256 2827 slices 2312 Mbit/s 149 MHz
Fugue-256 4013 slices 1248 Mbit/s 78 MHz
Grøstl-256 2616 slices 7885 Mbit/s 154 MHz
Hamsi-256 718 slices 1680 Mbit/s 210 MHz
JH-256 2661 slices 2639 Mbit/s 201 MHz
Keccak(-256) 1433 slices 8397 Mbit/s 205 MHz
Luffa-256 1048 slices 7424 Mbit/s 261 MHz
Shabal-256 1251 slices 2335 Mbit/s 228 MHz
SHAvite-3-256 1063 slices 3382 Mbit/s 251 MHz
SIMD-256 3987 slices 835 Mbit/s 75 MHz
Skein-256-256 854 slices 1402 Mbit/s 115 MHz
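
Since all entries in this table target the same device family, one simple way to read it is as a throughput-to-area ratio. The following is a minimal sketch, not part of the cited work: it copies the size and throughput columns from the table above and ranks the candidates by Mbit/s per slice. The function name `throughput_per_slice` and the choice of metric are illustrative assumptions.

```python
# Rows copied from the Matsuo et al. [33] Virtex 5 table above:
# (name, size in slices, throughput in Mbit/s).
rows = [
    ("BLAKE-32", 1660, 2676),
    ("Blue Midnight Wish-256", 4350, 8704),
    ("CubeHash16/32-256", 590, 2960),
    ("ECHO-256", 2827, 2312),
    ("Fugue-256", 4013, 1248),
    ("Grøstl-256", 2616, 7885),
    ("Hamsi-256", 718, 1680),
    ("JH-256", 2661, 2639),
    ("Keccak(-256)", 1433, 8397),
    ("Luffa-256", 1048, 7424),
    ("Shabal-256", 1251, 2335),
    ("SHAvite-3-256", 1063, 3382),
    ("SIMD-256", 3987, 835),
    ("Skein-256-256", 854, 1402),
]


def throughput_per_slice(rows):
    """Return (name, Mbit/s per slice) pairs, highest ratio first."""
    ratios = [(name, tput / slices) for name, slices, tput in rows]
    return sorted(ratios, key=lambda r: r[1], reverse=True)


ranked = throughput_per_slice(rows)
for name, ratio in ranked:
    print(f"{name:24s} {ratio:6.2f} Mbit/s per slice")
```

The ratio only makes sense within one table. As noted in the introduction, figures obtained on different devices or with different implementation scopes are not directly comparable.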




Same implementations as in Matsuo et al. [33] implemented on STM 90 nm technology.


Reference HDL Category Impl. Scope Technology
RCIS webpage [37] RCIS webpage High-speed ASIC Fully autonomous STM 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 37 kGates 6668 Mbit/s 286.5 MHz
Blue Midnight Wish-256 128.7 kGates 25937 Mbit/s 101.3 MHz
CubeHash16/32-256 35.5 kGates 8247 Mbit/s 515.5 MHz
ECHO-256 101.1 kGates 5621 Mbit/s 362.3 MHz
Fugue-256 56.7 kGates 2721 Mbit/s 170.1 MHz
Grøstl-256 139.1 kGates 17297 Mbit/s 337.8 MHz
Hamsi-256 67.6 kGates 7767 Mbit/s 970.9 MHz
JH-256 54.6 kGates 10022 Mbit/s 763.4 MHz
Keccak(-256) 50.7 kGates 33333 Mbit/s 781.3 MHz
Luffa-256 39.6 kGates 28732 Mbit/s 1010.1 MHz
Shabal-256 34.6 kGates 6059 Mbit/s 591.7 MHz
SHAvite-3-256 59.4 kGates 8421 Mbit/s 625 MHz
SIMD-256 139 kGates 3171 Mbit/s 284.9 MHz
Skein-256-256 43.1 kGates 3295 Mbit/s 270.3 MHz
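
Because the Virtex 5 and STM 90 nm tables report the same designs on two technologies, dividing throughput by clock frequency gives the number of message bits processed per clock cycle, which to first order should depend on the architecture and not on the technology. The sketch below applies this check to a few rows copied from both tables. The helper `bits_per_cycle` and the selection of rows are illustrative assumptions; where the two values for a design differ, the tables alone do not say why.

```python
# Rows copied from the two tables above (same designs, per the source):
# name -> ((FPGA Mbit/s, FPGA MHz), (ASIC Mbit/s, ASIC MHz)).
rows = {
    "BLAKE-32": ((2676, 115), (6668, 286.5)),
    "Blue Midnight Wish-256": ((8704, 34), (25937, 101.3)),
    "Keccak(-256)": ((8397, 205), (33333, 781.3)),
    "Luffa-256": ((7424, 261), (28732, 1010.1)),
}


def bits_per_cycle(mbit_per_s, mhz):
    """Mbit/s divided by MHz gives message bits processed per clock cycle."""
    return mbit_per_s / mhz


for name, ((f_tput, f_clk), (a_tput, a_clk)) in rows.items():
    fpga = bits_per_cycle(f_tput, f_clk)
    asic = bits_per_cycle(a_tput, a_clk)
    print(f"{name:24s} FPGA {fpga:7.2f}  ASIC {asic:7.2f} bits/cycle")
```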



5.11 Blue Midnight Wish, Keccak, Luffa

Reference HDL Category Impl. Scope Technology
Akin et al. [34] N/A High-speed FPGA Core functionality Xilinx Spartan 3


Hash Function Name Impl. Details Size Throughput Clock Frequency
Blue Midnight Wish Compression function with f0, f1, and f2 unrolled in sequence 10531 slices 2110 Mbit/s 4.22 MHz
Keccak One Keccak-f round per cycle 2024 slices 3460 Mbit/s 81.4 MHz
Luffa Three step modules 2956 slices 1480 Mbit/s 157.3 MHz




Reference HDL Category Impl. Scope Technology
Akin et al. [34] N/A High-speed FPGA Core functionality Xilinx Virtex-II


Hash Function Name Impl. Details Size Throughput Clock Frequency
Blue Midnight Wish Compression function with f0, f1, and f2 unrolled in sequence 10432 slices 3360 Mbit/s 6.71 MHz
Keccak One Keccak-f round per cycle 2024 slices 5810 Mbit/s 136.6 MHz
Luffa Three step modules 2952 slices 8370 Mbit/s 301.4 MHz




Reference HDL Category Impl. Scope Technology
Akin et al. [34] N/A High-speed FPGA Core functionality Xilinx Virtex 4


Hash Function Name Impl. Details Size Throughput Clock Frequency
Blue Midnight Wish Compression function with f0, f1, and f2 unrolled in sequence 10486 slices 4510 Mbit/s 9.01 MHz
Keccak One Keccak-f round per cycle 2024 slices 6070 Mbit/s 142.9 MHz
Luffa Three step modules 2989 slices 8560 Mbit/s 308.2 MHz




Reference HDL Category Impl. Scope Technology
Akin et al. [34] N/A High-speed ASIC Core functionality Synopsys 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
Blue Midnight Wish Compression function with f0, f1, and f2 unrolled in sequence 55.9 kGates 26320 Mbit/s 52.63 MHz
Keccak One Keccak-f round per cycle 10.5 kGates 19320 Mbit/s 454.5 MHz
Luffa Three step modules 11.5 kGates 21370 Mbit/s 769.2 MHz

5.12 All 14 Round-Two Candidates

Results are post-P&R and include throughputs without interface overhead.


Reference HDL Category Impl. Scope Technology
Guo et al. [35] VT webpage High-speed ASIC Fully autonomous UMC 0.13 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-32 43.52 kGates 4645 Mbit/s 200 MHz
Blue Midnight Wish-256 198.17 kGates 12220 Mbit/s 48 MHz
CubeHash16/32-256 38.18 kGates 4624 Mbit/s 289 MHz
ECHO-256 92.73 kGates 3366 Mbit/s 217 MHz
Fugue-256 91.09 kGates 2385 Mbit/s 149 MHz
Grøstl-256 110.11 kGates 9606 Mbit/s 188 MHz
Hamsi-256 29.94 kGates 3571 Mbit/s 446 MHz
JH-256 62.42 kGates 5128 Mbit/s 391 MHz
Keccak(-256) 47.43 kGates 15457 Mbit/s 377 MHz
Luffa-256 37.94 kGates 13943 Mbit/s 490 MHz
Shabal-256 49.44 kGates 2945 Mbit/s 362 MHz
SHAvite-3-256 55.25 kGates 4599 Mbit/s 341 MHz
SIMD-256 139.55 kGates 2157 Mbit/s 194 MHz
Skein-256-256 40.9 kGates 1941 Mbit/s 159 MHz



6 References

[1] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE (version 1.3). Available online at http://131002.net/blake/blake.pdf.

[2] A. H. Namin and M. A. Hasan. Hardware Implementation of the Compression Function for Selected SHA-3 Candidates. Available online at http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html.

[3] Kazuyuki Kobayashi, Jun Ikegami, Shin'ichiro Matsuo, Kazuo Sakiyama, and Kazuo Ohta. Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII. IACR Eprint report 2010/010. Available online at http://eprint.iacr.org/2010/010.pdf.

[4] Brian Baldwin, Andrew Byrne, Mark Hamilton, Neil Hanley, Robert P. McEvoy, Weibo Pan, and William P. Marnane. FPGA Implementations of SHA-3 Candidates: CubeHash, Grøstl, LANE, Shabal and Spectral Hash. IACR Eprint report 2009/342. Available online at http://eprint.iacr.org/2009/342.pdf.

[5] Liang Lu, Maire O'Neill, and Earl Swartzlander. Hardware Evaluation of SHA-3 Hash Function Candidate ECHO. Presentation at the Claude Shannon Institute Workshop on Coding and Cryptography 2009. Slides available online at http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf.

[6] Bernhard Jungk, Steffen Reith, and Jürgen Apfelbeck. On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl. IACR Eprint report 2009/206. Available online at http://eprint.iacr.org/2009/206.pdf.

[7] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl - a SHA-3 candidate (October 31, 2008). Available online at http://www.groestl.info/Groestl.pdf.

[8] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles van Assche. KECCAK sponge function family main document (Version 1.2, April 23, 2009). Available online at http://keccak.noekeon.org/Keccak-main-1.2.pdf.

[9] Joachim Strömbergson. Implementation of the Keccak Hash Function in FPGA Devices. Available online at http://www.strombergson.com/files/Keccak_in_FPGAs.pdf.

[10] Romain Feron and Julien Francq. FPGA Implementation of Shabal: Our First Results (Version 2.0, February 19, 2010). Available online at http://www.shabal.com/wp-content/uploads/2010/03/FPGA-Implementation-of-Shabal-First-ResultsV2.0.pdf.

[11] Men Long. Implementing Skein Hash Function on Xilinx Virtex-5 FPGA Platform (Version 0.7, February 2, 2009). Available online at http://www.skein-hash.info/sites/default/files/skein_fpga.pdf.

[12] Stefan Tillich. Hardware Implementation of the SHA-3 Candidate Skein. IACR Eprint report 2009/159. Available online at http://eprint.iacr.org/2009/159.pdf.

[13] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA. IACR Eprint report 2010/173. Available online at http://eprint.iacr.org/2010/173.pdf.

[14] Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely. High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein. IACR Eprint report 2009/510. Available online at http://eprint.iacr.org/2009/510.pdf.

[15] Shai Halevi, William E. Hall, and Charanjit S. Jutla. The Hash Function Fugue (October 30, 2008). Available online at http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf.

[16] Junfeng Fan. Hardware Evaluation of The Hash Function Hamsi. Available online at http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html.

[17] Miroslav Knezevic and Ingrid Verbauwhede. Hardware Evaluation of the Luffa Hash Family. 4th Workshop on Embedded Systems Security 2009. Available online at http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf.

[18] Stefan Tillich, Martin Feldhofer, Wolfgang Issovits, Thomas Kern, Hermann Kureck, Michael Mühlberghuber, Georg Neubauer, Andreas Reiter, Armin Köfler, and Mathias Mayrhofer. Compact Hardware Implementations of the SHA-3 Candidates ARIRANG, BLAKE, Grøstl, and Skein. IACR Eprint report 2009/349. Available online at http://eprint.iacr.org/2009/349.pdf.

[19] Grøstl website. http://www.groestl.info/.

[20] Markus Bernet, Luca Henzen, Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Hardware Implementations of the SHA-3 Candidates Shabal and CubeHash. 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009. Available online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043.

[21] Michel Kinsy and Richard Uhler. SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec. Available online at http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf.

[22] Bernhard Jungk and Steffen Reith. On FPGA-based implementations of Grøstl. IACR Eprint report 2010/260. Available online at http://eprint.iacr.org/2010/260.pdf.

[23] Jérémie Detrey, Pierre Gaudry, and Karim Khalfallah. A Low-Area yet Performant FPGA Implementation of Shabal. IACR Eprint report 2010/292. Available online at http://eprint.iacr.org/2010/292.pdf.

[24] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. A Compact FPGA Implementation of the SHA-3 Candidate ECHO. IACR Eprint report 2010/364. Available online at http://eprint.iacr.org/2010/364.pdf.

[25] Wim Ramakers and Hans Narinx. Implementation and evaluation of SHA-3 candidates on FPGA. Extended abstract of Master Thesis "Implementatie en Evaluatie van SHA-3-Kandidaten op FPGA" (Dutch). Extended abstract available online at http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf. Full thesis available online at http://ehash.iaik.tugraz.at/uploads/6/62/Ramakers_Narinx2010ECHO-Hamsi-Luffa_Thesis_DUTCH.pdf.

[26] Julien Francq and Céline Thuillet. Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results. IACR Eprint report 2010/406. Available online at http://eprint.iacr.org/2010/406.pdf.

[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa (version 20101105). Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20101105.pdf.

[28] Imed Mabrouk and Ryad Benadjila. ECHO webpage (hardware subpage). http://crypto.rd.francetelecom.com/ECHO/hard/.

[29] Luca Henzen, Pietro Gendotti, Patrice Guillet, Enrico Pargaetzi, Martin Zoller, and Frank K. Gürkaynak. Developing a Hardware Evaluation Method for SHA-3 Candidates. 12th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2010. Available online at http://www.springerlink.com/content/g0115v3272156r06/.

[30] Ekawat Homsirikamol, Marcin Rogawski, and Kris Gaj. Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. IACR Eprint report 2010/445. Available online at http://eprint.iacr.org/2010/445.pdf.

[31] Brian Baldwin, Neil Hanley, Mark Hamilton, Liang Lu, Andrew Byrne, Maire O'Neill, and William P. Marnane. FPGA Implementations of the Round Two SHA-3 Candidates. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf.

[32] Mohamed El Hadedy, Martin Margala, Danilo Gligoroski, and Svein J. Knapskog. Resource-Efficient Implementation of Blue Midnight Wish-256 Hash Function on Xilinx FPGA Platform. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf.

[33] Shin'ichiro Matsuo, Miroslav Knezevic, Patrick Schaumont, Ingrid Verbauwhede, Akashi Satoh, Kazuo Sakiyama, and Kazuo Ota. How Can We Conduct "Fair and Consistent" Hardware Evaluation for SHA-3 Candidate? Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf.

[34] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel, and Erkay Savas. Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf.

[35] Xu Guo, Sinan Huang, Leyla Nazhandali, and Patrick Schaumont. Fair and Comprehensive Performance Evaluation of 14 Second Round SHA-3 ASIC Implementations. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf.

[36] Jesse Walker, Farhana Sheikh, Sanu K. Mathew, and Ram Krishnamurthy. A Skein-512 Hardware Implementation. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/WALKER_skein-intel-hwd.pdf.

[38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1.

[39] RCIS webpage (Other ASIC Implementations). http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html.

[40] Luca Henzen, Jean-Philippe Aumasson, Willi Meier, and Raphael C.-W. Phan. VLSI Characterization of the Cryptographic Hash Function BLAKE. IEEE Transactions on VLSI Systems, 2010. Available online at http://131002.net/data/papers/HAMP10.pdf.

[41] Mohamed El Hadedy, Danilo Gligoroski, and Svein J. Knapskog. Single Core Implementation of Blue Midnight Wish Hash Function on VIRTEX 5 Platform. Available online at http://people.item.ntnu.no/~danilog/Hash/BMW-SecondRound/SmallSizeFPGA-BMWOct2010.pdf.
