Difference between revisions of "SHA-3 Hardware Implementations"

From The ECRYPT Hash Function Website
m (Low-Area Implementations (ASIC): Added results from Satoh et al. [38])
m (Important Information)
 
(27 intermediate revisions by 2 users not shown)
Line 6: Line 6:
 
== Important Information ==
 
== Important Information ==
  
This page summarizes key properties of reported hardware implementations of those SHA-3 candidates, which are currently under consideration by NIST. This is work in progress. If you know of any implementations which should be mentioned on this page, refer to our [[#Call_for_Contributions|call for contributions]].
+
This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST (final round 3). This is a work in progress. If you know of any implementations which should be mentioned on this page, refer to our [[#Call_for_Contributions|call for contributions]].
  
A list of hardware implementations of the round 1 candidates can be found [[SHA-3_Hardware_Implementations_Round_One|here]]. Please note that the page for round 1 candidates is provided for reference and will not be updated.
+
A list of hardware implementations of the round 1 candidates can be found [[SHA-3_Hardware_Implementations_Round_One|here]]. A list of hardware implementations of the round 2 candidates is archived [[SHA-3_Hardware_Implementations_Round_Two|here]]. <font color=red> Please note that the pages for round 1 and 2 candidates are provided for reference and will not be updated. </font>
  
The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisions between different hardware implementation difficult. The more of these parameters agree, the more reasonable the comparison becomes.  
+
The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.  
  
 
The target technology should be as similar as possible. For FPGA implementation, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementation, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. More ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
 
The target technology should be as similar as possible. For FPGA implementation, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementation, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. More ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
  
In order to facilitate the comparision of hardware modules with different implementation scopes, we classify them into three categories:
+
In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories:
  
 
* [[#Fully_Autonomous_Implementation|Fully autonomous]]
 
* [[#Fully_Autonomous_Implementation|Fully autonomous]]
Line 41: Line 41:
  
 
Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of full implementations.
 
Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of full implementations.
 +
 +
== Tweaks of Round Three Candidates over Round Two ==
 +
 +
The main tweaks for round three consist of the adaptation of round numbers for some of the candidates. For implementations of round 2 variants (cf. [[SHA-3_Hardware_Implementations_Round_Two|round two results]]), we extrapolated to the performance of round 3 variants. Extrapolated results are marked in <font color=orange> orange </font>. If the tweaks for an algorithm are expected to be negligible for performance (e.g., just a change of constants), we include the results for the round 2 variant verbatim.
 +
 +
* BLAKE: The round three versions of BLAKE have been renamed to BLAKE-224, BLAKE-256, BLAKE-384, and BLAKE-512. The number of rounds has been increased from 10 to 14 for BLAKE-224 and BLAKE-256, and from 14 to 16 for BLAKE-384 and BLAKE-512. Thus, throughput for BLAKE-224 and BLAKE-256 is expected to decrease by a factor of 10/14 (reduction by about 28.6%), and for BLAKE-384 and BLAKE-512 by a factor of 14/16 (reduction by 12.5%).
 +
* Grøstl: The shift distances for the Q permutation have been changed and the round constants for both the P and Q permutations have been modified. The former is not expected to have an impact on hardware performance, whereas the latter is likely to increase overall hardware size and/or decrease throughput slightly.
 +
* JH: The number of rounds has been increased from 35.5 to 42. Thus, throughput of JH is expected to decrease by a factor of 35.5/42 (reduction by about 15.5%).
 +
* Keccak: The padding rule has been simplified and some parameters have been redefined. No significant impact on hardware performance is expected.
 +
* Skein: A single 64-bit constant has been changed. No significant impact on hardware performance is expected.
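
The extrapolation described above can be sketched in a few lines of Python. This is only an illustration, assuming that throughput scales inversely with the number of rounds while size and clock frequency stay unchanged; the function names <code>extrapolate_throughput</code> and <code>reduction_percent</code> are hypothetical and not part of any referenced implementation.

```python
def extrapolate_throughput(round2_mbit_s, rounds_round2, rounds_round3):
    """Scale a round 2 throughput figure (Mbit/s) to the round 3 round count.

    Assumption: one round per clock cycle at an unchanged clock frequency,
    so throughput is inversely proportional to the number of rounds.
    """
    if rounds_round2 <= 0 or rounds_round3 <= 0:
        raise ValueError("round counts must be positive")
    return round2_mbit_s * rounds_round2 / rounds_round3


def reduction_percent(rounds_round2, rounds_round3):
    """Expected throughput reduction in percent for a given change of rounds."""
    return (1 - rounds_round2 / rounds_round3) * 100


# BLAKE-32 (10 rounds) to BLAKE-256 (14 rounds), Virtex-II Pro figure from the table
blake256 = round(extrapolate_throughput(1724, 10, 14))

# BLAKE-64 (14 rounds) to BLAKE-512 (16 rounds), Virtex-II Pro figure from the table
blake512 = round(extrapolate_throughput(1177, 14, 16))

# JH: 35.5 rounds to 42 rounds
jh_reduction = reduction_percent(35.5, 42)
```

Applied to the round 2 figures of 1724 Mbit/s and 1177 Mbit/s, the sketch gives the values listed in the orange rows for BLAKE-256 and BLAKE-512 on Virtex-II Pro.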
  
 
== Ongoing Hardware Benchmarking Efforts ==
 
== Ongoing Hardware Benchmarking Efforts ==
Line 52: Line 62:
 
=== High-Speed Implementations (FPGA) ===
 
=== High-Speed Implementations (FPGA) ===
  
Important note: The size and functionality of slices varies between FPGA families. A direct comparision of the slice count of implementations on different FPGA families is therefore problematic.
+
Important note: The size and functionality of slices varies between FPGA families. A direct comparison of the slice count of implementations on different FPGA families is therefore problematic.
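
One way to respect this note when reading the table is to compare size figures only within one FPGA family. The following Python sketch groups a few fully autonomous BLAKE-256 rows from the table by technology before ordering them by size; the tuple layout and the helper name <code>group_by_family</code> are assumptions made for illustration.

```python
from collections import defaultdict

# (reference, technology, size, unit) taken from fully autonomous BLAKE-256 rows
RESULTS = [
    ("Kobayashi et al.", "Xilinx Virtex 5", 1660, "slices"),
    ("Homsirikamol et al.", "Xilinx Virtex 5", 1523, "slices"),
    ("Baldwin et al.", "Xilinx Virtex 5", 1118, "slices"),
    ("Homsirikamol et al.", "Altera Stratix III", 3635, "ALUTs"),
]


def group_by_family(results):
    """Group results by technology and sort each group by size (ascending).

    Sizes are only ordered within a group, because slices and ALUTs of
    different FPGA families are not directly comparable.
    """
    groups = defaultdict(list)
    for reference, technology, size, unit in results:
        groups[technology].append((size, unit, reference))
    return {technology: sorted(rows) for technology, rows in groups.items()}


for technology, rows in group_by_family(RESULTS).items():
    print(technology)
    for size, unit, reference in rows:
        print(f"  {size} {unit}  ({reference})")
```

The sketch deliberately produces no ranking across families; a row on Altera Stratix III is listed separately from the Xilinx Virtex 5 rows.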
  
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="120"| Hash Function Name  !! width="150"| Reference / HDL  !! width="100"| Impl. Scope  !! width="200"| Impl. Details  !! width="100"| Technology  !! width="80"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="120"| Hash Function Name  !! width="150"| Reference / HDL  !! width="100"| Impl. Scope  !! width="200"| Impl. Details  !! width="100"| Technology  !! width="80"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
 
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex-II Pro  || align="right"| 3091 slices  || align="right"| 1724 Mbit/s  || align="right"| 37.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex-II Pro  || align="right"| 3091 slices  || align="right"| 1231 Mbit/s  || align="right"| 37.0 MHz
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 4  || align="right"| 3087 slices  || align="right"| 2235 Mbit/s  || align="right"| 48.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 4  || align="right"| 3087 slices  || align="right"| 1596 Mbit/s  || align="right"| 48.0 MHz
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 5 || align="right"| 1694 slices  || align="right"| 3103 Mbit/s  || align="right"| 67.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 5 || align="right"| 1694 slices  || align="right"| 2216 Mbit/s  || align="right"| 67.0 MHz
| BLAKE-32 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units and I/O registers  || Altera Stratix III  || align="right"| 5435 ALUTs  || align="right"| 2186.2 Mbit/s  || align="right"| 46.97 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units and I/O registers  || Altera Stratix III  || align="right"| 5435 ALUTs  || align="right"| 1562 Mbit/s  || align="right"| 46.97 MHz
| BLAKE-32 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1660 slices  || align="right"| 2676 Mbit/s  || align="right"| 115 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1660 slices  || align="right"| 1911 Mbit/s  || align="right"| 115 MHz
| BLAKE-32 || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 1851 slices  || align="right"| 2610.6 Mbit/s  || align="right"| 102 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1118 slices || align="right"| 1169 Mbit/s  || align="right"| 118.06 MHz
+
| BLAKE-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 G function units per iteration  || Xilinx Virtex 5  || align="right"| 1523 slices  || align="right"| 2245 Mbit/s  || align="right"| 128.9 MHz
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1660 slices  || align="right"| 2676 Mbit/s  || align="right"| 115 MHz
+
| BLAKE-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 G function units per iteration  || Altera Stratix III || align="right"| 3635 ALUTs || align="right"| 2072 Mbit/s  || align="right"| 119.0 MHz
|-
+
 
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units  || Xilinx Virtex-II Pro || align="right"| 11122 slices  || align="right"| 1177 Mbit/s  || align="right"| 17.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1118 slices  || align="right"| 835 Mbit/s  || align="right"| 118.06 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 4 || align="right"| 11483 slices  || align="right"| 1707 Mbit/s  || align="right"| 25.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1660 slices  || align="right"| 1911 Mbit/s  || align="right"| 115 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 5 || align="right"| 4329 slices  || align="right"| 2389 Mbit/s  || align="right"| 35.0 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex-II Pro || align="right"| 11122 slices  || align="right"| 1030 Mbit/s  || align="right"| 17.0 MHz
| BLAKE-64 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1718 slices  || align="right"| 1299 Mbit/s  || align="right"| 90.91 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex 4 || align="right"| 11483 slices  || align="right"| 1494 Mbit/s  || align="right"| 25.0 MHz
| Blue Midnight Wish-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers  || Altera Stratix III || align="right"| 12917 ALUTs || align="right"| 4889.6 Mbit/s  || align="right"| 9.55 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units  || Xilinx Virtex 5 || align="right"| 4329 slices  || align="right"| 2090 Mbit/s  || align="right"| 35.0 MHz
| Blue Midnight Wish-256 || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 4400 slices  || align="right"| 5576.7 Mbit/s  || align="right"| 10.9 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1718 slices || align="right"| 1137 Mbit/s  || align="right"| 90.91 MHz
| Blue Midnight Wish-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 4997 slices || align="right"| 457 Mbit/s  || align="right"| 14.02 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 4350 slices  || align="right"| 8704 Mbit/s  || align="right"| 34 MHz
+
| BLAKE-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 G function units per iteration  || Xilinx Virtex 5  || align="right"| 3064 slices  || align="right"| 3080 Mbit/s  || align="right"| 99.7 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 9810 slices  || align="right"| 287 Mbit/s  || align="right"| 10 MHz
+
| BLAKE-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 G function units per iteration  || Altera Stratix III || align="right"| 7086 ALUTs || align="right"| 2766 Mbit/s  || align="right"| 89.5 MHz
|-
+
 
| Blue Midnight Wish  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence  || Xilinx Spartan 3  || align="right"| 10531 slices  || align="right"| 2110 Mbit/s  || align="right"| 4.22 MHz
+
|- style="background:#ffdf9f;"
|-
 
| Blue Midnight Wish  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence  || Xilinx Virtex-II  || align="right"| 10432 slices  || align="right"| 3360 Mbit/s  || align="right"| 6.71 MHz
 
|-
 
| Blue Midnight Wish  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence  || Xilinx Virtex 4  || align="right"| 10486 slices  || align="right"| 4510 Mbit/s  || align="right"| 9.01 MHz
 
|-
 
| CubeHash8/1-256(***) || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || 2 compression functions unrolled || Xilinx Spartan 3 || align="right"| 3268 slices  || align="right"| 70 Mbit/s  || align="right"| 37.9 MHz
 
|-
 
| CubeHash8/1-256(***) || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || 1 iterated compression function || Xilinx Virtex 5 || align="right"| 1178 slices  || align="right"| 160 Mbit/s  || align="right"| 166.8 MHz
 
|-
 
| CubeHash16/32-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 590 slices  || align="right"| 2960 Mbit/s  || align="right"| 185 MHz
 
|-
 
| CubeHash16/32-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 730 slices  || align="right"| 3189.8 Mbit/s  || align="right"| 199.4 MHz
 
|-
 
| CubeHash16/32-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 590 slices  || align="right"| 2960 Mbit/s  || align="right"| 185 MHz
 
|-
 
| CubeHash8/32  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 695 slices  || align="right"| 2509 Mbit/s  || align="right"| 166.83 MHz
 
|-
 
| ECHO-224/256  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] [[#Ref005|[5]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 9333 slices  || align="right"| 14860 Mbit/s  || align="right"| 87.1 MHz
 
|-
 
| ECHO-224/256  || [http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf Kinsy and Uhler] [[#Ref021|[21]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 273 cycles per block  || Altera Cyclone II  || align="right"| 39091 LEs  || align="right"| 397 Mbit/s(*)  || align="right"| 70.6 MHz
 
|-
 
| ECHO-256  || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Straight-forward instantiation of complete compression function  || Xilinx Virtex 5  || align="right"| 15006 slices  || align="right"| 23860 Mbit/s  || align="right"| 139 MHz
 
|-
 
| ECHO-256  || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Optimized: 4 x 2 AES round instances with pipeline register in BigSubWords  || Xilinx Virtex 5  || align="right"| 12061 slices  || align="right"| 3560 Mbit/s  || align="right"| 187 MHz
 
|-
 
| ECHO-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 3556 slices  || align="right"| 1614 Mbit/s  || align="right"| 104 MHz
 
|-
 
| ECHO-256  || [http://crypto.rd.francetelecom.com/ECHO/hard/ Mabrouk and Benadjila] [[#Ref028|[28]]] / [http://crypto.rd.francetelecom.com/ECHO/hard/echo_highspeed_virtex5.zip Implementer's webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Fully parallel iterations of Compress512  || Xilinx Virtex 5  || align="right"| 10407 slices  || align="right"| 26390 Mbit/s  || align="right"| 154.6 MHz
 
|-
 
| ECHO-256  || [http://crypto.rd.francetelecom.com/ECHO/hard/ Mabrouk and Benadjila] [[#Ref028|[28]]] / [http://crypto.rd.francetelecom.com/ECHO/hard/echo_highspeed_virtex6.zip Implementer's webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Fully parallel iterations of Compress512  || Xilinx Virtex 6  || align="right"| 8071 slices  || align="right"| 29457 Mbit/s  || align="right"| 172.6 MHz
 
|-
 
| ECHO-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 6453 slices  || align="right"| 10133.4 Mbit/s  || align="right"| 178.1 MHz
 
|-
 
| ECHO-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 7372 slices  || align="right"| 5373 Mbit/s  || align="right"| 198.93 MHz
 
|-
 
| ECHO-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2827 slices  || align="right"| 2312 Mbit/s  || align="right"| 149 MHz
 
|-
 
| ECHO-384/512  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] [[#Ref005|[5]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 9097 slices  || align="right"| 7810 Mbit/s  || align="right"| 83.9 MHz
 
|-
 
| ECHO-384/512  || [http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf Kinsy and Uhler] [[#Ref021|[21]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 341 cycles per block  || Altera Cyclone II  || align="right"| 39091 LEs  || align="right"| 212 Mbit/s(**)  || align="right"| 70.6 MHz
 
|-
 
| ECHO-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 8633 slices  || align="right"| 18133 Mbit/s  || align="right"| 166.69 MHz
 
|-
 
| Fugue-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 956 slices  || align="right"| 3151.2 Mbit/s  || align="right"| 98.5 MHz
 
|-
 
| Fugue-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1689 slices  || align="right"| 914 Mbit/s  || align="right"| 200.04 MHz
 
|-
 
| Fugue-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 4013 slices  || align="right"| 1248 Mbit/s  || align="right"| 78 MHz
 
|-
 
| Fugue-384  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2380 slices  || align="right"| 640 Mbit/s  || align="right"| 200.08 MHz
 
|-
 
| Fugue-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2596 slices  || align="right"| 481 Mbit/s  || align="right"| 200.16 MHz
 
|-
 
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Spartan 3  || align="right"| 6136 slices  || align="right"| 4520 Mbit/s  || align="right"| 88.3 MHz
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Spartan 3  || align="right"| 6136 slices  || align="right"| 4520 Mbit/s  || align="right"| 88.3 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Virtex 5  || align="right"| 1722 slices  || align="right"| 10276 Mbit/s  || align="right"| 200.7 MHz
 
| Grøstl-224/256  || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Virtex 5  || align="right"| 1722 slices  || align="right"| 10276 Mbit/s  || align="right"| 200.7 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in BRAM  || Xilinx Spartan 3  || align="right"| 4827 slices  || align="right"| 3660 Mbit/s  || align="right"| 71.53 MHz
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in BRAM  || Xilinx Spartan 3  || align="right"| 4827 slices  || align="right"| 3660 Mbit/s  || align="right"| 71.53 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in BRAM  || Xilinx Virtex 5  || align="right"| 4516 slices  || align="right"| 7310 Mbit/s  || align="right"| 142.87 MHz
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in BRAM  || Xilinx Virtex 5  || align="right"| 4516 slices  || align="right"| 7310 Mbit/s  || align="right"| 142.87 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 4057 slices  || align="right"| 5171 Mbit/s  || align="right"| 101 MHz
 
|-
+
 
| Grøstl-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 1884 slices  || align="right"| 8676.5 Mbit/s  || align="right"| 355.9 MHz
+
|- style="background:#ffdf9f;"
|-
+
| Grøstl-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutations interleaved  || Xilinx Virtex 5  || align="right"| 1597 slices  || align="right"| 7885 Mbit/s  || align="right"| 323.4 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutations interleaved  || Altera Stratix III  || align="right"| 6350 ALUTs  || align="right"| 5380 Mbit/s  || align="right"| 220.7 MHz
 +
 
 +
|- style="background:#ffdf9f;"
 
| Grøstl-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2391 slices  || align="right"| 3242 Mbit/s  || align="right"| 101.32 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2616 slices  || align="right"| 7885 Mbit/s  || align="right"| 154 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Spartan 3  || align="right"| 20233 slices  || align="right"| 5901 Mbit/s  || align="right"| 80.7 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in LUTs  || Xilinx Spartan 3  || align="right"| 17452 slices  || align="right"| 3180 Mbit/s  || align="right"| 79.61 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || P & Q permutation in parallel, S-box in LUTs  || Xilinx Virtex 5  || align="right"| 19161 slices  || align="right"| 6090 Mbit/s  || align="right"| 83.33 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Virtex 5  || align="right"| 5419 slices  || align="right"| 15395 Mbit/s  || align="right"| 210.5 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Shared P & Q permutation  || Xilinx Spartan 3  || align="right"| 8308 slices  || align="right"| 3474 Mbit/s  || align="right"| 95 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 4845 slices  || align="right"| 3619 Mbit/s  || align="right"| 123.4 MHz
 
|-
+
 
| Hamsi-256 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 718 slices  || align="right"| 1680 Mbit/s  || align="right"| 210 MHz
+
|- style="background:#ffdf9f;"
|-
+
| Grøstl-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutations interleaved  || Xilinx Virtex 5  || align="right"| 3138 slices  || align="right"| 10314 Mbit/s  || align="right"| 292.1 MHz
| Hamsi-256 || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Straightforward instantiation of complete compression function || Xilinx Virtex 5 || align="right"| 4664 slices || align="right"| 6620 Mbit/s  || align="right"| 207 MHz
+
|- style="background:#ffdf9f;"
|-
+
| Grøstl-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutations interleaved || Altera Stratix III || align="right"| 12355 ALUTs || align="right"| 7142 Mbit/s  || align="right"| 202.3 MHz
| Hamsi-256  || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Non-linear permutation block reused  || Xilinx Virtex 5  || align="right"| 2113 slices  || align="right"| 1970 Mbit/s  || align="right"| 308 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 946 slices  || align="right"| 2646.2 Mbit/s  || align="right"| 248.1 MHz
+
| JH-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || || Xilinx Virtex 5  || align="right"| 1018 slices  || align="right"| 4578 Mbit/s  || align="right"| 380.8 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1518 slices || align="right"| 358 Mbit/s  || align="right"| 72.41 MHz
+
| JH-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || || Altera Stratix III || align="right"| 3525 ALUTs || align="right"| 4661 Mbit/s  || align="right"| 387.8 MHz
|-
+
 
| Hamsi-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 718 slices  || align="right"| 1680 Mbit/s  || align="right"| 210 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2661 slices  || align="right"| 2231 Mbit/s  || align="right"| 201 MHz
| Hamsi-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 6229 slices  || align="right"| 79 Mbit/s  || align="right"| 16.51 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1291 slices  || align="right"| 1641 Mbit/s  || align="right"| 250.13 MHz
| JH-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1275 slices  || align="right"| 4013.5 Mbit/s  || align="right"| 282.2 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
| JH-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 2661 slices  || align="right"| 2639 Mbit/s  || align="right"| 201 MHz
+
| JH-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || || Xilinx Virtex 5  || align="right"| 1104 slices  || align="right"| 4742 Mbit/s  || align="right"| 394.5 MHz
|-
+
|- style="background:#ffdf9f;"
| JH  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1291 slices || align="right"| 1941 Mbit/s  || align="right"| 250.13 MHz
+
| JH-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || || Altera Stratix III || align="right"| 3709 ALUTs || align="right"| 4696 Mbit/s  || align="right"| 390.6 MHz
 +
 
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Core (round function, state register) & IO buffer || Altera Cyclone III || align="right"| 5776 LEs  || align="right"| 7500 Mbit/s || align="right"| 133 MHz
 
Line 203: Line 165:
 
|-
 
|-
 
| Keccak(-224)  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1117 slices  || align="right"| 5915 Mbit/s  || align="right"| 189 MHz
 
 +
 +
|-
 +
| Keccak(-256)  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1272 slices  || align="right"| 12817 Mbit/s  || align="right"| 282.7 MHz
 
|-
 
|-
| Keccak(-256)  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1229 slices || align="right"| 10806.5 Mbit/s  || align="right"| 238.4 MHz
+
| Keccak(-256)  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || || Altera Stratix III || align="right"| 4213 ALUTs || align="right"| 12393 Mbit/s  || align="right"| 273.4 MHz
 +
 
 
|-
 
|-
 
| Keccak(-256)  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1117 slices  || align="right"| 6263 Mbit/s  || align="right"| 189 MHz
 
Line 213: Line 179:
 
|-
 
|-
 
| Keccak(-512)  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1117 slices  || align="right"| 8518 Mbit/s  || align="right"| 189 MHz
 
 +
 +
|-
 +
| Keccak(-512)  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1257 slices  || align="right"| 6845 Mbit/s  || align="right"| 285.2 MHz
 +
|-
 +
| Keccak(-512)  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Altera Stratix III  || align="right"| 3979 ALUTs  || align="right"| 7310 Mbit/s  || align="right"| 304.6 MHz
 +
 
|-
 
|-
 
| Keccak  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || One Keccak-f round per cycle  || Xilinx Spartan 3  || align="right"| 2024 slices  || align="right"| 3460 Mbit/s  || align="right"| 81.4 MHz
 
Line 219: Line 191:
 
|-
 
|-
 
| Keccak  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || One Keccak-f round per cycle  || Xilinx Virtex 4  || align="right"| 2024 slices  || align="right"| 6070 Mbit/s  || align="right"| 142.9 MHz
 
|-
+
 
| Luffa-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function (1 cycle latency) and I/O registers  || Altera Stratix III  || align="right"| 16552 ALUTs  || align="right"| 12042.2 Mbit/s  || align="right"| 47.04 MHz
 
|-
 
| Luffa-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1048 slices  || align="right"| 6343 Mbit/s  || align="right"| 223 MHz
 
|-
 
| Luffa-256  || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || One step block reused for 8 rounds  || Xilinx Virtex 5  || align="right"| 9611 slices  || align="right"| 2303 Mbit/s  || align="right"| 179 MHz
 
|-
 
| Luffa-256  || [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]] / [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Straightforward instantiation of complete compression function  || Xilinx Virtex 5  || align="right"| 9611 slices  || align="right"| 12290 Mbit/s  || align="right"| 48.2 MHz
 
|-
 
| Luffa-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1154 slices  || align="right"| 8008 Mbit/s  || align="right"| 281.5 MHz
 
|-
 
| Luffa-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 2221 slices  || align="right"| 5333 Mbit/s  || align="right"| 166.67 MHz
 
|-
 
| Luffa-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1048 slices  || align="right"| 7424 Mbit/s  || align="right"| 261 MHz
 
|-
 
| Luffa-384  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 3740 slices  || align="right"| 5336 Mbit/s  || align="right"| 166.75 MHz
 
|-
 
| Luffa-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 3700 slices  || align="right"| 5336 Mbit/s  || align="right"| 166.75 MHz
 
|-
 
| Luffa  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Three step modules  || Xilinx Spartan 3  || align="right"| 2956 slices  || align="right"| 1480 Mbit/s  || align="right"| 157.3 MHz
 
|-
 
| Luffa  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Three step modules  || Xilinx Virtex-II  || align="right"| 2952 slices  || align="right"| 8370 Mbit/s  || align="right"| 301.4 MHz
 
|-
 
| Luffa  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Three step modules  || Xilinx Virtex 4  || align="right"| 2989 slices  || align="right"| 8560 Mbit/s  || align="right"| 308.2 MHz
 
|-
 
| Shabal  || [http://www.shabal.com/wp-content/plugins/download-monitor/download.php?id=FPGA-Implementation-of-Shabal-First-ResultsV2.0.pdf Feron and Francq] [[#Ref010|[10]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 36 adders in permutation  || Xilinx Virtex 5  || align="right"| 1171 slices  || align="right"| 2588 Mbit/s  || align="right"| 126 MHz
 
|-
 
| Shabal  || [http://eprint.iacr.org/2010/406.pdf Francq and Thuillet] [[#Ref026|[26]]] / [http://www.shabal.com/?p=170 Shabal webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 iterations of the permutation unrolled  || Xilinx Virtex 5  || align="right"| 1715 slices  || align="right"| 3242 Mbit/s  || align="right"| 76 MHz
 
|-
 
| Shabal  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || 36 adders in permutation  || Xilinx Spartan 3  || align="right"| 2223 slices  || align="right"| 740 Mbit/s  || align="right"| 71.48 MHz
 
|-
 
| Shabal  || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || 36 adders in permutation  || Xilinx Virtex 5  || align="right"| 2768 slices  || align="right"| 1450 Mbit/s  || align="right"| 138.87 MHz
 
|-
 
| Shabal  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1583 slices  || align="right"| 1469 Mbit/s  || align="right"| 148.04 MHz
 
|-
 
| Shabal-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with I/O registers (latency of 16 clock cycles)  || Altera Stratix III  || align="right"| 1440 ALUTs  || align="right"| 3125.6 Mbit/s  || align="right"| 195.35 MHz
 
|-
 
| Shabal-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1251 slices  || align="right"| 1739 Mbit/s  || align="right"| 214 MHz
 
|-
 
| Shabal-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1266 slices  || align="right"| 2624 Mbit/s  || align="right"| 128.1 MHz
 
|-
 
| Shabal-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1251 slices  || align="right"| 2335 Mbit/s  || align="right"| 228 MHz
 
|-
 
| Shabal-512  || [http://eprint.iacr.org/2010/292.pdf Detrey et al.] [[#Ref023|[23]]] / [http://hwshabal.gforge.inria.fr/ INRIA webpage (see SCM tree)]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Exploiting SRL16 primitive  || Xilinx Virtex 5  || align="right"| 153 slices  || align="right"| 2051 Mbit/s  || align="right"| 256 MHz
 
|-
 
| Shabal-512  || [http://eprint.iacr.org/2010/292.pdf Detrey et al.] [[#Ref023|[23]]] / [http://hwshabal.gforge.inria.fr/ INRIA webpage (see SCM tree)]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Exploiting SRL16 primitive  || Xilinx Spartan 3  || align="right"| 499 slices  || align="right"| 800 Mbit/s  || align="right"| 100 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1130 slices  || align="right"| 2885.9 Mbit/s  || align="right"| 208.6 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 3125 slices  || align="right"| 1170 Mbit/s  || align="right"| 109.17 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1063 slices  || align="right"| 3382 Mbit/s  || align="right"| 251 MHz
 
|-
 
| SHAvite-3<sub>512</sub>  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 9775 slices  || align="right"| 931 Mbit/s  || align="right"| 59.4 MHz
 
|-
 
| SIMD-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 9288 slices  || align="right"| 2325.9 Mbit/s  || align="right"| 40.9 MHz
 
|-
 
| SIMD-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 22704 slices  || align="right"| 1338 Mbit/s  || align="right"| 107.2 MHz
 
|-
 
| SIMD-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 3987 slices  || align="right"| 835 Mbit/s  || align="right"| 75 MHz
 
|-
 
| SIMD-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 43729 slices  || align="right"| 2677 Mbit/s  || align="right"| 107.2 MHz
 
 
|-
 
|-
 
| Skein-256-h || [http://www.skein-hash.info/sites/default/files/skein_fpga.pdf Men Long] [[#Ref011|[11]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || UBI component || Xilinx Virtex 5 || align="right"| 1001 slices  || align="right"| 408.7 Mbit/s || align="right"| 114.9 MHz
 
Line 289: Line 200:
 
|-
 
|-
 
| Skein-256-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 854 slices  || align="right"| 1482 Mbit/s  || align="right"| 115 MHz
 
 +
 +
|-
 +
| Skein-512-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 Threefish rounds unrolled  || Xilinx Virtex 5  || align="right"| 1621 slices  || align="right"| 3178 Mbit/s  || align="right"| 118.0 MHz
 
|-
 
|-
| Skein-256-256  || [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5 || align="right"| 1312 slices || align="right"| 1416.1 Mbit/s  || align="right"| 49.8 MHz
+
| Skein-512-256  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 Threefish rounds unrolled  || Altera Stratix III || align="right"| 4645 ALUTs || align="right"| 2503 Mbit/s  || align="right"| 92.9 MHz
 +
 
 
|-
 
|-
 
| Skein-256-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 854 slices  || align="right"| 1402 Mbit/s  || align="right"| 115 MHz
 
Line 301: Line 216:
 
|-
 
|-
 
| Skein-512  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || Xilinx Virtex 5  || align="right"| 1786 slices  || align="right"| 1945 Mbit/s  || align="right"| 83.65 MHz
 
 +
 +
|-
 +
| Skein-512-512  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 4 Threefish rounds unrolled  || Xilinx Virtex 5  || align="right"| 1716 slices  || align="right"| 3209 Mbit/s  || align="right"| 119.1 MHz
 +
|-
 +
| Skein-512-512  || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  4 Threefish rounds unrolled  || Altera Stratix III  || align="right"| 4794 ALUTs  || align="right"| 2434 Mbit/s  || align="right"| 90.3 MHz
  
 
|}
 
|}
 
(*) Estimated peak throughput, ignoring the I/O bottleneck imposed by the specific interface: (1536 bits/block) * (70.6 * 10^6 cycles/s) / (273 cycles/block) = 397.22 * 10^6 bits/s.
 
<br />
 
(**) Estimated peak throughput, ignoring the I/O bottleneck imposed by the specific interface: (1024 bits/block) * (70.6 * 10^6 cycles/s) / (341 cycles/block) = 212.01 * 10^6 bits/s.
 
<br />
 
(***) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
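
The peak-throughput estimates in footnotes (*) and (**) follow one formula: block size times clock frequency, divided by the number of cycles per block. The following is a minimal sketch of that arithmetic only. The function name <code>peak_throughput_bits_per_s</code> is a hypothetical helper introduced here for illustration and is not taken from any of the cited implementations.

```python
def peak_throughput_bits_per_s(block_bits, clock_hz, cycles_per_block):
    """Estimated peak throughput in bits/s, ignoring any I/O bottleneck.

    Hypothetical helper: it only restates the footnote formula
    (bits/block) * (cycles/s) / (cycles/block).
    """
    return block_bits * clock_hz / cycles_per_block


# Values copied from footnotes (*) and (**): 70.6 MHz clock in both cases.
star = peak_throughput_bits_per_s(1536, 70.6e6, 273)
double_star = peak_throughput_bits_per_s(1024, 70.6e6, 341)

print(f"(*)  {star / 1e6:.2f} Mbit/s")         # 397.22 Mbit/s
print(f"(**) {double_star / 1e6:.2f} Mbit/s")  # 212.01 Mbit/s
```

The same calculation can be used to cross-check other table entries, provided the block size and the cycle count per block are known for that design.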
 
  
 
<br />
 
<br />
Line 318: Line 232:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="120"| Hash Function Name  !! width="150"| Reference / HDL  !! width="100"| Impl. Scope  !! width="200"| Implementation Details  !! width="100"| Technology  !! width="80"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
 +
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Spartan-3  || align="right"| 124 slices  || align="right"| 82 Mbit/s  || align="right"| 190.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-4  || align="right"| 124 slices  || align="right"| 154 Mbit/s  || align="right"| 357.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-5  || align="right"| 56 slices  || align="right"| 161 Mbit/s  || align="right"| 372.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Altera Cyclone III  || align="right"| 285 LEs  || align="right"| 83 Mbit/s  || align="right"| 192.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex-II Pro  || align="right"| 958 slices  || align="right"| 265 Mbit/s  || align="right"| 59.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 4  || align="right"| 960 slices  || align="right"| 307 Mbit/s  || align="right"| 68.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 5 || align="right"| 390 slices  || align="right"| 411 Mbit/s  || align="right"| 91.0 MHz
 +
 
|-
 
|-
| BLAKE-32 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Spartan-3  || align="right"| 124 slices  || align="right"| 115 Mbit/s  || align="right"| 190.0 MHz
+
| BLAKE-256  || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function  || Xilinx Virtex 6 || align="right"| 117 slices  || align="right"| 105 Mbit/s  || align="right"| 274.0 MHz
 +
 
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Spartan-3  || align="right"| 229 slices  || align="right"| 121 Mbit/s  || align="right"| 158.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-4  || align="right"| 230 slices  || align="right"| 192 Mbit/s  || align="right"| 250.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-5  || align="right"| 108 slices  || align="right"| 275 Mbit/s  || align="right"| 358.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Altera Cyclone III  || align="right"| 542 LEs  || align="right"| 108 Mbit/s  || align="right"| 140.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex-II Pro  || align="right"| 1802 slices  || align="right"| 285 Mbit/s  || align="right"| 36.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 4  || align="right"| 1856 slices  || align="right"| 333 Mbit/s  || align="right"| 42.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 5 || align="right"| 939 slices  || align="right"| 466 Mbit/s  || align="right"| 59.0 MHz
 +
 
 
|-
 
|-
| BLAKE-32 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function  || Xilinx Virtex-4  || align="right"| 124 slices  || align="right"| 216 Mbit/s  || align="right"| 357.0 MHz
+
| BLAKE-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function  || Xilinx Virtex 6 || align="right"| 192 slices  || align="right"| 183 Mbit/s  || align="right"| 240.0 MHz
 
|-
 
|-
| BLAKE-32 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function || Xilinx Virtex-5 || align="right"| 56 slices  || align="right"| 225 Mbit/s  || align="right"| 372.0 MHz
+
| BLAKE-512  || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function  || Xilinx Spartan 6 || align="right"| 230 slices  || align="right"| 103 Mbit/s  || align="right"| 135.0 MHz
 +
 
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-224/256 || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath, P & Q permutation in parallel || Xilinx Spartan 3 || align="right"| 2486 slices  || align="right"| 404 Mbit/s  || align="right"| 63.2 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-224/256  || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath, P & Q permutation in parallel || Xilinx Virtex 2 Pro  || align="right"| 2754 slices  || align="right"| 512 Mbit/s  || align="right"| 81.5 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-224/256  || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Shared P & Q permutation, S-Box based on composite field arithmetic  || Xilinx Spartan 3 || align="right"| 1276 slices  || align="right"| 192 Mbit/s  || align="right"| 60 MHz
 +
 
 
|-
 
|-
| BLAKE-32 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function || Altera Cyclone III || align="right"| 285 LEs || align="right"| 116 Mbit/s  || align="right"| 192.0 MHz
+
| Grøstl-256  || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Interleaved P & Q permutations  || Xilinx Virtex 6 || align="right"| 260 slices  || align="right"| 815 Mbit/s  || align="right"| 280.0 MHz
 +
 
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-384/512 || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Shared P & Q permutation, S-Box based on composite field arithmetic || Xilinx Spartan 3 || align="right"| 2110 slices || align="right"| 144 Mbit/s  || align="right"| 63 MHz
 +
 
 
|-
 
|-
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 1 G function unit || Xilinx Virtex-II Pro  || align="right"| 958 slices  || align="right"| 371 Mbit/s  || align="right"| 59.0 MHz
+
| Grøstl-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Interleaved P & Q permutations || Xilinx Virtex 6 || align="right"| 260 slices  || align="right"| 640 Mbit/s  || align="right"| 280.0 MHz
 
|-
 
|-
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 1 G function unit || Xilinx Virtex 4  || align="right"| 960 slices  || align="right"| 430 Mbit/s  || align="right"| 68.0 MHz
+
| Grøstl-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Interleaved P & Q permutations || Xilinx Spartan 6 || align="right"| 343 slices  || align="right"| 548 Mbit/s  || align="right"| 240.0 MHz
 +
 
 
|-
 
|-
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 1 G function unit || Xilinx Virtex 5 || align="right"| 390 slices  || align="right"| 575 Mbit/s  || align="right"| 91.0 MHz
+
| JH-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath & distributed RAMs || Xilinx Virtex 6 || align="right"| 240 slices  || align="right"| 214 Mbit/s  || align="right"| 288.0 MHz
 +
 
 
|-
 
|-
| BLAKE-64 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rescheduled G function || Xilinx Spartan-3  || align="right"| 229 slices  || align="right"| 138 Mbit/s  || align="right"| 158.0 MHz
+
| JH-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath & distributed RAMs || Xilinx Virtex 6 || align="right"| 240 slices  || align="right"| 214 Mbit/s  || align="right"| 288.0 MHz
 
|-
 
|-
| BLAKE-64 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-4  || align="right"| 230 slices  || align="right"| 219 Mbit/s  || align="right"| 250.0 MHz
+
| JH-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath & distributed RAMs  || Xilinx Spartan 6 || align="right"| 260 slices  || align="right"| 84 Mbit/s  || align="right"| 113.0 MHz
|-
+
 
| BLAKE-64  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Xilinx Virtex-5  || align="right"| 108 slices  || align="right"| 314 Mbit/s  || align="right"| 358.0 MHz
 
|-
 
| BLAKE-64  || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Rescheduled G function  || Altera Cyclone III  || align="right"| 542 LEs  || align="right"| 123 Mbit/s  || align="right"| 140.0 MHz
 
|-
 
| BLAKE-64  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex-II Pro  || align="right"| 1802 slices  || align="right"| 326 Mbit/s  || align="right"| 36.0 MHz
 
|-
 
| BLAKE-64  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 4  || align="right"| 1856 slices  || align="right"| 381 Mbit/s  || align="right"| 42.0 MHz
 
|-
 
| BLAKE-64  || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 1 G function unit  || Xilinx Virtex 5 || align="right"| 939 slices  || align="right"| 533 Mbit/s  || align="right"| 59.0 MHz
 
|-
 
| Blue Midnight Wish-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf El Hadedy et al.] [[#Ref032|[32]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  32-bit datapath, 1 memory block  || Xilinx Virtex  || align="right"| 895 slices  || align="right"| 9 Mbit/s  || align="right"| 38 MHz
 
|-
 
| Blue Midnight Wish-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf El Hadedy et al.] [[#Ref032|[32]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  32-bit datapath, 2 memory blocks  || Xilinx Virtex 5  || align="right"| 84 slices  || align="right"| 28 Mbit/s  || align="right"| 116 MHz
 
|-
 
| ECHO  || [http://eprint.iacr.org/2010/364.pdf Beuchat et al.] [[#Ref024|[24]]] / On request from author  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  Adapted towards FPGA implementation (127 slices and 1 memory block)  || Xilinx Virtex 5 || align="right"| 127 slices  || align="right"| 72 Mbit/s  || align="right"| 352.0 MHz
 
|-
 
| ECHO  || Announced 19-08-2010 on hash-forum@nist.gov / On request from author  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  All ECHO + all AES variants  || Xilinx Virtex 5 || align="right"| 231 slices  || align="right"| 81.7 Mbit/s (ECHO-224/256), 41.9 Mbit/s (ECHO-384/512) || align="right"| 351.0 MHz
 
|-
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath, P & Q permutation in parallel || Xilinx Spartan 3  || align="right"| 2486 slices  || align="right"| 404 Mbit/s  || align="right"| 63.2 MHz
 
|-
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath, P & Q permutation in parallel || Xilinx Virtex 2 Pro  || align="right"| 2754 slices  || align="right"| 512 Mbit/s  || align="right"| 81.5 MHz
 
|-
 
| Grøstl-224/256  || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Shared P & Q permutation, S-Box based on composite field arithmetic  || Xilinx Spartan 3  || align="right"| 1276 slices  || align="right"| 192 Mbit/s  || align="right"| 60 MHz
 
|-
 
| Grøstl-384/512  || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Shared P & Q permutation, S-Box based on composite field arithmetic  || Xilinx Spartan 3  || align="right"| 2110 slices  || align="right"| 144 Mbit/s  || align="right"| 63 MHz
 
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Implementation_with_External_Memory|Using external memory]]  || Small core using system memory || Altera Stratix III  || align="right"| 855 ALUTs  || align="right"| 96.8 Mbit/s  || align="right"| 366 MHz
 
Line 368: Line 303:
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Implementation_with_External_Memory|Using external memory]]  || Small core using system memory || Xilinx Virtex 5  || align="right"| 444 slices  || align="right"| 70.1 Mbit/s  || align="right"| 265 MHz
 
 +
 
|-
 
|-
| Shabal || [http://ehash.iaik.tugraz.at/uploads/d/d4/FPGA_Implementation_of_Shabal_-_First_Results.pdf Feron and Francq] [[#Ref010|[10]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 36 adders in permutation || Xilinx Virtex || align="right"| 596 slices (+ 40 DSP blocks) || align="right"| 1142 Mbit/s  || align="right"| 109 MHz
+
| Keccak(-256) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rho transformation performed with a barrel rotator || Xilinx Virtex 6 || align="right"| 144 slices || align="right"| 128 Mbit/s  || align="right"| 250.0 MHz
 +
 
 
|-
 
|-
| Shabal || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || 1 adder in permutation || Xilinx Spartan 3  || align="right"| 1933 slices  || align="right"| 540 Mbit/s  || align="right"| 89.71 MHz
+
| Keccak(-512) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rho transformation performed with a barrel rotator || Xilinx Virtex 6 || align="right"| 144 slices  || align="right"| 68 Mbit/s  || align="right"| 250.0 MHz
 
|-
 
|-
| Shabal || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || 1 adder in permutation || Xilinx Virtex 5  || align="right"| 2307 slices  || align="right"| 1330 Mbit/s  || align="right"| 222.22 MHz
+
| Keccak(-512) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Rho transformation performed with a barrel rotator || Xilinx Spartan 6 || align="right"| 193 slices  || align="right"| 45 Mbit/s  || align="right"| 166.0 MHz
 +
 
 
|-
 
|-
| Shabal-512 || [http://eprint.iacr.org/2010/292.pdf Detrey et al.] [[#Ref023|[23]]] / [http://hwshabal.gforge.inria.fr/ INRIA webpage (see SCM tree)] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Exploiting SRL16 primitive || Xilinx Virtex 5 || align="right"| 153 slices || align="right"| 2051 Mbit/s  || align="right"| 256 MHz
+
| Skein-256-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]]  || One round of Threefish iterated || Altera Stratix III || align="right"| 1385 ALUTs || align="right"| 573.9 Mbit/s  || align="right"| 161.42 MHz
 +
 
 
|-
 
|-
| Shabal-512  || [http://eprint.iacr.org/2010/292.pdf Detrey et al.] [[#Ref023|[23]]] / [http://hwshabal.gforge.inria.fr/ INRIA webpage (see SCM tree)] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Exploiting SRL16 primitive || Xilinx Spartan 3  || align="right"| 499 slices  || align="right"| 800 Mbit/s  || align="right"| 100 MHz
+
| Skein-512-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round of Threefish iterated || Xilinx Virtex 6 || align="right"| 240 slices  || align="right"| 179 Mbit/s  || align="right"| 160.0 MHz
 +
 
 
|-
 
|-
| Skein-256-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || One round of Threefish iterated  || Altera Stratix III || align="right"| 1385 ALUTs || align="right"| 573.9 Mbit/s  || align="right"| 161.42 MHz
+
| Skein-512-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round of Threefish iterated  || Xilinx Virtex 6 || align="right"| 240 slices || align="right"| 179 Mbit/s || align="right"| 160.0 MHz
 +
|-
 +
| Skein-512-512  || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round of Threefish iterated  || Xilinx Spartan 6 || align="right"| 292 slices  || align="right"| 102 Mbit/s  || align="right"| 91.0 MHz
 +
 
 
|}
 
|}
  
Line 385: Line 328:
  
 
=== High-Speed Implementations (ASIC) ===
 
 
A comparison of implementations of all 14 round 2 candidates was presented informally at [http://www.iaik.tugraz.at/ IAIK] (Graz University of Technology) on Sept. 16, 2009. The updated presentation slides can be found [http://ehash.iaik.tugraz.at/uploads/f/fc/20091112_SHA-3_HW_stillich.pdf here].
 
  
 
<br />
 
<br />
Line 393: Line 334:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="120"| Hash Function Name  !! width="150"| Reference / HDL  !! width="100"| Impl. Scope  !! width="200"| Implementation Details  !! width="100"| Technology  !! width="80"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
+
 
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units || UMC 0.18 µm  || align="right"| 58.30 kGates  || align="right"| 5295 Mbit/s  || align="right"| 114 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units || UMC 0.18 µm  || align="right"| 58.30 kGates  || align="right"| 3782 Mbit/s  || align="right"| 114 MHz
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 4 G function units || UMC 0.18 µm  || align="right"| 41.31 kGates  || align="right"| 4153 Mbit/s  || align="right"| 170 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 4 G function units || UMC 0.18 µm  || align="right"| 41.31 kGates  || align="right"| 2966 Mbit/s  || align="right"| 170 MHz
| BLAKE-32 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units and I/O registers  || STM 90 nm  || align="right"| 53 kGates  || align="right"| 4475 Mbit/s(*)  || align="right"| 96.15 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units and I/O registers  || STM 90 nm  || align="right"| 53 kGates  || align="right"| 3196 Mbit/s(*)  || align="right"| 96.15 MHz
| BLAKE-32 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units with CSAs  || UMC 0.18 µm  || align="right"| 45.64 kGates  || align="right"| 3971 Mbit/s  || align="right"| 170.64 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units with CSAs  || UMC 0.18 µm  || align="right"| 45.64 kGates  || align="right"| 2836 Mbit/s  || align="right"| 170.64 MHz
| BLAKE-32 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four parallel G function modules  || UMC 90 nm  || align="right"| 47.5 kGates  || align="right"| 9752 Mbit/s  || align="right"| 400 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four parallel G function modules  || UMC 90 nm  || align="right"| 47.5 kGates  || align="right"| 6966 Mbit/s  || align="right"| 400 MHz
| BLAKE-32 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 43.52 kGates  || align="right"| 4645 Mbit/s  || align="right"| 200 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage]   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 43.52 kGates  || align="right"| 3318 Mbit/s  || align="right"| 200 MHz
| BLAKE-32 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 37 kGates  || align="right"| 6668 Mbit/s  || align="right"| 286.5 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 37 kGates  || align="right"| 4763 Mbit/s  || align="right"| 286.5 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units || UMC 0.18 µm  || align="right"| 132.47 kGates  || align="right"| 5910 Mbit/s  || align="right"| 87 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units || UMC 0.18 µm  || align="right"| 79 kGates  || align="right"| 4548 Mbit/s  || align="right"| 137 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 4 G function units || UMC 0.18 µm  || align="right"| 82.73 kGates  || align="right"| 4810 Mbit/s  || align="right"| 136 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units || UMC 0.18 µm  || align="right"| 48 kGates  || align="right"| 4176 Mbit/s  || align="right"| 240 MHz
| Blue Midnight Wish-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers || STM 90 nm || align="right"| 164 kGates  || align="right"| 26665 Mbit/s(*) || align="right"| 52.08 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256  || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units || UMC 0.13 µm || align="right"| 67 kGates  || align="right"| 6689 Mbit/s  || align="right"| 201 MHz
| Blue Midnight Wish-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with f0, f1, and f2 unrolled || UMC 0.18 µm  || align="right"| 169.74 kGates  || align="right"| 5358 Mbit/s  || align="right"| 10.46 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256  || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units || UMC 0.13 µm  || align="right"| 43 kGates  || align="right"| 5748 Mbit/s  || align="right"| 330 MHz
| Blue Midnight Wish-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Single-cycle f0 and f2, f1 computed iteratively || UMC 90 nm  || align="right"| 150 kGates  || align="right"| 8486 Mbit/s  || align="right"| 298 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256  || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units || UMC 90 nm  || align="right"| 65 kGates  || align="right"| 12499 Mbit/s  || align="right"| 376 MHz
| Blue Midnight Wish-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || UMC 0.13 µm || align="right"| 198.17 kGates  || align="right"| 12220 Mbit/s  || align="right"| 48 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256  || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units  || UMC 90 nm || align="right"| 38 kGates  || align="right"| 10816 Mbit/s  || align="right"| 621 MHz
| Blue Midnight Wish || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with f0, f1, and f2 unrolled in sequence  || Synopsys 90 nm || align="right"| 55.9 kGates  || align="right"| 26320 Mbit/s  || align="right"| 52.63 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units || UMC 0.18 µm || align="right"| 132.47 kGates  || align="right"| 5171 Mbit/s  || align="right"| 87 MHz
| Blue Midnight Wish-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || STM 90 nm || align="right"| 128.7 kGates  || align="right"| 25937 Mbit/s  || align="right"| 101.3 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 4 G function units || UMC 0.18 µm || align="right"| 82.73 kGates  || align="right"| 4209 Mbit/s  || align="right"| 136 MHz
| CubeHash16/32-h || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Dynamically reconfigurable r and b parameters, two rounds unrolled || UMC 0.18 µm  || align="right"| 58.87 kGates  || align="right"| 4665 Mbit/s  || align="right"| 145.77 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units || UMC 0.18 µm  || align="right"| 147 kGates  || align="right"| 6314 Mbit/s  || align="right"| 106 MHz
| CubeHash16/32-h || [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043 Bernet et al.] [[#Ref020|[20]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round per cycle || 0.13 µm  || align="right"| 34.33 kGates  || align="right"| 9248 Mbit/s(***) || align="right"| 578 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units || UMC 0.18 µm  || align="right"| 98 kGates  || align="right"| 6293 Mbit/s  || align="right"| 204 MHz
| CubeHash16/32-h || [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043 Bernet et al.] [[#Ref020|[20]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Half a round per cycle || 0.13 µm  || align="right"| 21.54 kGates  || align="right"| 8000 Mbit/s(***) || align="right"| 1000 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units || UMC 0.13 µm  || align="right"| 139 kGates  || align="right"| 9452 Mbit/s  || align="right"| 158 MHz
| CubeHash16/32-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round per cycle, IV fixed || UMC 90 nm || align="right"| 42.5 kGates  || align="right"| 10667 Mbit/s  || align="right"| 667 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units || UMC 0.13 µm || align="right"| 92 kGates  || align="right"| 8982 Mbit/s  || align="right"| 291 MHz
| CubeHash16/32-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || UMC 0.13 µm || align="right"| 38.18 kGates  || align="right"| 4624 Mbit/s  || align="right"| 289 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 8 G function units  || UMC 90 nm || align="right"| 128 kGates  || align="right"| 17777 Mbit/s  || align="right"| 298 MHz
| CubeHash16/32-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || STM 90 nm  || align="right"| 35.5 kGates  || align="right"| 8247 Mbit/s  || align="right"| 515.5 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Compression function with 4 G function units  || UMC 90 nm  || align="right"| 79 kGates  || align="right"| 16434 Mbit/s  || align="right"| 532 MHz
| ECHO-224/256  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] [[#Ref005|[5]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  ||  0.13 µm || align="right"| 521.1 kGates  || align="right"| 14850 Mbit/s  || align="right"| 87.1 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
| ECHO-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers  || UMC 0.18 µm  || align="right"| 141.49 kGates  || align="right"| 2246 Mbit/s  || align="right"| 141.84 MHz
 
|-
 
| ECHO-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 8 AES rounds per cycle  || UMC 90 nm  || align="right"| 260 kGates  || align="right"| 13966 Mbit/s  || align="right"| 291 MHz
 
|-
 
| ECHO-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 92.73 kGates  || align="right"| 3366 Mbit/s  || align="right"| 217 MHz
 
|-
 
| ECHO-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 101.1 kGates  || align="right"| 5621 Mbit/s  || align="right"| 362.3 MHz
 
|-
 
| ECHO-384/512  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] [[#Ref005|[5]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  ||  0.13 µm|| align="right"| 516.8 kGates  || align="right"| 7750 Mbit/s  || align="right"| 83.3 MHz
 
|-
 
| Fugue-256 || [http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf Submission doc.] [[#Ref015|[15]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four columns of SMIX transformation in parallel (SUPER4_P) || IBM 90 nm || align="right"| 109.85 kGates  || align="right"| 13913 Mbit/s  || align="right"| 869.5 MHz
 
|-
 
| Fugue-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four columns of SMIX transformation in parallel  || UMC 0.18 µm  || align="right"| 46.26 kGates  || align="right"| 4092 Mbit/s  || align="right"| 255.75 MHz
 
|-
 
| Fugue-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || S-box as LUT  || UMC 90 nm  || align="right"| 55 kGates  || align="right"| 8815 Mbit/s  || align="right"| 551 MHz
 
|-
 
| Fugue-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 91.09 kGates  || align="right"| 2385 Mbit/s  || align="right"| 149 MHz
 
|-
 
| Fugue-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 56.7 kGates  || align="right"| 2721 Mbit/s  || align="right"| 170.1 MHz
 
|-
 
 
| Grøstl-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One shared permutation for P & Q, one pipeline stage  || UMC 0.18 µm  || align="right"| 58.40 kGates  || align="right"| 6290 Mbit/s  || align="right"| 270.27 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P and Q permutation interleaved with one pipeline stage, S-box as LUT  || UMC 90 nm  || align="right"| 135 kGates  || align="right"| 16254 Mbit/s  || align="right"| 667 MHz
 
|-
+
|- style="background:#ffdf9f;"
| Grøstl-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 110.11 kGates  || align="right"| 9606 Mbit/s  || align="right"| 188 MHz
+
| Grøstl-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage]   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 110.11 kGates  || align="right"| 9606 Mbit/s  || align="right"| 188 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 139.1 kGates  || align="right"| 17297 Mbit/s  || align="right"| 337.8 MHz
 
|-
+
 
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 120.8 kGates  || align="right"| 16275 Mbit/s  || align="right"| 349.7 MHz
 +
 
 +
|- style="background:#ffdf9f;"
 
| Grøstl-384/512  || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || UMC 0.18 µm  || align="right"| 341 kGates  || align="right"| 6225 Mbit/s  || align="right"| 85.1 MHz
 
|-
+
 
| Hamsi-256  || [http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html Junfeng Fan (Hamsi website)] [[#Ref016|[16]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || 0.13 µm  || align="right"| 22 kGates  || align="right"| 4940 Mbit/s  || align="right"| 1080 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 320 S-boxes, one round of R<sub>8</sub> per cycle  || UMC 0.18 µm  || align="right"| 58.83 kGates  || align="right"| 4219 Mbit/s  || align="right"| 380.22 MHz
| Hamsi-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Three instances of P/Pf function unrolled  || UMC 0.18 µm  || align="right"| 58.66 kGates  || align="right"| 5565 Mbit/s  || align="right"| 173.91 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || S-boxes as LUTs, stored constants  || UMC 90 nm  || align="right"| 80 kGates  || align="right"| 9134 Mbit/s  || align="right"| 760 MHz
| Hamsi-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Message expansions in LUTs, one round per cycle  || UMC 90 nm  || align="right"| 45 kGates  || align="right"| 8686 Mbit/s  || align="right"| 814 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage]   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 62.42 kGates  || align="right"| 4334 Mbit/s  || align="right"| 391 MHz
| Hamsi-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 29.94 kGates  || align="right"| 3571 Mbit/s  || align="right"| 446 MHz
+
|- style="background:#ffdf9f;"
|-
+
| JH-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 54.6 kGates  || align="right"| 8471 Mbit/s  || align="right"| 763.4 MHz
| Hamsi-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 67.6 kGates  || align="right"| 7767 Mbit/s  || align="right"| 970.9 MHz
+
 
|-
 
| Hamsi-512  || [http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html Junfeng Fan (Hamsi website)] [[#Ref016|[16]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || 0.13 µm  || align="right"| 50 kGates  || align="right"| 3970 Mbit/s  || align="right"| 820 MHz
 
|-
 
| JH-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 320 S-boxes, one round of R<sub>8</sub> per cycle  || UMC 0.18 µm  || align="right"| 58.83 kGates  || align="right"| 4991 Mbit/s  || align="right"| 380.22 MHz
 
|-
 
| JH-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || S-boxes as LUTs, stored constants  || UMC 90 nm  || align="right"| 80 kGates  || align="right"| 10807 Mbit/s  || align="right"| 760 MHz
 
|-
 
| JH-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 62.42 kGates  || align="right"| 5128 Mbit/s  || align="right"| 391 MHz
 
|-
 
| JH-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 54.6 kGates  || align="right"| 10022 Mbit/s  || align="right"| 763.4 MHz
 
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Core (round function, state register) & IO buffer  || ST 0.13 µm  || align="right"| 48 kGates  || align="right"| 29900 Mbit/s  || align="right"| 526 MHz
 
Line 496: Line 411:
 
| Keccak(-256)  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One round per cycle  || UMC 90 nm  || align="right"| 50 kGates  || align="right"| 43011 Mbit/s  || align="right"| 949 MHz
 
 
|-
 
|-
| Keccak(-256)  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 47.43 kGates  || align="right"| 15457 Mbit/s  || align="right"| 377 MHz
+
| Keccak(-256)  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage]   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 47.43 kGates  || align="right"| 15457 Mbit/s  || align="right"| 377 MHz
 
|-
 
|-
 
| Keccak  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || One Keccak-f round per cycle  || Synopsys 90 nm  || align="right"| 10.5 kGates  || align="right"| 19320 Mbit/s  || align="right"| 454.5 MHz
 
 
|-
 
|-
 
| Keccak(-256)  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 50.7 kGates  || align="right"| 33333 Mbit/s  || align="right"| 781.3 MHz
 
|-
 
| Luffa-224/256 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || align="right"| 30.83 kGates  || align="right"| 31960 Mbit/s  || align="right"| 1124 MHz
 
|-
 
| Luffa-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function (1 cycle latency) and I/O registers  || STM 90 nm  || align="right"| 122 kGates  || align="right"| 25702 Mbit/s(*)  || align="right"| 100.4 MHz
 
|-
 
| Luffa-224/256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)  || UMC 0.18 µm  || align="right"| 44.97 kGates  || align="right"| 13741 Mbit/s  || align="right"| 483.09 MHz
 
|-
 
| Luffa-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Three parallel step modules, SubCrumb as logic  || UMC 90 nm  || align="right"| 55 kGates  || align="right"| 23256 Mbit/s  || align="right"| 727 MHz
 
|-
 
| Luffa-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 37.94 kGates  || align="right"| 13943 Mbit/s  || align="right"| 490 MHz
 
|-
 
| Luffa-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 39.6 kGates  || align="right"| 28732 Mbit/s  || align="right"| 1010.1 MHz
 
  
 
|-
 
|-
| Luffa-256  || [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1 Satoh et al.] [[#Ref038|[38]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 62.8 kGates  || align="right"| 35068.5 Mbit/s  || align="right"| 684.9 MHz
+
| Keccak(-256) || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 55.9 kGates  || align="right"| 43986 Mbit/s  || align="right"| 1030.9 MHz
  
|-
 
| Luffa-384 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || align="right"| 50.07 kGates  || align="right"| 23126 Mbit/s  || align="right"| 813 MHz
 
|-
 
| Luffa-512 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Five permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each) || UMC 0.13 µm || align="right"| 65.1 kGates  || align="right"| 19617 Mbit/s  || align="right"| 690 MHz
 
|-
 
| Luffa  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Three step modules  || Synopsys 90 nm  || align="right"| 11.5 kGates  || align="right"| 21370 Mbit/s  || align="right"| 769.2 MHz
 
|-
 
| Shabal-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with I/O registers (latency of 16 clock cycles)  || STM 90 nm  || align="right"| 20 kGates  || align="right"| 4408 Mbit/s(*)  || align="right"| 413.22 MHz
 
|-
 
| Shabal-256  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One word rotation per cycle, 50 cycles per block  || UMC 0.18 µm  || align="right"| 54.19 kGates  || align="right"| 3282 Mbit/s  || align="right"| 320.51 MHz
 
|-
 
| Shabal  || [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043 Bernet et al.] [[#Ref020|[20]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One word rotation per cycle, 52 cycles per block  || 0.13 µm  || align="right"| 41.32 kGates  || align="right"| 6351 Mbit/s(***)  || align="right"| 645 MHz
 
|-
 
| Shabal-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 30 adders, 16 subtractors  || UMC 90 nm  || align="right"| 45 kGates  || align="right"| 6819 Mbit/s  || align="right"| 693 MHz
 
|-
 
| Shabal-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 49.44 kGates  || align="right"| 2945 Mbit/s  || align="right"| 362 MHz
 
|-
 
| Shabal-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 34.6 kGates  || align="right"| 6059 Mbit/s  || align="right"| 591.7 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four AES rounds (two for compression, two for message expansion)  || UMC 0.18 µm  || align="right"| 57.39 kGates  || align="right"| 3152 Mbit/s  || align="right"| 227.79 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One AES round each for message expansion and F<sup>3</sup> round  || UMC 90 nm  || align="right"| 75 kGates  || align="right"| 7999 Mbit/s  || align="right"| 562 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 55.25 kGates  || align="right"| 4599 Mbit/s  || align="right"| 341 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 59.4 kGates  || align="right"| 8421 Mbit/s  || align="right"| 625 MHz
 
|-
 
| SIMD-256(**)  || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each  || UMC 0.18 µm  || align="right"| 104.17 kGates  || align="right"| 924 Mbit/s  || align="right"| 64.93 MHz
 
|-
 
| SIMD-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four parallel Feistel modules, message expansion based on NNT<sub>8</sub> and eight multipliers  || UMC 90 nm  || align="right"| 135 kGates  || align="right"| 5177 Mbit/s  || align="right"| 364 MHz
 
|-
 
| SIMD-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 139.55 kGates  || align="right"| 2157 Mbit/s  || align="right"| 194 MHz
 
|-
 
| SIMD-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 139 kGates  || align="right"| 3171 Mbit/s  || align="right"| 284.9 MHz
 
 
|-
 
|-
 
| Skein-256-256 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || UMC 0.18 µm || align="right"| 53.87 kGates  || align="right"| 1762 Mbit/s || align="right"| 68.8 MHz
 
Line 560: Line 429:
 
| Skein-256-256  || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Four unrolled Threefish rounds  || UMC 90 nm  || align="right"| 50 kGates  || align="right"| 3558 Mbit/s  || align="right"| 264 MHz
 
 
|-
 
|-
| Skein-256-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / N/A   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 40.9 kGates  || align="right"| 1941 Mbit/s  || align="right"| 159 MHz
+
| Skein-256-256  || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage]   || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || UMC 0.13 µm  || align="right"| 40.9 kGates  || align="right"| 1941 Mbit/s  || align="right"| 159 MHz
 
|-
 
|-
 
| Skein-256-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 43.1 kGates  || align="right"| 3295 Mbit/s  || align="right"| 270.3 MHz
 
Line 570: Line 439:
  
 
(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
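As a rough illustration of the estimate in footnote (*), the following Python sketch evaluates the formula directly. The function names are hypothetical. The sample values assume a 512-bit message block and treat the clock period as the compression function delay. Only the 50 cycles per block and the 320.51 MHz clock are taken from the Shabal-256 row of Tillich et al. in the table above.

```python
def delay_ns_from_mhz(clock_mhz: float) -> float:
    """Clock period in ns for a given clock frequency in MHz."""
    if clock_mhz <= 0:
        raise ValueError("clock frequency must be positive")
    return 1000.0 / clock_mhz


def peak_throughput_mbit_s(input_bits: int, delay_ns: float, cycles: int) -> float:
    """Footnote formula: 1000 * bits / (delay in ns * number of cycles), in Mbit/s."""
    if delay_ns <= 0 or cycles <= 0:
        raise ValueError("delay and cycle count must be positive")
    return 1000.0 * input_bits / (delay_ns * cycles)


if __name__ == "__main__":
    # Assumed inputs: 512-bit block, 50 cycles per block, 320.51 MHz clock.
    delay = delay_ns_from_mhz(320.51)
    print(round(peak_throughput_mbit_s(512, delay, 50)), "Mbit/s")
```

Under these assumptions the result rounds to the 3282 Mbit/s listed for that row, which suggests the tabulated throughput and the footnote formula use the same relation between block size, cycle count, and delay.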
 
<br />
 
(**) Implementation of round-one variant.
 
<br />
 
(***) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.
 
  
 
<br><br>
 
<br><br>
Line 582: Line 447:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="120"| Hash Function Name  !! width="150"| Reference / HDL  !! width="100"| Impl. Scope  !! width="200"| Implementation Details  !! width="100"| Technology  !! width="80"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
+
 
| BLAKE-32 || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One G function in 11 cycles  || AMS 0.35 µm  || align="right"|  25.57 kGates  || align="right"|  15.4 Mbit/s  || align="right"| 31.25 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One G function in 11 cycles  || AMS 0.35 µm  || align="right"|  25.57 kGates  || align="right"|  11 Mbit/s  || align="right"| 31.25 MHz
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a single G function unit || UMC 0.18 µm  || align="right"|  10.54 kGates  || align="right"|  253 Mbit/s  || align="right"| 40 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a single G function unit || UMC 0.18 µm  || align="right"|  10.54 kGates  || align="right"|  180.7 Mbit/s  || align="right"| 40 MHz
| BLAKE-32 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a half G function unit || UMC 0.18 µm  || align="right"| 9.89 kGates  || align="right"|  127 Mbit/s  || align="right"|  40 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a half G function unit || UMC 0.18 µm  || align="right"| 9.89 kGates  || align="right"|  90.7 Mbit/s  || align="right"|  40 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a single G function unit || UMC 0.18 µm   || align="right"| 20.61 kGates  || align="right"| 181 Mbit/s  || align="right"| 20 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 1 adder and 4-word latch array  || UMC 0.18 µm || align="right"| 13.56 kGates  || align="right"| 96.4 Mbit/s  || align="right"| 215 MHz
| BLAKE-64 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a half G function unit || UMC 0.18 µm   || align="right"| 19.46 kGates  || align="right"| 91 Mbit/s  || align="right"| 20 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_with_External_Memory|Using external memory]]  || 1 adder and 4-word latch array  || UMC 0.18 µm || align="right"| 8.60 kGates  || align="right"| 44.3 Mbit/s  || align="right"| 100 MHz
| CubeHash16/32-h || [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043 Bernet et al.] [[#Ref020|[20]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Process two 32-bit words per cycle, 64 cycles per round  || 0.13 µm || align="right"| 7.63 kGates  || align="right"| 32 Mbit/s(****) || align="right"| 100 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a single G function unit || UMC 0.18 µm   || align="right"| 20.61 kGates  || align="right"| 158.4 Mbit/s  || align="right"| 20 MHz
| ECHO-224/256 || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] [[#Ref005|[5]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  ||  0.13 µm || align="right"| 82.8 kGates  || align="right"| 373 Mbit/s  || align="right"| 66.6 MHz
+
|- style="background:#ffdf9f;"
|-
+
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with a half G function unit || UMC 0.18 µm  || align="right"| 19.46 kGates  || align="right"| 79.6 Mbit/s  || align="right"| 20 MHz
| Fugue-256 || [http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf Submission doc.] [[#Ref015|[15]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One SMIX transformation (SUPER1_L) || IBM 90 nm || align="right"| 59.22 kGates  || align="right"| 2000 Mbit/s  || align="right"| 500 MHz
+
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath, P & Q permutation shared || AMS 0.35 µm  || align="right"| 14.62 kGates  || align="right"| 145.9 Mbit/s  || align="right"| 55.87 MHz
 
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || [http://www.groestl.info Grøstl website] [[#Ref019|[19]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath, P & Q permutation shared || UMC 0.18 µm  || align="right"| 17 kGates  || align="right"| 645 Mbit/s  || align="right"| 246.9 MHz
 
 +
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||  || STM 90 nm  || align="right"| 34.8 kGates  || align="right"| 2478 Mbit/s  || align="right"| 101.6 MHz
 +
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Implementation_with_External_Memory|Using external memory]]  || Small core using system memory || ST 0.13 µm  || align="right"| 6.5 kGates  || align="right"| 176.4 Mbit/s(*)  || align="right"| 666.7 MHz
 
 
|-
 
|-
 
| Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Implementation_with_External_Memory|Using external memory]]  || Small core using system memory, clock freq. limited to 200 MHz || ST 0.13 µm  || align="right"| 5 kGates  || align="right"| 52.9 Mbit/s(**)  || align="right"| 200 MHz
 
|-
 
| Luffa-224/256 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || align="right"| 18.26 kGates  || align="right"| 2461 Mbit/s  || align="right"| 250 MHz
 
|-
 
| Luffa-256 || [http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20100810.pdf Mikami et al.] [[#Ref027|[27]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || align="right"| 10.34 kGates  || align="right"| 538 Mbit/s  || align="right"| 806 MHz
 
 
|-
 
| Luffa-256  || [http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1 Satoh et al.] [[#Ref038|[38]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One permutation block (64 S-boxes, 4 MixWord blocks)  || STM 90 nm  || align="right"| 14.7 kGates  || align="right"| 3641.1 Mbit/s  || align="right"| 355.9 MHz
 
  
|-
 
| Luffa-384 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 6 S-boxes, 1 MixWord || TSMC 90 nm || align="right"| 27.13 kGates  || align="right"| 1882 Mbit/s  || align="right"| 250 MHz
 
|-
 
| Luffa-512 || [http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf Knežević and Verbauwhede] [[#Ref017|[17]]] / [http://homes.esat.kuleuven.be/~mknezevi/luffa/luffa_hw_code.tar.gz Author's webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One permutation block (64 S-boxes, 4 MixWord blocks) || UMC 0.13 µm || align="right"| 37.35 kGates  || align="right"| 1524 Mbit/s  || align="right"| 250 MHz
 
|-
 
| Shabal  || [http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043 Bernet et al.] [[#Ref020|[20]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || One adder, one subtractor, one incrementer. 165 cycles per block  || 0.13 µm  || align="right"| 23.32 kGates  || align="right"| 310 Mbit/s  || align="right"| 100 MHz
 
 
|-
 
|-
 
| Skein-256-256  || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 64-bit datapath  || AMS 0.35 µm  || align="right"| 12.89 kGates  || align="right"| 19.8 Mbit/s  || align="right"| 80 MHz
 
Line 631: Line 487:
 
<br />
 
<br />
 
(***) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
 
<br />
 
(****) Estimated peak throughput: Throughput for CubeHash8/1-h implementation * 16.
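The factor of 16 in footnote (****) can be reproduced with a short calculation. This is a sketch under stated assumptions: in the CubeHashr/b naming, r is the number of rounds applied per b-byte message block, and the same round hardware and clock are reused, so throughput scales inversely with rounds per byte. The function names are hypothetical.

```python
from fractions import Fraction


def rounds_per_byte(r: int, b: int) -> Fraction:
    """Rounds of the CubeHash permutation spent per message byte for CubeHashr/b."""
    if r <= 0 or b <= 0:
        raise ValueError("r and b must be positive")
    return Fraction(r, b)


def throughput_scale(old: tuple, new: tuple) -> Fraction:
    """Expected throughput ratio new/old, assuming identical round hardware and clock."""
    return rounds_per_byte(*old) / rounds_per_byte(*new)


if __name__ == "__main__":
    # CubeHash8/1 spends 8 rounds per byte, CubeHash16/32 spends 1/2 round per byte.
    scale = throughput_scale((8, 1), (16, 32))
    print("scale factor:", scale)
```

The ratio is exact only under the assumptions above. Initialization and finalization rounds, which also differ between parameter sets, are ignored here, so the figure is a peak estimate for long messages.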
 
  
 
<br />
 
<br />
Line 639: Line 493:
 
== Comparative Studies ==
 
  
This section summarizes the reported results of publications which examined more than one round-two candidate in a similar setup.
+
This section summarizes the reported results of publications which examined more than one round-three candidate in a similar setup.
  
=== Blake, BMW, Luffa, Shabal, Skein ===
+
=== BLAKE, Skein ===
  
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
Line 654: Line 508:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || Compression function with 8 G function units and I/O registers  || align="right"| 5435 ALUTs  || align="right"| 2186.2 Mbit/s  || align="right"| 46.97 MHz
+
| BLAKE-256 || Compression function with 8 G function units and I/O registers  || align="right"| 5435 ALUTs  || align="right"| 1562 Mbit/s  || align="right"| 46.97 MHz
|-
 
| Blue Midnight Wish-256  || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers  || align="right"| 12917 ALUTs  || align="right"| 4889.6 Mbit/s  || align="right"| 9.55 MHz
 
|-
 
| Luffa-256  || Compression function (1 cycle latency) and I/O registers  || align="right"| 16552 ALUTs  || align="right"| 12042.2 Mbit/s  || align="right"| 47.04 MHz
 
|-
 
| Shabal-256  || Compression function with I/O registers (latency of 16 clock cycles)  || align="right"| 1440 ALUTs  || align="right"| 3125.6 Mbit/s  || align="right"| 195.35 MHz
 
 
|-
 
|-
 
| Skein-256-256  || All 72 Threefish rounds unrolled (device too small) || align="right"| N/A  || align="right"| N/A  || align="right"| N/A
 
Line 681: Line 529:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || Compression function with 8 G function units and I/O registers  || align="right"| 53 kGates  || align="right"| 4475 Mbit/s(*)  || align="right"| 96.15 MHz
+
| BLAKE-256 || Compression function with 8 G function units and I/O registers  || align="right"| 53 kGates  || align="right"| 3196 Mbit/s(*)  || align="right"| 96.15 MHz
|-
 
| Blue Midnight Wish-256  || Compression function with f0, f1, and f2 unrolled in sequence and I/O registers  || align="right"| 164 kGates  || align="right"| 26665 Mbit/s(*)  || align="right"| 52.08 MHz
 
|-
 
| Luffa-256  || Compression function (1 cycle latency) and I/O registers  || align="right"| 122 kGates  || align="right"| 25702 Mbit/s(*)  || align="right"| 100.4 MHz
 
|-
 
| Shabal-256  || Compression function with I/O registers (latency of 16 clock cycles)  || align="right"| 20 kGates  || align="right"| 4408 Mbit/s(*)  || align="right"| 413.22 MHz
 
 
|-
 
|-
 
| Skein-256-256  || All 72 Threefish rounds unrolled  || align="right"| 369 kGates  || align="right"| 3126 Mbit/s(*)  || align="right"| 12.21 MHz
 
Line 698: Line 540:
 
<br />
 
<br />
  
=== Blake, CubeHash, ECHO, Grøstl, Hamsi, Luffa, Shabal, Skein ===
+
=== BLAKE, Grøstl, Skein ===
  
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
Line 711: Line 553:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32  ||    || align="right"| 1660 slices  || align="right"| 2676 Mbit/s  || align="right"| 115 MHz
+
| BLAKE-256  ||    || align="right"| 1660 slices  || align="right"| 1911 Mbit/s  || align="right"| 115 MHz
|-
+
|- style="background:#ffdf9f;"
| CubeHash16/32-256  ||    || align="right"| 590 slices  || align="right"| 2960 Mbit/s  || align="right"| 185 MHz
 
|-
 
| ECHO-256  ||    || align="right"| 3556 slices  || align="right"| 1614 Mbit/s  || align="right"| 104 MHz
 
|-
 
 
| Grøstl-256  ||    || align="right"| 4057 slices  || align="right"| 5171 Mbit/s  || align="right"| 101 MHz
 
|-
 
| Hamsi-256  ||    || align="right"| 718 slices  || align="right"| 1680 Mbit/s  || align="right"| 210 MHz
 
|-
 
| Luffa-256  ||    || align="right"| 1048 slices  || align="right"| 6343 Mbit/s  || align="right"| 223 MHz
 
|-
 
| Shabal-256  ||    || align="right"| 1251 slices  || align="right"| 1739 Mbit/s  || align="right"| 214 MHz
 
 
|-
 
|-
 
| Skein-256  ||    || align="right"| 854 slices  || align="right"| 1482 Mbit/s  || align="right"| 115 MHz
 
Line 732: Line 564:
 
<br />
 
<br />
  
=== CubeHash, Grøstl, Shabal ===
+
=== All 5 Round-Three Candidates ===
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Spartan 3
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| CubeHash8/1-256(*)  || 2 compression functions unrolled  || align="right"| 3268 slices  || align="right"| 70 Mbit/s  || align="right"| 37.9 MHz
 
|-
 
| Grøstl-224/256  || P & Q permutation in parallel, S-box in BRAM  || align="right"| 4827 slices  || align="right"| 3660 Mbit/s  || align="right"| 71.53 MHz
 
|-
 
| Grøstl-384/512  || P & Q permutation in parallel, S-box in LUTs  || align="right"| 17452 slices  || align="right"| 3180 Mbit/s  || align="right"| 79.61 MHz
 
|-
 
| Shabal  || 36 adders in permutation  || align="right"| 2223 slices  || align="right"| 740 Mbit/s  || align="right"| 71.48 MHz
 
|}
 
 
 
(*) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
 
 
 
 
 
----
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Virtex 5
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| CubeHash8/1-256(*)  || 1 iterated compression function  || align="right"| 1178 slices  || align="right"| 160 Mbit/s  || align="right"| 166.8 MHz
 
|-
 
| Grøstl-224/256  || P & Q permutation in parallel, S-box in BRAM  || align="right"| 4516 slices  || align="right"| 7310 Mbit/s  || align="right"| 142.87 MHz
 
|-
 
| Grøstl-384/512  || P & Q permutation in parallel, S-box in LUTs  || align="right"| 19161 slices  || align="right"| 6090 Mbit/s  || align="right"| 83.33 MHz
 
|-
 
| Shabal  || 36 adders in permutation  || align="right"| 2768 slices  || align="right"| 1450 Mbit/s  || align="right"| 138.87 MHz
 
|}
 
 
 
(*) CubeHash16/32-h implemented in a similar fashion can be expected to have throughput increased by a factor of about 16.
 
 
 
<br />
 
<br />
 
 
 
=== All 14 Round-Two Candidates ===
 
  
 
Reported results are post-synthesis. An interactive graphical comparison of various area-performance tradeoffs of this study can be found [http://www.iaik.tugraz.at/content/research/vlsi/sha3hw/ here].
 
Line 803: Line 580:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || Compression function with 4 G function units with CSAs  || align="right"| 45.64 kGates  || align="right"| 3971 Mbit/s  || align="right"| 170.64 MHz
+
| BLAKE-256 || Compression function with 4 G function units with CSAs  || align="right"| 45.64 kGates  || align="right"| 2836 Mbit/s  || align="right"| 170.64 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  || Compression function with f0, f1, and f2 unrolled  || align="right"| 169.74 kGates  || align="right"| 5358 Mbit/s  || align="right"| 10.46 MHz
 
|-
 
| CubeHash16/32-h  || Dynamically reconfigurable r and b parameters, two rounds unrolled  || align="right"| 58.87 kGates  || align="right"| 4665 Mbit/s  || align="right"| 145.77 MHz
 
|-
 
| ECHO-256  || Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers  || align="right"| 141.49 kGates  || align="right"| 2246 Mbit/s  || align="right"| 141.84 MHz
 
|-
 
| Fugue-256  || Four columns of SMIX transformation in parallel  || align="right"| 46.26 kGates  || align="right"| 4092 Mbit/s  || align="right"| 255.75 MHz
 
|-
 
 
| Grøstl-256  || One shared permutation for P & Q, one pipeline stage  || align="right"| 58.40 kGates  || align="right"| 6290 Mbit/s  || align="right"| 270.27 MHz
 
| Grøstl-256  || One shared permutation for P & Q, one pipeline stage  || align="right"| 58.40 kGates  || align="right"| 6290 Mbit/s  || align="right"| 270.27 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  || Three instances of P/Pf function unrolled  || align="right"| 58.66 kGates  || align="right"| 5565 Mbit/s  || align="right"| 173.91 MHz
+
| JH-256  || 320 S-boxes, one round of R<sub>8</sub> per cycle  || align="right"| 58.83 kGates  || align="right"| 4219 Mbit/s  || align="right"| 380.22 MHz
|-
 
| JH-256  || 320 S-boxes, one round of R<sub>8</sub> per cycle  || align="right"| 58.83 kGates  || align="right"| 4991 Mbit/s  || align="right"| 380.22 MHz
 
 
|-
 
|-
 
| Keccak(-256)  || One instance of Keccak-f round  || align="right"| 56.32 kGates  || align="right"| 21229 Mbit/s  || align="right"| 487.80 MHz
 
| Keccak(-256)  || One instance of Keccak-f round  || align="right"| 56.32 kGates  || align="right"| 21229 Mbit/s  || align="right"| 487.80 MHz
|-
 
| Luffa-224/256  || Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)  || align="right"| 44.97 kGates  || align="right"| 13741 Mbit/s  || align="right"| 483.09 MHz
 
|-
 
| Shabal-256  || One word rotation per cycle, 50 cycles per block  || align="right"| 54.19 kGates  || align="right"| 3282 Mbit/s  || align="right"| 320.51 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || Four AES rounds (two for compression, two for message expansion)  || align="right"| 57.39 kGates  || align="right"| 3152 Mbit/s  || align="right"| 227.79 MHz
 
|-
 
| SIMD-256(*)  || Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each  || align="right"| 104.17 kGates  || align="right"| 924 Mbit/s  || align="right"| 64.93 MHz
 
 
|-
 
|-
 
| Skein-256-256  || 8 Threefish rounds unrolled  || align="right"| 58.61 kGates  || align="right"| 1882 Mbit/s  || align="right"| 73.52 MHz
 
| Skein-256-256  || 8 Threefish rounds unrolled  || align="right"| 58.61 kGates  || align="right"| 1882 Mbit/s  || align="right"| 73.52 MHz
Line 834: Line 593:
 
| Skein-512-512  || 8 Threefish rounds unrolled  || align="right"| 102.04 kGates  || align="right"| 2502 Mbit/s  || align="right"| 48.87 MHz
 
| Skein-512-512  || 8 Threefish rounds unrolled  || align="right"| 102.04 kGates  || align="right"| 2502 Mbit/s  || align="right"| 48.87 MHz
 
|}
 
|}
 
(*) Implementation of round-one variant.
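
The throughput figures in tables such as the one above generally follow from the message block size, the clock frequency, and the number of clock cycles needed per block. As a minimal sketch, the Python snippet below recomputes the Shabal-256 entry from its stated 50 cycles per block and 320.51 MHz clock. The 512-bit block size is an assumption taken from the Shabal specification, and the function name is purely illustrative.

```python
def throughput_mbit_s(block_bits: int, clock_mhz: float, cycles_per_block: int) -> float:
    """Long-message throughput in Mbit/s: bits per block times blocks per microsecond."""
    if cycles_per_block <= 0:
        raise ValueError("cycles_per_block must be positive")
    return block_bits * clock_mhz / cycles_per_block


# Shabal-256 row: 50 cycles per block at 320.51 MHz (both values from the table).
# The 512-bit message block is an assumption taken from the Shabal specification.
shabal = throughput_mbit_s(block_bits=512, clock_mhz=320.51, cycles_per_block=50)
print(f"Shabal-256: {shabal:.0f} Mbit/s")
```

Under these assumptions the result rounds to the 3282 Mbit/s listed for Shabal-256. Rows that do not state a cycle count cannot be checked this way without consulting the cited paper.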
 
  
 
<br />
 
<br />
Line 853: Line 610:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 || One G function in 11 cycles  || align="right"|  25.57 kGates  || align="right"|  15.4 Mbit/s  || align="right"| 31.25 MHz
+
| BLAKE-256 || One G function in 11 cycles  || align="right"|  25.57 kGates  || align="right"|  11 Mbit/s  || align="right"| 31.25 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-224/256  || 64-bit datapath, P & Q permutation shared  || align="right"| 14.62 kGates  || align="right"| 145.9 Mbit/s  || align="right"| 55.87 MHz
 
| Grøstl-224/256  || 64-bit datapath, P & Q permutation shared  || align="right"| 14.62 kGates  || align="right"| 145.9 Mbit/s  || align="right"| 55.87 MHz
 
|-
 
|-
Line 861: Line 618:
 
|}
 
|}
  
 +
<br />
 +
<br />
 +
 +
=== All 5 Round-Three Candidates ===
 +
 +
Reported results of this study are post-place-and-route (P&R) performance figures for designs targeting high throughput.
  
=== ECHO, Hamsi, Luffa ===
 
  
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
Line 868: Line 630:
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-  
 
|-  
| [http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf Ramakers and Narinx] [[#Ref025|[25]]]  || [http://ehash.iaik.tugraz.at/uploads/2/27/Ramakers_Narinx2010ECHO-Hamsi-Luffa_VHDL_sources.zip Hosted by SHA-3 zoo]  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Virtex 5
+
| [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]]  || [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || UMC 90 nm
 
|}
 
|}
  
Line 875: Line 637:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || Four parallel G functions modules  || align="right"| 47.5 kGates  || align="right"| 6966 Mbit/s  || align="right"| 400 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || P and Q permutation interleaved with one pipeline stage, S-box as LUT  || align="right"| 135 kGates  || align="right"| 16254 Mbit/s  || align="right"| 667 MHz
 +
|- style="background:#ffdf9f;"
 +
| JH-256  || S-boxes as LUTs, stored constants  || align="right"| 80 kGates  || align="right"| 9134 Mbit/s  || align="right"| 760 MHz
 
|-
 
|-
| ECHO-256  || Straightforward instantiation of complete compression function  || align="right"| 15006 slices  || align="right"| 23860 Mbit/s  || align="right"| 139 MHz
+
| Keccak(-256) || One round per cycle || align="right"| 50 kGates || align="right"| 43011 Mbit/s  || align="right"| 949 MHz
 
|-
 
|-
| ECHO-256 || Optimized: 4 x 2 AES round instances with pipeline register in BigSubWords  || align="right"| 12061 slices  || align="right"| 3560 Mbit/s  || align="right"| 187 MHz
+
| Skein-256-256  || Four unrolled Threefish rounds  || align="right"| 50 kGates || align="right"| 3558 Mbit/s  || align="right"| 264 MHz
|-
 
| Hamsi-256  || Straightforward instantiation of complete compression function  || align="right"| 4664 slices  || align="right"| 6620 Mbit/s  || align="right"| 207 MHz
 
|-
 
| Hamsi-256  || Non-linear permutation block reused  || align="right"| 2113 slices  || align="right"| 1970 Mbit/s  || align="right"| 308 MHz
 
|-
 
| Luffa-256  || Straightforward instantiation of complete compression function  || align="right"| 9611 slices  || align="right"| 12290 Mbit/s  || align="right"| 48.2 MHz
 
|-
 
| Luffa-256  || One step block reused for 8 rounds  || align="right"| 2303 slices || align="right"| 5090 Mbit/s  || align="right"| 179 MHz
 
 
|}
 
|}
  
 +
<br />
 +
<br />
  
=== All 14 Round-Two Candidates ===
+
=== All 5 Round-Three Candidates ===
  
Reported results of this study are post-place-and-route (P&R) performance figures for designs targeting high throughput.
+
Designs are optimized for throughput-to-area ratio. The cited results are those for the Xilinx Virtex 5 and Altera Stratix III platforms (for both the 256-bit and the 512-bit versions of the candidates). Results marked with N/A did not fit into the largest device of the device family. For a full listing of all ATHENa results, refer to the [http://cryptography.gmu.edu/athena/ ATHENa webpage].
  
  
Line 899: Line 661:
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-  
 
|-  
| [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]]  || [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage]  || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || UMC 90 nm
+
| [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]]  || [mailto:kgaj@gmu.edu On request]  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Xilinx Virtex 5
 
|}
 
|}
  
Line 906: Line 668:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 +
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || 4 G function units per iteration  || align="right"| 1523 slices  || align="right"| 2245 Mbit/s  || align="right"| 128.9 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || 4 G function units per iteration  || align="right"| 3064 slices  || align="right"| 3080 Mbit/s  || align="right"| 99.7 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || P & Q permutations interleaved  || align="right"| 1597 slices  || align="right"| 7885 Mbit/s  || align="right"| 323.4 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-512  || P & Q permutations interleaved  || align="right"| 3138 slices  || align="right"| 10314 Mbit/s  || align="right"| 292.1 MHz
 +
|- style="background:#ffdf9f;"
 +
| JH-256  ||  || align="right"| 1018 slices  || align="right"| 4578 Mbit/s  || align="right"| 380.8 MHz
 +
|- style="background:#ffdf9f;"
 +
| JH-512  ||  || align="right"| 1104 slices  || align="right"| 4742 Mbit/s  || align="right"| 394.5 MHz
 
|-
 
|-
| BLAKE-32 || Four parallel G functions modules || align="right"| 47.5 kGates || align="right"| 9752 Mbit/s  || align="right"| 400 MHz
+
| Keccak(-256) ||  || align="right"| 1272 slices || align="right"| 12817 Mbit/s  || align="right"| 282.7 MHz
 
|-
 
|-
| Blue Midnight Wish-256 || single-cycle f0 and f2, f1 iteratively || align="right"| 150 kGates || align="right"| 8486 Mbit/s  || align="right"| 298 MHz
+
| Keccak(-512) ||  || align="right"| 1257 slices || align="right"| 6845 Mbit/s  || align="right"| 285.2 MHz
 
|-
 
|-
| CubeHash16/32-256  || One round per cycle, IV fixed || align="right"| 42.5 kGates || align="right"| 10667 Mbit/s  || align="right"| 667 MHz
+
| Skein-512-256  || 4 Threefish rounds unrolled || align="right"| 1621 slices || align="right"| 3178 Mbit/s  || align="right"| 118.0 MHz
 
|-
 
|-
| ECHO-256  || 8 AES rounds per cycle  || align="right"| 260 kGates  || align="right"| 13966 Mbit/s  || align="right"| 291 MHz
+
| Skein-512-512 || 4 Threefish rounds unrolled || align="right"| 1716 slices || align="right"| 3209 Mbit/s  || align="right"| 119.1 MHz
|-
+
 
| Fugue-256  || S-box as LUT  || align="right"| 55 kGates  || align="right"| 8815 Mbit/s  || align="right"| 551 MHz
 
|-
 
| Grøstl-256  || P and Q permutation interleaved with one pipeline stage, S-box as LUT  || align="right"| 135 kGates  || align="right"| 16254 Mbit/s  || align="right"| 667 MHz
 
|-
 
| Hamsi-256  || Message expansions in LUTs, one round per cycle  || align="right"| 45 kGates  || align="right"| 8686 Mbit/s  || align="right"| 814 MHz
 
|-
 
| JH-256  || S-boxes as LUTs, stored constants  || align="right"| 80 kGates  || align="right"| 10807 Mbit/s  || align="right"| 760 MHz
 
|-
 
| Keccak(-256)  || One round per cycle  || align="right"| 50 kGates  || align="right"| 43011 Mbit/s  || align="right"| 949 MHz
 
|-
 
| Luffa-256  || Three parallel step modules, SubCrumb as logic  || align="right"| 55 kGates  || align="right"| 23256 Mbit/s  || align="right"| 727 MHz
 
|-
 
| Shabal-256  || 30 adders, 16 subtractors  || align="right"| 45 kGates  || align="right"| 6819 Mbit/s  || align="right"| 693 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  || One AES round each for message expansion and F<sup>3</sup> round  || align="right"| 75 kGates  || align="right"| 7999 Mbit/s  || align="right"| 562 MHz
 
|-
 
| SIMD-256  || Four parallel Feistel modules, message expansion based on NNT<sub>8</sub> and eight multipliers  || align="right"| 135 kGates  || align="right"| 5177 Mbit/s  || align="right"| 364 MHz
 
|-
 
| Skein-256-256 || Four unrolled Threefish rounds  || align="right"| 50 kGates || align="right"| 3558 Mbit/s  || align="right"| 264 MHz
 
 
|}
 
|}
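
Throughput-to-area ratio, the optimization target named for the ATHENa results, can be computed directly from the Size and Throughput columns. The sketch below does this for the 256-bit Xilinx Virtex 5 rows reported by Homsirikamol et al. The dictionary layout and function name are illustrative choices and are not part of the ATHENa tooling; slice counts are only comparable within one device family.

```python
# (slices, throughput in Mbit/s) copied from the Xilinx Virtex 5 table rows.
VIRTEX5_256 = {
    "BLAKE-256": (1523, 2245),
    "Grøstl-256": (1597, 7885),
    "JH-256": (1018, 4578),
    "Keccak(-256)": (1272, 12817),
    "Skein-512-256": (1621, 3178),
}


def throughput_per_slice(results):
    """Return (name, Mbit/s per slice) pairs, best ratio first."""
    ratios = {name: tp / area for name, (area, tp) in results.items()}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


for name, ratio in throughput_per_slice(VIRTEX5_256):
    print(f"{name:14s} {ratio:6.2f} Mbit/s per slice")
```

The same function can be applied to the ALUT-based Stratix III rows, but ratios in Mbit/s per slice and Mbit/s per ALUT should not be compared with each other.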
  
<br />
 
<br />
 
  
 
+
----
=== All 14 Round-Two Candidates ===
 
 
 
Designs are optimized for throughput-to-area ratio. The cited results are for the Xilinx Virtex 5 platform only. For a full listing of all ATHENa results, refer to the [http://cryptography.gmu.edu/athena/ ATHENa webpage].
 
  
  
Line 949: Line 700:
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-  
 
|-  
| [http://www.springerlink.com/content/q41257x376615p22/ Gaj et al.] [[#Ref030|[30]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Xilinx Virtex 5
+
| [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Altera Stratix III
 
|}
 
|}
  
Line 956: Line 707:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 +
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-256  || 4 G function units per iteration  || align="right"| 3635 ALUTs  || align="right"| 2072 Mbit/s  || align="right"| 119.0 MHz
 +
|- style="background:#ffdf9f;"
 +
| BLAKE-512  || 4 G function units per iteration  || align="right"| 7086 ALUTs  || align="right"| 2766 Mbit/s  || align="right"| 89.5 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-256  || P & Q permutations interleaved  || align="right"| 6350 ALUTs  || align="right"| 5380 Mbit/s  || align="right"| 220.7 MHz
 +
|- style="background:#ffdf9f;"
 +
| Grøstl-512  || P & Q permutations interleaved  || align="right"| 12355 ALUTs  || align="right"| 7142 Mbit/s  || align="right"| 202.3 MHz
 +
|- style="background:#ffdf9f;"
 +
| JH-256  ||  || align="right"| 3525 ALUTs  || align="right"| 4661 Mbit/s  || align="right"| 387.8 MHz
 +
|- style="background:#ffdf9f;"
 +
| JH-512  ||  || align="right"| 3709 ALUTs  || align="right"| 4696 Mbit/s  || align="right"| 390.6 MHz
 
|-
 
|-
| BLAKE-32 ||   || align="right"| 1851 slices || align="right"| 2610.6 Mbit/s  || align="right"| 102 MHz
+
| Keccak(-256) || || align="right"| 4213 ALUTs || align="right"| 12393 Mbit/s  || align="right"| 273.4 MHz
 
|-
 
|-
| Blue Midnight Wish-256 ||   || align="right"| 4400 slices || align="right"| 5576.7 Mbit/s  || align="right"| 10.9 MHz
+
| Keccak(-512) || || align="right"| 3979 ALUTs || align="right"| 7310 Mbit/s  || align="right"| 304.6 MHz
 
|-
 
|-
| CubeHash16/32-256  ||   || align="right"| 730 slices || align="right"| 3189.8 Mbit/s  || align="right"| 199.4 MHz
+
| Skein-512-256  || 4 Threefish rounds unrolled  || align="right"| 4645 ALUTs || align="right"| 2503 Mbit/s  || align="right"| 92.9 MHz
 
|-
 
|-
| ECHO-256  ||  || align="right"| 6453 slices  || align="right"| 10133.4 Mbit/s  || align="right"| 178.1 MHz
+
| Skein-512-512 || 4 Threefish rounds unrolled || align="right"| 4794 ALUTs || align="right"| 2434 Mbit/s  || align="right"| 90.3 MHz
|-
+
 
| Fugue-256  ||  || align="right"| 956 slices  || align="right"| 3151.2 Mbit/s  || align="right"| 98.5 MHz
 
|-
 
| Grøstl-256  ||  || align="right"| 1884 slices  || align="right"| 8676.5 Mbit/s  || align="right"| 355.9 MHz
 
|-
 
| Hamsi-256  ||  || align="right"| 946 slices  || align="right"| 2646.2 Mbit/s  || align="right"| 248.1 MHz
 
|-
 
| JH-256  ||  || align="right"| 1275 slices  || align="right"| 4013.5 Mbit/s  || align="right"| 282.2 MHz
 
|-
 
| Keccak(-256)  ||  || align="right"| 1229 slices  || align="right"| 10806.5 Mbit/s || align="right"| 238.4 MHz
 
|-
 
| Luffa-256 ||  || align="right"| 1154 slices || align="right"| 8008 Mbit/s  || align="right"| 281.5 MHz
 
|-
 
| Shabal-256  ||  || align="right"| 1266 slices  || align="right"| 2624 Mbit/s  || align="right"| 128.1 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  ||  || align="right"| 1130 slices  || align="right"| 2885.9 Mbit/s  || align="right"| 208.6 MHz
 
|-
 
| SIMD-256  ||  || align="right"| 9288 slices  || align="right"| 2325.9 Mbit/s  || align="right"| 40.9 MHz
 
|-
 
| Skein-256-256  ||  || align="right"| 1312 slices  || align="right"| 1416.1 Mbit/s  || align="right"| 49.8 MHz
 
 
|}
 
|}
  
Line 989: Line 734:
 
<br />
 
<br />
  
 
+
=== All 5 Round-Three Candidates ===
=== All 14 Round-Two Candidates ===
 
  
 
Results are without wrapper for long messages.
 
Line 1,006: Line 750:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32  ||  || align="right"| 1118 slices  || align="right"| 1169 Mbit/s  || align="right"| 118.06 MHz
+
| BLAKE-256  ||  || align="right"| 1118 slices  || align="right"| 835 Mbit/s  || align="right"| 118.06 MHz
|-
+
|- style="background:#ffdf9f;"
| BLAKE-64  ||  || align="right"| 1718 slices  || align="right"| 1299 Mbit/s  || align="right"| 90.91 MHz
+
| BLAKE-512  ||  || align="right"| 1718 slices  || align="right"| 1137 Mbit/s  || align="right"| 90.91 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  ||  || align="right"| 4997 slices  || align="right"| 457 Mbit/s  || align="right"| 14.02 MHz
 
|-
 
| Blue Midnight Wish-512  ||  || align="right"| 9810 slices  || align="right"| 287 Mbit/s  || align="right"| 10 MHz
 
|-
 
| CubeHash8/32  ||  || align="right"| 695 slices  || align="right"| 2509 Mbit/s  || align="right"| 166.83 MHz
 
|-
 
| ECHO-256  ||  || align="right"| 7372 slices  || align="right"| 5373 Mbit/s  || align="right"| 198.93 MHz
 
|-
 
| ECHO-512 ||  || align="right"| 8633 slices  || align="right"| 18133 Mbit/s  || align="right"| 166.69 MHz
 
|-
 
| Fugue-256 ||  || align="right"| 1689 slices  || align="right"| 914 Mbit/s  || align="right"| 200.04 MHz
 
|-
 
| Fugue-384  ||  || align="right"| 2380 slices  || align="right"| 640 Mbit/s  || align="right"| 200.08 MHz
 
|-
 
| Fugue-512  ||  || align="right"| 2596 slices  || align="right"| 481 Mbit/s  || align="right"| 200.16 MHz
 
|-
 
 
| Grøstl-256  ||  || align="right"| 2391 slices  || align="right"| 3242 Mbit/s  || align="right"| 101.32 MHz
 
| Grøstl-256  ||  || align="right"| 2391 slices  || align="right"| 3242 Mbit/s  || align="right"| 101.32 MHz
|-
+
|- style="background:#ffdf9f;"
 
| Grøstl-512  ||  || align="right"| 4845 slices  || align="right"| 3619 Mbit/s  || align="right"| 123.4 MHz
 
| Grøstl-512  ||  || align="right"| 4845 slices  || align="right"| 3619 Mbit/s  || align="right"| 123.4 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  ||  || align="right"| 1518 slices  || align="right"| 358 Mbit/s  || align="right"| 72.41 MHz
+
| JH  ||  || align="right"| 1291 slices  || align="right"| 1641 Mbit/s  || align="right"| 250.13 MHz
|-
 
| Hamsi-512  ||  || align="right"| 6229 slices  || align="right"| 79 Mbit/s  || align="right"| 16.51 MHz
 
|-
 
| JH  ||  || align="right"| 1291 slices  || align="right"| 1941 Mbit/s  || align="right"| 250.13 MHz
 
 
|-
 
|-
 
| Keccak(-224)  ||  || align="right"| 1117 slices  || align="right"| 5915 Mbit/s  || align="right"| 189 MHz
 
| Keccak(-224)  ||  || align="right"| 1117 slices  || align="right"| 5915 Mbit/s  || align="right"| 189 MHz
Line 1,044: Line 768:
 
|-
 
|-
 
| Keccak(-512)  ||  || align="right"| 1117 slices  || align="right"| 8518 Mbit/s  || align="right"| 189 MHz
 
| Keccak(-512)  ||  || align="right"| 1117 slices  || align="right"| 8518 Mbit/s  || align="right"| 189 MHz
|-
 
| Luffa-256  ||  || align="right"| 2221 slices  || align="right"| 5333 Mbit/s  || align="right"| 166.67 MHz
 
|-
 
| Luffa-384  ||  || align="right"| 3740 slices  || align="right"| 5336 Mbit/s  || align="right"| 166.75 MHz
 
|-
 
| Luffa-512  ||  || align="right"| 3700 slices  || align="right"| 5336 Mbit/s  || align="right"| 166.75 MHz
 
|-
 
| Shabal  ||  || align="right"| 1583 slices  || align="right"| 1469 Mbit/s  || align="right"| 148.04 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  ||  || align="right"| 3125 slices  || align="right"| 1170 Mbit/s  || align="right"| 109.17 MHz
 
|-
 
| SHAvite-3<sub>512</sub>  ||  || align="right"| 9775 slices  || align="right"| 931 Mbit/s  || align="right"| 59.4 MHz
 
|-
 
| SIMD-256  ||  || align="right"| 22704 slices  || align="right"| 1338 Mbit/s  || align="right"| 107.2 MHz
 
|-
 
| SIMD-512  ||  || align="right"| 43729 slices  || align="right"| 2677 Mbit/s  || align="right"| 107.2 MHz
 
 
|-
 
|-
 
| Skein-512  ||  || align="right"| 1786 slices  || align="right"| 1945 Mbit/s  || align="right"| 83.65 MHz
 
| Skein-512  ||  || align="right"| 1786 slices  || align="right"| 1945 Mbit/s  || align="right"| 83.65 MHz
Line 1,067: Line 775:
 
<br />
 
<br />
  
 
+
=== All 5 Round-Three Candidates ===
=== All 14 Round-Two Candidates ===
 
  
 
Results include throughputs without interface overhead.
 
Line 1,084: Line 791:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32  ||  || align="right"| 1660 slices  || align="right"| 2676 Mbit/s  || align="right"| 115 MHz
+
| BLAKE-256  ||  || align="right"| 1660 slices  || align="right"| 1911 Mbit/s  || align="right"| 115 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  ||  || align="right"| 4350 slices  || align="right"| 8704 Mbit/s  || align="right"| 34 MHz
 
|-
 
| CubeHash16/32-256  ||  || align="right"| 590 slices  || align="right"| 2960 Mbit/s  || align="right"| 185 MHz
 
|-
 
| ECHO-256  ||  || align="right"| 2827 slices  || align="right"| 2312 Mbit/s  || align="right"| 149 MHz
 
|-
 
| Fugue-256  ||  || align="right"| 4013 slices  || align="right"| 1248 Mbit/s  || align="right"| 78 MHz
 
|-
 
 
| Grøstl-256  ||  || align="right"| 2616 slices  || align="right"| 7885 Mbit/s  || align="right"| 154 MHz
 
| Grøstl-256  ||  || align="right"| 2616 slices  || align="right"| 7885 Mbit/s  || align="right"| 154 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  ||  || align="right"| 718 slices  || align="right"| 1680 Mbit/s  || align="right"| 210 MHz
+
| JH-256  ||  || align="right"| 2661 slices  || align="right"| 2231 Mbit/s  || align="right"| 201 MHz
|-
 
| JH-256  ||  || align="right"| 2661 slices  || align="right"| 2639 Mbit/s  || align="right"| 201 MHz
 
 
|-
 
|-
 
| Keccak(-256)  ||  || align="right"| 1433 slices  || align="right"| 8397 Mbit/s  || align="right"| 205 MHz
 
| Keccak(-256)  ||  || align="right"| 1433 slices  || align="right"| 8397 Mbit/s  || align="right"| 205 MHz
|-
 
| Luffa-256  ||  || align="right"| 1048 slices  || align="right"| 7424 Mbit/s  || align="right"| 261 MHz
 
|-
 
| Shabal-256  ||  || align="right"| 1251 slices  || align="right"| 2335 Mbit/s  || align="right"| 228 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  ||  || align="right"| 1063 slices  || align="right"| 3382 Mbit/s  || align="right"| 251 MHz
 
|-
 
| SIMD-256  ||  || align="right"| 3987 slices  || align="right"| 835 Mbit/s  || align="right"| 75 MHz
 
 
|-
 
|-
 
| Skein-256-256  ||  || align="right"| 854 slices  || align="right"| 1402 Mbit/s  || align="right"| 115 MHz
 
| Skein-256-256  ||  || align="right"| 854 slices  || align="right"| 1402 Mbit/s  || align="right"| 115 MHz
Line 1,132: Line 821:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32  ||  || align="right"| 37 kGates  || align="right"| 6668 Mbit/s  || align="right"| 286.5 MHz
+
| BLAKE-256  ||  || align="right"| 37 kGates  || align="right"| 4763 Mbit/s  || align="right"| 286.5 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  ||  || align="right"| 128.7 kGates  || align="right"| 25937 Mbit/s  || align="right"| 101.3 MHz
 
|-
 
| CubeHash16/32-256  ||  || align="right"| 35.5 kGates  || align="right"| 8247 Mbit/s  || align="right"| 515.5 MHz
 
|-
 
| ECHO-256  ||  || align="right"| 101.1 kGates  || align="right"| 5621 Mbit/s  || align="right"| 362.3 MHz
 
|-
 
| Fugue-256  ||  || align="right"| 56.7 kGates  || align="right"| 2721 Mbit/s  || align="right"| 170.1 MHz
 
|-
 
 
| Grøstl-256  ||  || align="right"| 139.1 kGates  || align="right"| 17297 Mbit/s  || align="right"| 337.8 MHz
 
| Grøstl-256  ||  || align="right"| 139.1 kGates  || align="right"| 17297 Mbit/s  || align="right"| 337.8 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  ||  || align="right"| 67.6 kGates  || align="right"| 7767 Mbit/s  || align="right"| 970.9 MHz
+
| JH-256  ||  || align="right"| 54.6 kGates  || align="right"| 8471 Mbit/s  || align="right"| 763.4 MHz
|-
 
| JH-256  ||  || align="right"| 54.6 kGates  || align="right"| 10022 Mbit/s  || align="right"| 763.4 MHz
 
 
|-
 
|-
 
| Keccak(-256)  ||  || align="right"| 50.7 kGates  || align="right"| 33333 Mbit/s  || align="right"| 781.3 MHz
 
| Keccak(-256)  ||  || align="right"| 50.7 kGates  || align="right"| 33333 Mbit/s  || align="right"| 781.3 MHz
|-
 
| Luffa-256  ||  || align="right"| 39.6 kGates  || align="right"| 28732 Mbit/s  || align="right"| 1010.1 MHz
 
|-
 
| Shabal-256  ||  || align="right"| 34.6 kGates  || align="right"| 6059 Mbit/s  || align="right"| 591.7 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  ||  || align="right"| 59.4 kGates  || align="right"| 8421 Mbit/s  || align="right"| 625 MHz
 
|-
 
| SIMD-256  ||  || align="right"| 139 kGates  || align="right"| 3171 Mbit/s  || align="right"| 284.9 MHz
 
 
|-
 
|-
 
| Skein-256-256  ||  || align="right"| 43.1 kGates  || align="right"| 3295 Mbit/s  || align="right"| 270.3 MHz
 
| Skein-256-256  ||  || align="right"| 43.1 kGates  || align="right"| 3295 Mbit/s  || align="right"| 270.3 MHz
Line 1,165: Line 836:
 
<br />
 
<br />
  
=== Blue Midnight Wish, Keccak, Luffa ===
+
=== All 5 Round-Three Candidates ===
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Spartan 3
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| Blue Midnight Wish  || Compression function with f0, f1, and f2 unrolled in sequence  || align="right"| 10531 slices  || align="right"| 2110 Mbit/s  || align="right"| 4.22 MHz
 
|-
 
| Keccak  || One Keccak-f round per cycle  || align="right"| 2024 slices  || align="right"| 3460 Mbit/s  || align="right"| 81.4 MHz
 
|-
 
| Luffa  || Three step modules  || align="right"| 2956 slices  || align="right"| 1480 Mbit/s  || align="right"| 157.3 MHz
 
|}
 
 
 
 
 
----
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Virtex-II
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| Blue Midnight Wish  || Compression function with f0, f1, and f2 unrolled in sequence  || align="right"| 10432 slices  || align="right"| 3360 Mbit/s  || align="right"| 6.71 MHz
 
|-
 
| Keccak  || One Keccak-f round per cycle  || align="right"| 2024 slices  || align="right"| 5810 Mbit/s  || align="right"| 136.6 MHz
 
|-
 
| Luffa  || Three step modules  || align="right"| 2952 slices  || align="right"| 8370 Mbit/s  || align="right"| 301.4 MHz
 
|}
 
 
 
 
 
----
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]]  || N/A  || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Xilinx Virtex 4
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| Blue Midnight Wish  || Compression function with f0, f1, and f2 unrolled in sequence  || align="right"| 10486 slices  || align="right"| 4510 Mbit/s  || align="right"| 9.01 MHz
 
|-
 
| Keccak  || One Keccak-f round per cycle  || align="right"| 2024 slices  || align="right"| 6070 Mbit/s  || align="right"| 142.9 MHz
 
|-
 
| Luffa  || Three step modules  || align="right"| 2989 slices  || align="right"| 8560 Mbit/s  || align="right"| 308.2 MHz
 
|}
 
 
 
 
 
----
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-
 
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]]  || N/A  || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]]  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Synopsys 90 nm
 
|}
 
 
 
 
 
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
|-
 
| Blue Midnight Wish  || Compression function with f0, f1, and f2 unrolled in sequence  || align="right"| 55.9 kGates  || align="right"| 26320 Mbit/s  || align="right"| 52.63 MHz
 
|-
 
| Keccak  || One Keccak-f round per cycle  || align="right"| 10.5 kGates  || align="right"| 19320 Mbit/s  || align="right"| 454.5 MHz
 
|-
 
| Luffa  || Three step modules  || align="right"| 11.5 kGates  || align="right"| 21370 Mbit/s  || align="right"| 769.2 MHz
 
|}
 
 
 
=== All 14 Round-Two Candidates ===
 
  
 
Results are post-P&amp;R and include throughputs without interface overhead.
 
Results are post-P&amp;R and include throughputs without interface overhead.
Line 1,264: Line 845:
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
! width="200"| Reference  !! width="120"| HDL  !! width="120"| Category  !! width="100"| Impl. Scope  !! width="120"| Technology
 
|-  
 
|-  
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]]  || N/A || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || UMC 0.13 µm
+
| [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]]  || [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || UMC 0.13 µm
 
|}
 
|}
  
Line 1,271: Line 852:
 
|- style="background:#efefef;"
 
|- style="background:#efefef;"
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
 
! width="140"| Hash Function Name  !! width="270"| Impl. Details  !! width="90"| Size  !! width="80"| Throughput  !!  width="80"| Clock Frequency
|-
+
|- style="background:#ffdf9f;"
| BLAKE-32 ||  || align="right"| 43.52 kGates  || align="right"| 4645 Mbit/s  || align="right"| 200 MHz
+
| BLAKE-256 ||  || align="right"| 43.52 kGates  || align="right"| 3318 Mbit/s  || align="right"| 200 MHz
|-
+
|- style="background:#ffdf9f;"
| Blue Midnight Wish-256  ||  || align="right"| 198.17 kGates  || align="right"| 12220 Mbit/s  || align="right"| 48 MHz
 
|-
 
| CubeHash16/32-256  ||  || align="right"| 38.18 kGates  || align="right"| 4624 Mbit/s  || align="right"| 289 MHz
 
|-
 
| ECHO-256  ||  || align="right"| 92.73 kGates  || align="right"| 3366 Mbit/s  || align="right"| 217 MHz
 
|-
 
| Fugue-256  ||  || align="right"| 91.09 kGates  || align="right"| 2385 Mbit/s  || align="right"| 149 MHz
 
|-
 
 
| Grøstl-256  ||  || align="right"| 110.11 kGates  || align="right"| 9606 Mbit/s  || align="right"| 188 MHz
 
| Grøstl-256  ||  || align="right"| 110.11 kGates  || align="right"| 9606 Mbit/s  || align="right"| 188 MHz
|-
+
|- style="background:#ffdf9f;"
| Hamsi-256  ||  || align="right"| 29.94 kGates  || align="right"| 3571 Mbit/s  || align="right"| 446 MHz
+
| JH-256  ||  || align="right"| 62.42 kGates  || align="right"| 4334 Mbit/s  || align="right"| 391 MHz
|-
 
| JH-256  ||  || align="right"| 62.42 kGates  || align="right"| 5128 Mbit/s  || align="right"| 391 MHz
 
 
|-
 
|-
 
| Keccak(-256)  ||  || align="right"| 47.43 kGates  || align="right"| 15457 Mbit/s  || align="right"| 377 MHz
 
| Keccak(-256)  ||  || align="right"| 47.43 kGates  || align="right"| 15457 Mbit/s  || align="right"| 377 MHz
|-
 
| Luffa-256  ||  || align="right"| 37.94 kGates  || align="right"| 13943 Mbit/s  || align="right"| 490 MHz
 
|-
 
| Shabal-256  ||  || align="right"| 49.44 kGates  || align="right"| 2945 Mbit/s  || align="right"| 362 MHz
 
|-
 
| SHAvite-3<sub>256</sub>  ||  || align="right"| 55.25 kGates  || align="right"| 4599 Mbit/s  || align="right"| 341 MHz
 
|-
 
| SIMD-256  ||  || align="right"| 139.55 kGates  || align="right"| 2157 Mbit/s  || align="right"| 194 MHz
 
 
|-
 
|-
 
| Skein-256-256  ||  || align="right"| 40.9 kGates  || align="right"| 1941 Mbit/s  || align="right"| 159 MHz
 
| Skein-256-256  ||  || align="right"| 40.9 kGates  || align="right"| 1941 Mbit/s  || align="right"| 159 MHz
Line 1,411: Line 974:
  
 
<div id="Ref027">
 
<div id="Ref027">
[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa. Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20100810.pdf.
+
[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa (version 20101105). Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20101105.pdf.
 
</div>
 
</div>
  
Line 1,423: Line 986:
  
 
<div id="Ref030">
 
<div id="Ref030">
[30] Kris Gaj, Ekawat Homsirikamol, and Marcin Rogawski. Fair and Comprehensive Methodology for Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. 12th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2010. Available online at http://www.springerlink.com/content/q41257x376615p22/.
+
[30] Ekawat Homsirikamol, Marcin Rogawski, and Kris Gaj. Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. IACR Eprint report 2010/445. Available online at http://eprint.iacr.org/2010/445.pdf.
 
</div>
 
</div>
  
Line 1,456: Line 1,019:
 
<div id="Ref038">
 
<div id="Ref038">
 
[38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1.
 
[38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1.
 +
</div>
 +
 +
<div id="Ref039">
 +
[39] RCIS webpage (Other ASIC Implementations). http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html.
 +
</div>
 +
 +
<div id="Ref040">
 +
[40] Luca Henzen, Jean-Philippe Aumasson, Willi Meier, and Raphael C.-W. Phan. VLSI Characterization of the Cryptographic Hash Function BLAKE. IEEE T VLSI, 2010. Available online at http://131002.net/data/papers/HAMP10.pdf.
 +
</div>
 +
 +
<div id="Ref041">
 +
[41] Mohamed El Hadedy, Danilo Gligoroski, and Svein J. Knapskog. Single Core Implementation of Blue Midnight Wish Hash Function on VIRTEX 5 Platform. Available online at http://people.item.ntnu.no/~danilog/Hash/BMW-SecondRound/SmallSizeFPGA-BMWOct2010.pdf.
 +
</div>
 +
 +
<div id="Ref042">
 +
[42] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale, François-Xavier Standaert. Compact FPGA Implementations of the Five SHA-3 Finalists. Available online at http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf.
 
</div>
 
</div>

Latest revision as of 10:49, 23 May 2012

1 Call for Contributions

Implementers (both submitters and non-submitters): Do you have results that complement this site? Let us know at sha3zoo-hardware@iaik.tugraz.at. If you are making your HDL code available, please also send us the corresponding information.

2 Important Information

This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST (final round 3). This is work in progress. If you know of any implementations that should be mentioned on this page, refer to our call for contributions.

A list of hardware implementations of the round 1 candidates can be found here. A list of hardware implementations of the round 2 candidates is archived here. Please note that the pages for round 1 and 2 candidates are provided for reference and will not be updated.

The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.

The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
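
The minimum requirement for ASIC results, agreement of the gate length, can be checked mechanically. The following is a minimal sketch and not part of the original page: the helper names `gate_length_nm` and `same_process_node` are hypothetical, and the technology strings are assumed to follow the notation used in the result tables (e.g., "UMC 0.13 µm", "STM 90 nm").

```python
import math
import re


def gate_length_nm(technology: str) -> float:
    """Extract the gate length in nm from a string such as 'UMC 0.13 µm'."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*(µm|um|nm)", technology)
    if match is None:
        raise ValueError(f"no gate length found in {technology!r}")
    value = float(match.group(1))
    # Convert micrometres to nanometres; nm values are returned unchanged.
    return value if match.group(2) == "nm" else value * 1000.0


def same_process_node(tech_a: str, tech_b: str) -> bool:
    """True if both technologies have the same minimal gate length."""
    return math.isclose(gate_length_nm(tech_a), gate_length_nm(tech_b))


if __name__ == "__main__":
    print(same_process_node("UMC 90 nm", "STM 90 nm"))
    print(same_process_node("UMC 0.13 µm", "UMC 0.18 µm"))
```

A match on gate length only satisfies the minimum criterion: "UMC 90 nm" and "STM 90 nm" still refer to different standard-cell libraries.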

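In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories, described in the subsections below: fully autonomous implementations, implementations with external memory, and implementations of core functionality.

If you have suggestions regarding the structure of this site, let us know at sha3zoo-hardware@iaik.tugraz.at.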

2.1 Fully Autonomous Implementation


Such hardware implementations include the complete functionality of a SHA-3 candidate (or a specific version thereof). That means the input message can be loaded piecewise into the hardware module and it delivers the message digest as output. All hash calculations happen exclusively within the hardware module. If integrated in a system, the achievable throughput of a fully autonomous implementation depends on the speed of the hardware module itself and the speed of the (system dependent) data interface delivering the input message.
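
As a rough illustration of this dependency, the system can be modelled as two stages in series, with the slower stage limiting the throughput. This is a simplifying assumption, not a statement from the cited papers. The function names and the 32-bit, 100 MHz interface are hypothetical; 12817 Mbit/s is the Keccak(-256) figure reported by Homsirikamol et al. [30] on Virtex 5.

```python
def interface_throughput_mbit(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak rate of a hypothetical interface moving one word per clock cycle."""
    return bus_width_bits * clock_mhz


def system_throughput_mbit(core_mbit: float, interface_mbit: float) -> float:
    """Bottleneck model: the slower of core and interface limits the system."""
    return min(core_mbit, interface_mbit)


if __name__ == "__main__":
    core = 12817.0                               # hash core alone, Mbit/s
    bus = interface_throughput_mbit(32, 100.0)   # hypothetical data interface
    print(system_throughput_mbit(core, bus))
```

In this hypothetical setting the interface, not the hash core, determines the achievable throughput.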


2.2 Implementation with External Memory


These implementations use external memory to hold intermediate values during the hashing of a message. The implemented hardware itself normally consists of the core logic functionality of the hash function, some registers for short-lived temporary values, and possibly a memory controller for access to the external memory. Such implementations can load the input message either over a dedicated interface (similar to a fully autonomous implementation) or from the external memory. In order to reach the maximal throughput of the hardware module, the external memory must be sufficiently fast.


2.3 Implementation of Core Functionality


Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of full implementations.
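
A common first-order estimate, sketched here under stated assumptions, is throughput = block size × clock frequency / cycles per block. The function name is hypothetical. The Keccak parameters (a rate of 1088 or 576 bits and 24 rounds of Keccak-f[1600]) are taken from the Keccak specification rather than from this page, and one round per clock cycle without I/O overhead is an assumption.

```python
def throughput_mbit(block_bits: int, clock_mhz: float, cycles_per_block: int) -> float:
    """First-order throughput estimate in Mbit/s (bits x MHz / cycles)."""
    return block_bits * clock_mhz / cycles_per_block


if __name__ == "__main__":
    # Assumed: one Keccak-f round per cycle, i.e. 24 cycles per message block.
    print(throughput_mbit(1088, 282.7, 24))  # Keccak(-256) at 282.7 MHz
    print(throughput_mbit(576, 285.2, 24))   # Keccak(-512) at 285.2 MHz
```

With these assumptions the estimates come out close to the 12817 Mbit/s and 6845 Mbit/s that Homsirikamol et al. [30] report for Keccak(-256) and Keccak(-512) on Xilinx Virtex 5.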

3 Tweaks of Round Three Candidates over Round Two

The main tweaks for round three consist of adapted round numbers for some of the candidates. For implementations of round 2 variants (cf. round two results), we extrapolated to the performance of round 3 variants. Extrapolated results are marked in orange. If the tweaks for an algorithm are expected to have a negligible effect on performance (e.g., just a change of constants), we include the results for the round 2 variant verbatim.

  • BLAKE: The round three versions of BLAKE have been renamed to BLAKE-224, BLAKE-256, BLAKE-384, and BLAKE-512. The number of rounds has been increased from 10 to 14 for BLAKE-224 and BLAKE-256, and from 14 to 16 for BLAKE-384 and BLAKE-512. Thus, throughput for BLAKE-224 and BLAKE-256 is expected to decrease by a factor of 10/14 (reduction by about 28.6%), and for BLAKE-384 and BLAKE-512 by a factor of 14/16 (reduction by 12.5%).
  • Grøstl: The shift distances for the Q permutation have been changed and the round constants for both the P and Q permutations have been modified. The former is not expected to have an impact on hardware performance, whereas the latter is likely to increase overall hardware size and/or decrease throughput slightly.
  • JH: The number of rounds has been increased from 35.5 to 42. Thus, throughput of JH is expected to decrease by a factor of 35.5/42 (reduction by about 15.5%).
  • Keccak: The padding rule has been simplified and some parameters have been redefined. No significant impact on hardware performance is expected.
  • Skein: A single 64-bit constant has been changed. No significant impact on hardware performance is expected.
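
The extrapolation described above amounts to scaling the round-two throughput by the ratio of old to new round counts. The following is a minimal sketch, assuming that throughput is inversely proportional to the number of rounds while area and clock frequency stay unchanged; the function names are hypothetical. The inputs 4645 Mbit/s (BLAKE-32) and 5128 Mbit/s (JH-256) are the round-two figures of Guo et al. [35].

```python
def extrapolate_throughput(round2_mbit: float, old_rounds: float, new_rounds: float) -> float:
    """Scale a round-two throughput to the round-three round count."""
    return round2_mbit * old_rounds / new_rounds


def reduction_percent(old_rounds: float, new_rounds: float) -> float:
    """Relative throughput loss in percent caused by the additional rounds."""
    return 100.0 * (1.0 - old_rounds / new_rounds)


if __name__ == "__main__":
    print(round(extrapolate_throughput(4645, 10, 14)))    # BLAKE-32 -> BLAKE-256
    print(round(extrapolate_throughput(5128, 35.5, 42)))  # JH-256
    print(round(reduction_percent(10, 14), 1))
```

Rounded to whole Mbit/s, this gives 3318 and 4334, which match the extrapolated values listed for Guo et al. [35] in the high-speed ASIC table.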

4 Ongoing Hardware Benchmarking Efforts

To describe it in the words of the initiators and maintainers: "ATHENa: Automated Tool for Hardware EvaluatioN is a project started at George Mason University, aimed at fair, comprehensive, and automated evaluation of cryptographic cores developed using hardware description languages, such as VHDL and Verilog." More information about the project and the current results can be found on the ATHENa webpage. Note: As each hash module submitted to ATHENa is implemented on several FPGA platforms, the SHA-3 zoo pages will not replicate all results produced by the ATHENa project on this webpage. Instead, please refer directly to the ATHENa webpage.

5 Summary of All Results

This section covers four categories of implementations (high-speed and low-area, each for FPGA and ASIC) and lists the known published results. If the HDL source code is available, a link is provided as well.

5.1 High-Speed Implementations (FPGA)

Important note: The size and functionality of slices vary between FPGA families. A direct comparison of the slice count of implementations on different FPGA families is therefore problematic.
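
One way to respect this restriction is to compare results only within one FPGA family. The sketch below groups a few rows by technology and ranks them by throughput per area unit. The throughput-to-area ratio is a commonly used metric but an assumption here, since this page does not prescribe it, and the names `RESULTS` and `rank_by_family` are hypothetical. The numbers are the Homsirikamol et al. [30] rows from the table in this section.

```python
from collections import defaultdict

# (hash function, technology, area in slices or ALUTs, throughput in Mbit/s)
RESULTS = [
    ("BLAKE-256", "Xilinx Virtex 5", 1523, 2245),
    ("Grøstl-256", "Xilinx Virtex 5", 1597, 7885),
    ("JH-256", "Xilinx Virtex 5", 1018, 4578),
    ("Keccak(-256)", "Xilinx Virtex 5", 1272, 12817),
    ("Skein-512-256", "Xilinx Virtex 5", 1621, 3178),
    ("BLAKE-256", "Altera Stratix III", 3635, 2072),
    ("Keccak(-256)", "Altera Stratix III", 4213, 12393),
]


def rank_by_family(results):
    """Rank hash functions by Mbit/s per area unit, separately per family."""
    groups = defaultdict(list)
    for name, technology, area, mbit in results:
        groups[technology].append((mbit / area, name))
    # Sort each family on its own; ratios are never compared across families.
    return {
        technology: [name for _, name in sorted(rows, reverse=True)]
        for technology, rows in groups.items()
    }


if __name__ == "__main__":
    for technology, names in rank_by_family(RESULTS).items():
        print(technology, names)
```

Slices and ALUTs are never mixed in one ranking, so each list only contains results that share the same area unit.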

Hash Function Name | Reference / HDL | Impl. Scope | Impl. Details | Technology | Size | Throughput | Clock Frequency
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 3091 slices | 1231 Mbit/s | 37.0 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 3087 slices | 1596 Mbit/s | 48.0 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 1694 slices | 2216 Mbit/s | 67.0 MHz
BLAKE-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with 8 G function units and I/O registers | Altera Stratix III | 5435 ALUTs | 1562 Mbit/s | 46.97 MHz
BLAKE-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 1911 Mbit/s | 115 MHz
BLAKE-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 1523 slices | 2245 Mbit/s | 128.9 MHz
BLAKE-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 3635 ALUTs | 2072 Mbit/s | 119.0 MHz
BLAKE-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1118 slices | 835 Mbit/s | 118.06 MHz
BLAKE-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 1911 Mbit/s | 115 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 11122 slices | 1030 Mbit/s | 17.0 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 11483 slices | 1494 Mbit/s | 25.0 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 4329 slices | 2090 Mbit/s | 35.0 MHz
BLAKE-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1718 slices | 1137 Mbit/s | 90.91 MHz
BLAKE-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 3064 slices | 3080 Mbit/s | 99.7 MHz
BLAKE-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 7086 ALUTs | 2766 Mbit/s | 89.5 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 6136 slices | 4520 Mbit/s | 88.3 MHz
Grøstl-224/256 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 1722 slices | 10276 Mbit/s | 200.7 MHz
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Spartan 3 | 4827 slices | 3660 Mbit/s | 71.53 MHz
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Virtex 5 | 4516 slices | 7310 Mbit/s | 142.87 MHz
Grøstl-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 4057 slices | 5171 Mbit/s | 101 MHz
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 1597 slices | 7885 Mbit/s | 323.4 MHz
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 6350 ALUTs | 5380 Mbit/s | 220.7 MHz
Grøstl-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2391 slices | 3242 Mbit/s | 101.32 MHz
Grøstl-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2616 slices | 7885 Mbit/s | 154 MHz
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 20233 slices | 5901 Mbit/s | 80.7 MHz
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Spartan 3 | 17452 slices | 3180 Mbit/s | 79.61 MHz
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Virtex 5 | 19161 slices | 6090 Mbit/s | 83.33 MHz
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 5419 slices | 15395 Mbit/s | 210.5 MHz
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation | Xilinx Spartan 3 | 8308 slices | 3474 Mbit/s | 95 MHz
Grøstl-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 4845 slices | 3619 Mbit/s | 123.4 MHz
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 3138 slices | 10314 Mbit/s | 292.1 MHz
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 12355 ALUTs | 7142 Mbit/s | 202.3 MHz
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1018 slices | 4578 Mbit/s | 380.8 MHz
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3525 ALUTs | 4661 Mbit/s | 387.8 MHz
JH-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2661 slices | 2231 Mbit/s | 201 MHz
JH | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1291 slices | 1641 Mbit/s | 250.13 MHz
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1104 slices | 4742 Mbit/s | 394.5 MHz
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3709 ALUTs | 4696 Mbit/s | 390.6 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Cyclone III | 5776 LEs | 7500 Mbit/s | 133 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Stratix III | 4713 ALUTs | 12400 Mbit/s | 218 MHz
Keccak | J. Strömbergson [9] / Submission webpage | Fully autonomous | Core (round function, state register) only | Xilinx Spartan 3A | 3393 slices | 4800 Mbit/s | 85 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Xilinx Virtex 5 | 1412 slices | 6900 Mbit/s | 122 MHz
Keccak(-224) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 5915 Mbit/s | 189 MHz
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1272 slices | 12817 Mbit/s | 282.7 MHz
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 4213 ALUTs | 12393 Mbit/s | 273.4 MHz
Keccak(-256) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 6263 Mbit/s | 189 MHz
Keccak(-256) | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1433 slices | 8397 Mbit/s | 205 MHz
Keccak(-384) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8190 Mbit/s | 189 MHz
Keccak(-512) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8518 Mbit/s | 189 MHz
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1257 slices | 6845 Mbit/s | 285.2 MHz
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3979 ALUTs | 7310 Mbit/s | 304.6 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Spartan 3 | 2024 slices | 3460 Mbit/s | 81.4 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex-II | 2024 slices | 5810 Mbit/s | 136.6 MHz
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex 4 | 2024 slices | 6070 Mbit/s | 142.9 MHz
Skein-256-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1001 slices | 408.7 Mbit/s | 114.9 MHz
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 937 slices | 1751 Mbit/s | 68.4 MHz
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 2421 slices | 669 Mbit/s | 26.14 MHz
Skein-256-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1482 Mbit/s | 115 MHz
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1621 slices | 3178 Mbit/s | 118.0 MHz
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4645 ALUTs | 2503 Mbit/s | 92.9 MHz
Skein-256-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1402 Mbit/s | 115 MHz
Skein-512-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1877 slices | 817.4 Mbit/s | 114.9 MHz
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 1632 slices | 3535 Mbit/s | 69.04 MHz
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 4273 slices | 1365 Mbit/s | 26.66 MHz
Skein-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1786 slices | 1945 Mbit/s | 83.65 MHz
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1716 slices | 3209 Mbit/s | 119.1 MHz
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4794 ALUTs | 2434 Mbit/s | 90.3 MHz



5.2 Low-Area Implementations (FPGA)

Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 124 slices | 82 Mbit/s | 190.0 MHz
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 124 slices | 154 Mbit/s | 357.0 MHz
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 56 slices | 161 Mbit/s | 372.0 MHz
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 285 LEs | 83 Mbit/s | 192.0 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 958 slices | 265 Mbit/s | 59.0 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 960 slices | 307 Mbit/s | 68.0 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 390 slices | 411 Mbit/s | 91.0 MHz
BLAKE-256 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex 6 | 117 slices | 105 Mbit/s | 274.0 MHz
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 229 slices | 121 Mbit/s | 158.0 MHz
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 230 slices | 192 Mbit/s | 250.0 MHz
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 108 slices | 275 Mbit/s | 358.0 MHz
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 542 LEs | 108 Mbit/s | 140.0 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 1802 slices | 285 Mbit/s | 36.0 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 1856 slices | 333 Mbit/s | 42.0 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 939 slices | 466 Mbit/s | 59.0 MHz
BLAKE-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex 6 | 192 slices | 183 Mbit/s | 240.0 MHz
BLAKE-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan 6 | 230 slices | 103 Mbit/s | 135.0 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Spartan 3 | 2486 slices | 404 Mbit/s | 63.2 MHz
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Virtex 2 Pro | 2754 slices | 512 Mbit/s | 81.5 MHz
Grøstl-224/256 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 1276 slices | 192 Mbit/s | 60 MHz
Grøstl-256 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Virtex 6 | 260 slices | 815 Mbit/s | 280.0 MHz
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 2110 slices | 144 Mbit/s | 63 MHz
Grøstl-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Virtex 6 | 260 slices | 640 Mbit/s | 280.0 MHz
Grøstl-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Spartan 6 | 343 slices | 548 Mbit/s | 240.0 MHz
JH-256 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Virtex 6 | 240 slices | 214 Mbit/s | 288.0 MHz
JH-512 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Virtex 6 | 240 slices | 214 Mbit/s | 288.0 MHz
JH-512 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Spartan 6 | 260 slices | 84 Mbit/s | 113.0 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Stratix III | 855 ALUTs | 96.8 Mbit/s | 366 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Cyclone III | 1559 LEs | 47.8 Mbit/s | 181 MHz
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Xilinx Virtex 5 | 444 slices | 70.1 Mbit/s | 265 MHz
Keccak(-256) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Virtex 6 | 144 slices | 128 Mbit/s | 250.0 MHz
Keccak(-512) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Virtex 6 | 144 slices | 68 Mbit/s | 250.0 MHz
Keccak(-512) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Spartan 6 | 193 slices | 45 Mbit/s | 166.0 MHz
Skein-256-256 | Namin and Hasan [2] / N/A | Core functionality | One round of Threefish iterated | Altera Stratix III | 1385 ALUTs | 573.9 Mbit/s | 161.42 MHz
Skein-512-256 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Virtex 6 | 240 slices | 179 Mbit/s | 160.0 MHz
Skein-512-512 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Virtex 6 | 240 slices | 179 Mbit/s | 160.0 MHz
Skein-512-512 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Spartan 6 | 292 slices | 102 Mbit/s | 91.0 MHz



5.3 High-Speed Implementations (ASIC)


Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | UMC 0.18 µm | 58.30 kGates | 3782 Mbit/s | 114 MHz
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 4 G function units | UMC 0.18 µm | 41.31 kGates | 2966 Mbit/s | 170 MHz
BLAKE-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with 8 G function units and I/O registers | STM 90 nm | 53 kGates | 3196 Mbit/s(*) | 96.15 MHz
BLAKE-256 | Tillich et al. [14] / On request | Fully autonomous | Compression function with 4 G function units with CSAs | UMC 0.18 µm | 45.64 kGates | 2836 Mbit/s | 170.64 MHz
BLAKE-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | Four parallel G functions modules | UMC 90 nm | 47.5 kGates | 6966 Mbit/s | 400 MHz
BLAKE-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 43.52 kGates | 3318 Mbit/s | 200 MHz
BLAKE-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 37 kGates | 4763 Mbit/s | 286.5 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.18 µm | 79 kGates | 4548 Mbit/s | 137 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.18 µm | 48 kGates | 4176 Mbit/s | 240 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.13 µm | 67 kGates | 6689 Mbit/s | 201 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.13 µm | 43 kGates | 5748 Mbit/s | 330 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 90 nm | 65 kGates | 12499 Mbit/s | 376 MHz
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 90 nm | 38 kGates | 10816 Mbit/s | 621 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | UMC 0.18 µm | 132.47 kGates | 5171 Mbit/s | 87 MHz
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 4 G function units | UMC 0.18 µm | 82.73 kGates | 4209 Mbit/s | 136 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.18 µm | 147 kGates | 6314 Mbit/s | 106 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.18 µm | 98 kGates | 6293 Mbit/s | 204 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.13 µm | 139 kGates | 9452 Mbit/s | 158 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.13 µm | 92 kGates | 8982 Mbit/s | 291 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 90 nm | 128 kGates | 17777 Mbit/s | 298 MHz
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 90 nm | 79 kGates | 16434 Mbit/s | 532 MHz
Grøstl-256 | Tillich et al. [14] / On request | Fully autonomous | One shared permutation for P & Q, one pipeline stage | UMC 0.18 µm | 58.40 kGates | 6290 Mbit/s | 270.27 MHz
Grøstl-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | P and Q permutation interleaved with one pipeline stage, S-box as LUT | UMC 90 nm | 135 kGates | 16254 Mbit/s | 667 MHz
Grøstl-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 110.11 kGates | 9606 Mbit/s | 188 MHz
Grøstl-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 139.1 kGates | 17297 Mbit/s | 337.8 MHz
Grøstl-256 | RCIS webpage [39] / RCIS webpage | Fully autonomous | | STM 90 nm | 120.8 kGates | 16275 Mbit/s | 349.7 MHz
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | UMC 0.18 µm | 341 kGates | 6225 Mbit/s | 85.1 MHz
JH-256 | Tillich et al. [14] / On request | Fully autonomous | 320 S-boxes, one round of R8 per cycle | UMC 0.18 µm | 58.83 kGates | 4219 Mbit/s | 380.22 MHz
JH-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | S-boxes as LUTs, stored constants | UMC 90 nm | 80 kGates | 9134 Mbit/s | 760 MHz
JH-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 62.42 kGates | 4334 Mbit/s | 391 MHz
JH-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 54.6 kGates 8471 Mbit/s 763.4 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Fully autonomous Core (round function, state register) & IO buffer ST 0.13 µm 48 kGates 29900 Mbit/s 526 MHz
Keccak Submission doc. [8] / Submission webpage Fully autonomous Core (round function, state register) only ST 0.13 µm 40 kGates 15000 Mbit/s 500 MHz
Keccak(-256) Tillich et al. [14] / On request Fully autonomous One instance of Keccak-f round UMC 0.18 µm 56.32 kGates 21229 Mbit/s 487.80 MHz
Keccak(-256) Henzen et al. [29] / ETH webpage Fully autonomous One round per cycle UMC 90 nm 50 kGates 43011 Mbit/s 949 MHz
Keccak(-256) Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 47.43 kGates 15457 Mbit/s 377 MHz
Keccak Akin et al. [34] / N/A Core functionality One Keccak-f round per cycle Synopsys 90 nm 10.5 kGates 19320 Mbit/s 454.5 MHz
Keccak(-256) RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 50.7 kGates 33333 Mbit/s 781.3 MHz
Keccak(-256) RCIS webpage [39] / RCIS webpage Fully autonomous STM 90 nm 55.9 kGates 43986 Mbit/s 1030.9 MHz
Skein-256-256 Stefan Tillich [12] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 53.87 kGates 1762 Mbit/s 68.8 MHz
Skein-256-256 Namin and Hasan [2] / N/A Core functionality All 72 Threefish rounds unrolled STM 90 nm 369 kGates 3126 Mbit/s(*) 12.21 MHz
Skein-256-256 Tillich et al. [14] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 58.61 kGates 1882 Mbit/s 73.52 MHz
Skein-256-256 Henzen et al. [29] / ETH webpage Fully autonomous Four unrolled Threefish rounds UMC 90 nm 50 kGates 3558 Mbit/s 264 MHz
Skein-256-256 Guo et al. [35] / VT webpage Fully autonomous UMC 0.13 µm 40.9 kGates 1941 Mbit/s 159 MHz
Skein-256-256 RCIS webpage [37] / RCIS webpage Fully autonomous STM 90 nm 43.1 kGates 3295 Mbit/s 270.3 MHz
Skein-512-512 Tillich et al. [14] / On request Fully autonomous 8 Threefish rounds unrolled UMC 0.18 µm 102.04 kGates 2502 Mbit/s 48.87 MHz
Skein-512 Walker et al. [36] / N/A Fully autonomous 8 Threefish rounds unrolled Intel 32 nm 57.93 kGates 32320 Mbit/s 631.31 MHz

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
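
The estimate in the footnote can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `peak_throughput_mbit_s` is hypothetical, and the example assumes that the compression function delay equals one clock period (1000 / f, with f in MHz) and that the fully unrolled Skein-256-256 design of Namin and Hasan [2] processes one 256-bit block in a single cycle. Under those assumptions the result rounds to the 3126 Mbit/s listed in the table.

```python
def peak_throughput_mbit_s(input_bits, delay_ns, cycles):
    """Estimated peak throughput in Mbit/s, following the footnote formula."""
    return 1000 * input_bits / (delay_ns * cycles)


# Assumption: the delay per cycle is the clock period, 1000 / f[MHz] in ns.
skein_delay_ns = 1000 / 12.21  # Skein-256-256 row, 12.21 MHz

# Assumption: all 72 rounds are unrolled, so one block takes one cycle.
skein_unrolled = peak_throughput_mbit_s(256, skein_delay_ns, 1)
print(f"{skein_unrolled:.0f} Mbit/s")  # prints "3126 Mbit/s"
```

The same function covers iterated designs by passing the number of cycles needed per block.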



5.4 Low-Area Implementations (ASIC)

Hash Function Name Reference / HDL Impl. Scope Implementation Details Technology Size Throughput Clock Frequency
BLAKE-256 Tillich et al. [18] / N/A Fully autonomous One G function in 11 cycles AMS 0.35 µm 25.57 kGates 11 Mbit/s 31.25 MHz
BLAKE-256 Submission doc. [1] / Submission webpage Core functionality Compression function with a single G function unit UMC 0.18 µm 10.54 kGates 180.7 Mbit/s 40 MHz
BLAKE-256 Submission doc. [1] / Submission webpage Core functionality Compression function with a half G function unit UMC 0.18 µm 9.89 kGates 90.7 Mbit/s 40 MHz
BLAKE-256 Henzen et al. [40] / Submission webpage Fully autonomous 1 adder and 4-word latch array UMC 0.18 µm 13.56 kGates 96.4 Mbit/s 215 MHz
BLAKE-256 Henzen et al. [40] / Submission webpage Using external memory 1 adder and 4-word latch array UMC 0.18 µm 8.60 kGates 44.3 Mbit/s 100 MHz
BLAKE-512 Submission doc. [1] / Submission webpage Core functionality Compression function with a single G function unit UMC 0.18 µm 20.61 kGates 158.4 Mbit/s 20 MHz
BLAKE-512 Submission doc. [1] / Submission webpage Core functionality Compression function with a half G function unit UMC 0.18 µm 19.46 kGates 79.6 Mbit/s 20 MHz
Grøstl-224/256 Tillich et al. [18] / N/A Fully autonomous 64-bit datapath, P & Q permutation shared AMS 0.35 µm 14.62 kGates 145.9 Mbit/s 55.87 MHz
Grøstl-224/256 Grøstl website [19] / N/A Fully autonomous 64-bit datapath, P & Q permutation shared UMC 0.18 µm 17 kGates 645 Mbit/s 246.9 MHz
Grøstl-256 RCIS webpage [39] / RCIS webpage Fully autonomous STM 90 nm 34.8 kGates 2478 Mbit/s 101.6 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Using external memory Small core using system memory ST 0.13 µm 6.5 kGates 176.4 Mbit/s(*) 666.7 MHz
Keccak Updated spec. (v1.2) [8] / Submission webpage Using external memory Small core using system memory, clock freq. limited to 200 MHz ST 0.13 µm 5 kGates 52.9 Mbit/s(**) 200 MHz
Skein-256-256 Tillich et al. [18] / N/A Fully autonomous 64-bit datapath AMS 0.35 µm 12.89 kGates 19.8 Mbit/s 80 MHz
Skein-256-256 Namin and Hasan [2] / N/A Core functionality One round of Threefish iterated STM 90 nm 21 kGates 1018.8 Mbit/s(***) 286.53 MHz

(*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s
(**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s
(***) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s
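
The two Keccak estimates (*) and (**) follow from the same arithmetic and can be checked as below. The helper name `memory_interface_throughput` is hypothetical; the 1024 bits and 3870 cycles per permutation are taken directly from the footnotes.

```python
def memory_interface_throughput(bits_per_permutation, clock_hz, cycles_per_permutation):
    """Throughput in bits/s for a core needing a fixed cycle count per permutation."""
    return bits_per_permutation * clock_hz / cycles_per_permutation


# Clock frequencies of the two small-core Keccak rows, in MHz.
for clock_mhz in (666.7, 200.0):
    bits_per_s = memory_interface_throughput(1024, clock_mhz * 1e6, 3870)
    print(f"{clock_mhz} MHz: {bits_per_s / 1e6:.2f} Mbit/s")
# prints:
# 666.7 MHz: 176.41 Mbit/s
# 200.0 MHz: 52.92 Mbit/s
```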



6 Comparative Studies

This section summarizes the reported results of publications which examined more than one round-three candidate in a similar setup.

6.1 BLAKE, Skein

Reference HDL Category Impl. Scope Technology
Namin and Hasan [2] N/A High-speed FPGA Core functionality Altera Stratix III


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 Compression function with 8 G function units and I/O registers 5435 ALUTs 1562 Mbit/s 46.97 MHz
Skein-256-256 All 72 Threefish rounds unrolled (device too small) N/A N/A N/A




Reference HDL Category Impl. Scope Technology
Namin and Hasan [2] N/A High-speed ASIC Core functionality STM 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 Compression function with 8 G function units and I/O registers 53 kGates 3196 Mbit/s(*) 96.15 MHz
Skein-256-256 All 72 Threefish rounds unrolled 369 kGates 3126 Mbit/s(*) 12.21 MHz

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.



6.2 BLAKE, Grøstl, Skein

Reference HDL Category Impl. Scope Technology
Kobayashi et al. [3] RCIS webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 1660 slices 1911 Mbit/s 115 MHz
Grøstl-256 4057 slices 5171 Mbit/s 101 MHz
Skein-256 854 slices 1482 Mbit/s 115 MHz



6.3 All 5 Round-Three Candidates

Reported results are post-synthesis. An interactive graphical comparison of various area-performance tradeoffs of this study can be found here.


Reference HDL Category Impl. Scope Technology
Tillich et al. [14] On request High-speed ASIC Fully autonomous UMC 0.18 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 Compression function with 4 G function units with CSAs 45.64 kGates 2836 Mbit/s 170.64 MHz
Grøstl-256 One shared permutation for P & Q, one pipeline stage 58.40 kGates 6290 Mbit/s 270.27 MHz
JH-256 320 S-boxes, one round of R8 per cycle 58.83 kGates 4219 Mbit/s 380.22 MHz
Keccak(-256) One instance of Keccak-f round 56.32 kGates 21229 Mbit/s 487.80 MHz
Skein-256-256 8 Threefish rounds unrolled 58.61 kGates 1882 Mbit/s 73.52 MHz
Skein-512-512 8 Threefish rounds unrolled 102.04 kGates 2502 Mbit/s 48.87 MHz



6.4 BLAKE, Grøstl, Skein

Reference HDL Category Impl. Scope Technology
Tillich et al. [18] N/A Low-area ASIC Fully autonomous AMS 0.35 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 One G function in 11 cycles 25.57 kGates 11 Mbit/s 31.25 MHz
Grøstl-224/256 64-bit datapath, P & Q permutation shared 14.62 kGates 145.9 Mbit/s 55.87 MHz
Skein-256-256 64-bit datapath 12.89 kGates 19.8 Mbit/s 80 MHz



6.5 All 5 Round-Three Candidates

The results reported for this study are post-P&R figures for designs targeting high throughput.


Reference HDL Category Impl. Scope Technology
Henzen et al. [29] ETH webpage High-speed ASIC Fully autonomous UMC 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 Four parallel G function modules 47.5 kGates 6966 Mbit/s 400 MHz
Grøstl-256 P and Q permutation interleaved with one pipeline stage, S-box as LUT 135 kGates 16254 Mbit/s 667 MHz
JH-256 S-boxes as LUTs, stored constants 80 kGates 9134 Mbit/s 760 MHz
Keccak(-256) One round per cycle 50 kGates 43011 Mbit/s 949 MHz
Skein-256-256 Four unrolled Threefish rounds 50 kGates 3558 Mbit/s 264 MHz



6.6 All 5 Round-Three Candidates

Designs are optimized for throughput-to-area ratio. The cited results are those for the Xilinx Virtex 5 and Altera Stratix III platforms, for both the 256-bit and the 512-bit versions of the candidates. Results marked N/A did not fit into the largest device of the device family. For a full listing of all ATHENa results, refer to the ATHENa webpage.


Reference HDL Category Impl. Scope Technology
Homsirikamol et al. [30] On request High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 4 G function units per iteration 1523 slices 2245 Mbit/s 128.9 MHz
BLAKE-512 4 G function units per iteration 3064 slices 3080 Mbit/s 99.7 MHz
Grøstl-256 P & Q permutations interleaved 1597 slices 7885 Mbit/s 323.4 MHz
Grøstl-512 P & Q permutations interleaved 3138 slices 10314 Mbit/s 292.1 MHz
JH-256 1018 slices 4578 Mbit/s 380.8 MHz
JH-512 1104 slices 4742 Mbit/s 394.5 MHz
Keccak(-256) 1272 slices 12817 Mbit/s 282.7 MHz
Keccak(-512) 1257 slices 6845 Mbit/s 285.2 MHz
Skein-512-256 4 Threefish rounds unrolled 1621 slices 3178 Mbit/s 118.0 MHz
Skein-512-512 4 Threefish rounds unrolled 1716 slices 3209 Mbit/s 119.1 MHz
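
As a rough illustration of the throughput-to-area metric, the ratio can be computed from the 256-bit Virtex 5 rows above. This is a sketch, not the evaluation method of Homsirikamol et al. [30]: the names `virtex5_256` and `throughput_per_slice` are hypothetical, and the ratio assumes that slices are the only area resource that matters, ignoring any other FPGA resources a design may use.

```python
# (slices, throughput in Mbit/s), copied from the 256-bit Virtex 5 rows
virtex5_256 = {
    "BLAKE-256": (1523, 2245),
    "Grøstl-256": (1597, 7885),
    "JH-256": (1018, 4578),
    "Keccak(-256)": (1272, 12817),
    "Skein-512-256": (1621, 3178),
}


def throughput_per_slice(slices, mbit_s):
    """Throughput-to-area ratio in Mbit/s per slice."""
    return mbit_s / slices


# Sort the candidates from highest to lowest ratio.
ranking = sorted(
    ((name, throughput_per_slice(*row)) for name, row in virtex5_256.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, ratio in ranking:
    print(f"{name:14s} {ratio:5.2f} Mbit/s per slice")
```

With these figures Keccak(-256) has the highest ratio and BLAKE-256 the lowest. As noted under Important Information, such rankings are only meaningful within one study and one target device.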




Reference HDL Category Impl. Scope Technology
Homsirikamol et al. [30] N/A High-speed FPGA Fully autonomous Altera Stratix III


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 4 G function units per iteration 3635 ALUTs 2072 Mbit/s 119.0 MHz
BLAKE-512 4 G function units per iteration 7086 ALUTs 2766 Mbit/s 89.5 MHz
Grøstl-256 P & Q permutations interleaved 6350 ALUTs 5380 Mbit/s 220.7 MHz
Grøstl-512 P & Q permutations interleaved 12355 ALUTs 7142 Mbit/s 202.3 MHz
JH-256 3525 ALUTs 4661 Mbit/s 387.8 MHz
JH-512 3709 ALUTs 4696 Mbit/s 390.6 MHz
Keccak(-256) 4213 ALUTs 12393 Mbit/s 273.4 MHz
Keccak(-512) 3979 ALUTs 7310 Mbit/s 304.6 MHz
Skein-512-256 4 Threefish rounds unrolled 4645 ALUTs 2503 Mbit/s 92.9 MHz
Skein-512-512 4 Threefish rounds unrolled 4794 ALUTs 2434 Mbit/s 90.3 MHz



6.7 All 5 Round-Three Candidates

Results are without wrapper for long messages.


Reference HDL Category Impl. Scope Technology
Baldwin et al. [31] UCC webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 1118 slices 835 Mbit/s 118.06 MHz
BLAKE-512 1718 slices 1137 Mbit/s 90.91 MHz
Grøstl-256 2391 slices 3242 Mbit/s 101.32 MHz
Grøstl-512 4845 slices 3619 Mbit/s 123.4 MHz
JH 1291 slices 1641 Mbit/s 250.13 MHz
Keccak(-224) 1117 slices 5915 Mbit/s 189 MHz
Keccak(-256) 1117 slices 6263 Mbit/s 189 MHz
Keccak(-384) 1117 slices 8190 Mbit/s 189 MHz
Keccak(-512) 1117 slices 8518 Mbit/s 189 MHz
Skein-512 1786 slices 1945 Mbit/s 83.65 MHz



6.8 All 5 Round-Three Candidates

Results include throughputs without interface overhead.


Reference HDL Category Impl. Scope Technology
Matsuo et al. [33] RCIS webpage High-speed FPGA Fully autonomous Xilinx Virtex 5


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 1660 slices 1911 Mbit/s 115 MHz
Grøstl-256 2616 slices 7885 Mbit/s 154 MHz
JH-256 2661 slices 2231 Mbit/s 201 MHz
Keccak(-256) 1433 slices 8397 Mbit/s 205 MHz
Skein-256-256 854 slices 1402 Mbit/s 115 MHz




These are the same implementations as in Matsuo et al. [33], implemented in STM 90 nm technology.


Reference HDL Category Impl. Scope Technology
RCIS webpage [37] RCIS webpage High-speed ASIC Fully autonomous STM 90 nm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 37 kGates 4763 Mbit/s 286.5 MHz
Grøstl-256 139.1 kGates 17297 Mbit/s 337.8 MHz
JH-256 54.6 kGates 8471 Mbit/s 763.4 MHz
Keccak(-256) 50.7 kGates 33333 Mbit/s 781.3 MHz
Skein-256-256 43.1 kGates 3295 Mbit/s 270.3 MHz



6.9 All 5 Round-Three Candidates

Results are post-P&R and include throughputs without interface overhead.


Reference HDL Category Impl. Scope Technology
Guo et al. [35] VT webpage High-speed ASIC Fully autonomous UMC 0.13 µm


Hash Function Name Impl. Details Size Throughput Clock Frequency
BLAKE-256 43.52 kGates 3318 Mbit/s 200 MHz
Grøstl-256 110.11 kGates 9606 Mbit/s 188 MHz
JH-256 62.42 kGates 4334 Mbit/s 391 MHz
Keccak(-256) 47.43 kGates 15457 Mbit/s 377 MHz
Skein-256-256 40.9 kGates 1941 Mbit/s 159 MHz



7 References

[1] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE (version 1.3). Available online at http://131002.net/blake/blake.pdf.

[2] A. H. Namin and M. A. Hasan. Hardware Implementation of the Compression Function for Selected SHA-3 Candidates. Available online at http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html.

[3] Kazuyuki Kobayashi, Jun Ikegami, Shin'ichiro Matsuo, Kazuo Sakiyama, and Kazuo Ohta. Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII. IACR Eprint report 2010/010. Available online at http://eprint.iacr.org/2010/010.pdf.

[4] Brian Baldwin, Andrew Byrne, Mark Hamilton, Neil Hanley, Robert P. McEvoy, Weibo Pan, and William P. Marnane. FPGA Implementations of SHA-3 Candidates: CubeHash, Grøstl, LANE, Shabal and Spectral Hash. IACR Eprint report 2009/342. Available online at http://eprint.iacr.org/2009/342.pdf.

[5] Liang Lu, Maire O'Neill, and Earl Swartzlander. Hardware Evaluation of SHA-3 Hash Function Candidate ECHO. Presentation at the Claude Shannon Institute Workshop on Coding and Cryptography 2009. Slides available online at http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf.

[6] Bernhard Jungk, Steffen Reith, and Jürgen Apfelbeck. On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl. IACR Eprint report 2009/206. Available online at http://eprint.iacr.org/2009/206.pdf.

[7] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl - a SHA-3 candidate (October 31, 2008). Available online at http://www.groestl.info/Groestl.pdf.

[8] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles van Assche. KECCAK sponge function family main document (Version 1.2, April 23, 2009). Available online at http://keccak.noekeon.org/Keccak-main-1.2.pdf.

[9] Joachim Strömbergson. Implementation of the Keccak Hash Function in FPGA Devices. Available online at http://www.strombergson.com/files/Keccak_in_FPGAs.pdf.

[10] Romain Feron and Julien Francq. FPGA Implementation of Shabal: Our First Results (Version 2.0, February 19, 2010). Available online at http://www.shabal.com/wp-content/uploads/2010/03/FPGA-Implementation-of-Shabal-First-ResultsV2.0.pdf.

[11] Men Long. Implementing Skein Hash Function on Xilinx Virtex-5 FPGA Platform (Version 0.7, February 2, 2009). Available online at http://www.skein-hash.info/sites/default/files/skein_fpga.pdf.

[12] Stefan Tillich. Hardware Implementation of the SHA-3 Candidate Skein. IACR Eprint report 2009/159. Available online at http://eprint.iacr.org/2009/159.pdf.

[13] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA. IACR Eprint report 2010/173. Available online at http://eprint.iacr.org/2010/173.pdf.

[14] Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely. High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein. IACR Eprint report 2009/510. Available online at http://eprint.iacr.org/2009/510.pdf.

[15] Shai Halevi, William E. Hall, and Charanjit S. Jutla. The Hash Function Fugue (October 30, 2008). Available online at http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf.

[16] Junfeng Fan. Hardware Evaluation of The Hash Function Hamsi. Available online at http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html.

[17] Miroslav Knezevic and Ingrid Verbauwhede. Hardware Evaluation of the Luffa Hash Family. 4th Workshop on Embedded Systems Security 2009. Available online at http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf.

[18] Stefan Tillich, Martin Feldhofer, Wolfgang Issovits, Thomas Kern, Hermann Kureck, Michael Mühlberghuber, Georg Neubauer, Andreas Reiter, Armin Köfler, and Mathias Mayrhofer. Compact Hardware Implementations of the SHA-3 Candidates ARIRANG, BLAKE, Grøstl, and Skein. IACR Eprint report 2009/349. Available online at http://eprint.iacr.org/2009/349.pdf.

[19] Grøstl website. http://www.groestl.info/.

[20] Markus Bernet, Luca Henzen, Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Hardware Implementations of the SHA-3 Candidates Shabal and CubeHash. 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009. Available online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043.

[21] Michel Kinsy and Richard Uhler. SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec. Available online at http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf.

[22] Bernhard Jungk and Steffen Reith. On FPGA-based implementations of Grøstl. IACR Eprint report 2010/260. Available online at http://eprint.iacr.org/2010/260.pdf.

[23] Jérémie Detrey, Pierre Gaudry, and Karim Khalfallah. A Low-Area yet Performant FPGA Implementation of Shabal. IACR Eprint report 2010/292. Available online at http://eprint.iacr.org/2010/292.pdf.

[24] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. A Compact FPGA Implementation of the SHA-3 Candidate ECHO. IACR Eprint report 2010/364. Available online at http://eprint.iacr.org/2010/364.pdf.

[25] Wim Ramakers and Hans Narinx. Implementation and evaluation of SHA-3 candidates on FPGA. Extended abstract of Master Thesis "Implementatie en Evaluatie van SHA-3-Kandidaten op FPGA" (Dutch). Extended abstract available online at http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf. Full thesis available online at http://ehash.iaik.tugraz.at/uploads/6/62/Ramakers_Narinx2010ECHO-Hamsi-Luffa_Thesis_DUTCH.pdf.

[26] Julien Francq and Céline Thuillet. Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results. IACR Eprint report 2010/406. Available online at http://eprint.iacr.org/2010/406.pdf.

[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa (version 20101105). Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20101105.pdf.

[28] Imed Mabrouk and Ryad Benadjila. ECHO webpage (hardware subpage). http://crypto.rd.francetelecom.com/ECHO/hard/.

[29] Luca Henzen, Pietro Gendotti, Patrice Guillet, Enrico Pargaetzi, Martin Zoller, and Frank K. Gürkaynak. Developing a Hardware Evaluation Method for SHA-3 Candidates. 12th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2010. Available online at http://www.springerlink.com/content/g0115v3272156r06/.

[30] Ekawat Homsirikamol, Marcin Rogawski, and Kris Gaj. Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. IACR Eprint report 2010/445. Available online at http://eprint.iacr.org/2010/445.pdf.

[31] Brian Baldwin, Neil Hanley, Mark Hamilton, Liang Lu, Andrew Byrne, Maire O'Neill, and William P. Marnane. FPGA Implementations of the Round Two SHA-3 Candidates. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf.

[32] Mohamed El Hadedy, Martin Margala, Danilo Gligoroski, and Svein J. Knapskog. Resource-Efficient Implementation of Blue Midnight Wish-256 Hash Function on Xilinx FPGA Platform. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf.

[33] Shin'ichiro Matsuo, Miroslav Knezevic, Patrick Schaumont, Ingrid Verbauwhede, Akashi Satoh, Kazuo Sakiyama, and Kazuo Ohta. How Can We Conduct "Fair and Consistent" Hardware Evaluation for SHA-3 Candidate? Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf.

[34] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel, and Erkay Savas. Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf.

[35] Xu Guo, Sinan Huang, Leyla Nazhandali, and Patrick Schaumont. Fair and Comprehensive Performance Evaluation of 14 Second Round SHA-3 ASIC Implementations. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf.

[36] Jesse Walker, Farhana Sheikh, Sanu K. Mathew, and Ram Krishnamurthy. A Skein-512 Hardware Implementation. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/WALKER_skein-intel-hwd.pdf.

[38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1.

[39] RCIS webpage (Other ASIC Implementations). http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html.

[40] Luca Henzen, Jean-Philippe Aumasson, Willi Meier, and Raphael C.-W. Phan. VLSI Characterization of the Cryptographic Hash Function BLAKE. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010. Available online at http://131002.net/data/papers/HAMP10.pdf.

[41] Mohamed El Hadedy, Danilo Gligoroski, and Svein J. Knapskog. Single Core Implementation of Blue Midnight Wish Hash Function on VIRTEX 5 Platform. Available online at http://people.item.ntnu.no/~danilog/Hash/BMW-SecondRound/SmallSizeFPGA-BMWOct2010.pdf.

[42] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale, and François-Xavier Standaert. Compact FPGA Implementations of the Five SHA-3 Finalists. Available online at http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf.