SHA-3 Hardware Implementations
== Important Information ==

This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST (round 3, the final round). This page is a work in progress. If you know of any implementations that should be mentioned on this page, refer to our [[#Call_for_Contributions|call for contributions]].

A list of hardware implementations of the round 1 candidates can be found [[SHA-3_Hardware_Implementations_Round_One|here]]. A list of hardware implementations of the round 2 candidates is archived [[SHA-3_Hardware_Implementations_Round_Two|here]]. <font color=red> Please note that the pages for the round 1 and round 2 candidates are provided for reference and will not be updated. </font>

The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.

The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
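The comparability rules in the previous paragraph can be written down as a small decision procedure. The following is a minimal sketch under stated assumptions: the `Target` record, its field names, and the three labels ("best", "acceptable", "problematic") are hypothetical and only encode the rules as stated on this page; they are not part of any benchmarking tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    """Hypothetical description of an implementation's target technology."""
    kind: str                               # "fpga" or "asic"
    family: Optional[str] = None            # FPGA family, e.g. "Virtex 5"
    device: Optional[str] = None            # exact FPGA part, if reported
    gate_length_um: Optional[float] = None  # ASIC process, e.g. 0.13
    library: Optional[str] = None           # ASIC standard-cell library


def comparability(a: Target, b: Target) -> str:
    """Rank how meaningful a comparison between two targets is: same device
    or same cell library is best, same FPGA family or same minimal gate
    length is acceptable, anything else is problematic."""
    if a.kind != b.kind:
        return "problematic"
    if a.kind == "fpga":
        if a.device is not None and a.device == b.device:
            return "best"
        if a.family is not None and a.family == b.family:
            return "acceptable"
        return "problematic"
    # ASIC: the same standard-cell library implies the same process technology.
    if a.library is not None and a.library == b.library:
        return "best"
    if a.gate_length_um is not None and a.gate_length_um == b.gate_length_um:
        return "acceptable"
    return "problematic"


virtex5 = Target("fpga", family="Virtex 5")
spartan3 = Target("fpga", family="Spartan 3")
print(comparability(virtex5, virtex5))   # acceptable
print(comparability(virtex5, spartan3))  # problematic
```

Matching on the exact device first and falling back to the family mirrors the preference order in the text; a real comparison would also have to account for differing synthesis tools and implementation scope.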

In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories:
* [[#Fully_Autonomous_Implementation|Fully autonomous]]
Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of full implementations.

== Tweaks of Round Three Candidates over Round Two ==

The main tweaks for round three consist of changes to the number of rounds for some of the candidates. For implementations of round 2 variants (cf. [[SHA-3_Hardware_Implementations_Round_Two|round two results]]), we extrapolated the performance of the round 3 variants. Extrapolated results are marked in <font color=orange> orange </font>. If the tweaks for an algorithm are expected to have a negligible effect on performance (e.g., just a change of constants), we include the results for the round 2 variant verbatim.

* BLAKE: The round three versions of BLAKE have been renamed to BLAKE-224, BLAKE-256, BLAKE-384, and BLAKE-512. The number of rounds has been increased from 10 to 14 for BLAKE-224 and BLAKE-256, and from 14 to 16 for BLAKE-384 and BLAKE-512. Thus, the throughput of BLAKE-224 and BLAKE-256 is expected to drop to 10/14 of its round 2 value (a reduction of about 28.6%), and that of BLAKE-384 and BLAKE-512 to 14/16 of its round 2 value (a reduction of 12.5%).
* Grøstl: The shift distances for the Q permutation have been changed, and the round constants for both the P and Q permutations have been modified. The former is not expected to have an impact on hardware performance, whereas the latter is likely to increase overall hardware size and/or decrease throughput slightly.
* JH: The number of rounds has been increased from 35.5 to 42. Thus, the throughput of JH is expected to drop to 35.5/42 of its round 2 value (a reduction of about 15.5%).
* Keccak: The padding rule has been simplified and some parameters have been redefined. No significant impact on hardware performance is expected.
* Skein: A single 64-bit constant has been changed. No significant impact on hardware performance is expected.
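The round-count scaling described in the list above can be sketched as follows. This is a minimal first-order model under an explicit assumption: cycles per block grow linearly with the number of rounds, while clock frequency and area stay unchanged. Fixed overheads such as I/O or finalization cycles, which do not scale with the round count, are ignored. The helper names are hypothetical and are not taken from any of the cited implementations.

```python
# Round counts (round 2, round 3) as listed above. Grøstl, Keccak and Skein
# are omitted because their number of rounds did not change.
ROUNDS = {
    "BLAKE-224": (10, 14),
    "BLAKE-256": (10, 14),
    "BLAKE-384": (14, 16),
    "BLAKE-512": (14, 16),
    "JH": (35.5, 42),
}


def scale_factor(name: str) -> float:
    """Fraction of the round 2 throughput expected for the round 3 variant."""
    old_rounds, new_rounds = ROUNDS[name]
    return old_rounds / new_rounds


def extrapolate_throughput(name: str, round2_mbps: float) -> float:
    """First-order round 3 estimate: throughput scales with old/new rounds,
    assuming clock frequency and area are unchanged."""
    old_rounds, new_rounds = ROUNDS[name]
    return round2_mbps * old_rounds / new_rounds


def reduction_percent(name: str) -> float:
    """Expected throughput loss in percent."""
    return (1.0 - scale_factor(name)) * 100.0


for name in ROUNDS:
    print(f"{name}: x{scale_factor(name):.3f}, -{reduction_percent(name):.1f}%")
```

Because real cores spend some cycles outside the round function, the actual loss is usually somewhat smaller than this linear estimate.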

== Ongoing Hardware Benchmarking Efforts ==

In the words of its initiators and maintainers: "ATHENa: Automated Tool for Hardware EvaluatioN is a project started at George Mason University, aimed at fair, comprehensive, and automated evaluation of cryptographic cores developed using hardware description languages, such as VHDL and Verilog." More information about the project and the current results can be found on the [http://cryptography.gmu.edu/athena/ ATHENa webpage]. Note: As each hash module submitted to ATHENa is implemented on several FPGA platforms, the SHA-3 zoo pages do not replicate all results produced by the ATHENa project. Instead, please refer directly to the [http://cryptography.gmu.edu/athena/ ATHENa webpage].

== Summary of All Results ==

This section lists known published results in four categories: high-speed and low-area implementations, each for FPGA and for ASIC. If the HDL source code is available, a link is provided as well.
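The Throughput and Clock Frequency columns in the tables below are linked through the block size and the number of clock cycles needed per message block. The following is a minimal sketch of that relation for long messages. It assumes, for illustration only, one Keccak-f[1600] round per clock cycle (24 cycles per block) and the Keccak-256/Keccak-512 block sizes of 1088 and 576 bits; with those assumptions it comes close to the Homsirikamol et al. Virtex 5 rows (12817 Mbit/s at 282.7 MHz and 6845 Mbit/s at 285.2 MHz). Whether the authors derived their figures this way is not stated on this page.

```python
def throughput_mbps(block_bits: int, clock_mhz: float, cycles_per_block: float) -> float:
    """Long-message throughput in Mbit/s of an iterated hash core.

    Padding, finalization and I/O latency are ignored, so the figure is an
    upper bound that short messages will not reach.
    """
    return block_bits * clock_mhz / cycles_per_block


def cycles_per_block(block_bits: int, clock_mhz: float, mbps: float) -> float:
    """Invert the formula to estimate the cycle count behind a reported row."""
    return block_bits * clock_mhz / mbps


# Keccak-256 absorbs 1088-bit blocks; assuming one Keccak-f[1600] round per
# clock cycle, a block takes 24 cycles.
print(round(throughput_mbps(1088, 282.7, 24)))       # 12816
print(round(cycles_per_block(576, 285.2, 6845), 1))  # 24.0
```

Inverting the formula is a quick plausibility check when comparing rows: a cycle count far from the algorithm's round count hints at unrolling, pipelining, or extra I/O overhead.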

=== High-Speed Implementations (FPGA) ===

Important note: The size and functionality of slices vary between FPGA families. A direct comparison of the slice count of implementations on different FPGA families is therefore problematic.
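Given the note on slice sizes, a throughput-per-area figure only makes sense within one technology and one area unit. The sketch below groups rows accordingly and never ranks across groups. The `Row` type and `efficiency_by_family` helper are hypothetical; the numbers are copied from the Homsirikamol et al. [30] rows of the table below, and several of them (BLAKE, Grøstl, JH) are extrapolated round 3 estimates rather than measurements.

```python
from collections import defaultdict
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    technology: str     # e.g. "Xilinx Virtex 5"
    area: int           # slices (Xilinx) or ALUTs/LEs (Altera)
    area_unit: str
    throughput: float   # Mbit/s


def efficiency_by_family(rows):
    """Group rows by (technology, area unit) and rank each group by
    throughput per area unit. Groups are never compared with each other,
    because e.g. a Virtex 5 slice is not the same resource as a Stratix III ALUT."""
    groups = defaultdict(list)
    for r in rows:
        groups[(r.technology, r.area_unit)].append((r.throughput / r.area, r.name))
    return {key: sorted(vals, reverse=True) for key, vals in groups.items()}


rows = [
    Row("BLAKE-256", "Xilinx Virtex 5", 1523, "slices", 2245),
    Row("Grøstl-256", "Xilinx Virtex 5", 1597, "slices", 7885),
    Row("JH-256", "Xilinx Virtex 5", 1018, "slices", 4578),
    Row("Keccak(-256)", "Xilinx Virtex 5", 1272, "slices", 12817),
    Row("JH-256", "Altera Stratix III", 3525, "ALUTs", 4661),
]

for (tech, unit), ranking in efficiency_by_family(rows).items():
    print(tech, unit, [(name, round(eff, 2)) for eff, name in ranking])
```

Keying on the area unit as well as the technology keeps slice counts and ALUT counts apart even if two rows were accidentally labeled with the same family.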
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable"
|- style="background:#efefef;"
! width="120"| Hash Function Name !! width="150"| Reference / HDL !! width="100"| Impl. Scope !! width="200"| Impl. Details !! width="100"| Technology !! width="80"| Size !! width="80"| Throughput !! width="80"| Clock Frequency

|- style="background:#ffdf9f;"
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex-II Pro || align="right"| 3091 slices || align="right"| 1231 Mbit/s || align="right"| 37.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex 4 || align="right"| 3087 slices || align="right"| 1596 Mbit/s || align="right"| 48.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex 5 || align="right"| 1694 slices || align="right"| 2216 Mbit/s || align="right"| 67.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units and I/O registers || Altera Stratix III || align="right"| 5435 ALUTs || align="right"| 1562 Mbit/s || align="right"| 46.97 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1660 slices || align="right"| 1911 Mbit/s || align="right"| 115 MHz

|- style="background:#ffdf9f;"
| BLAKE-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 G function units per iteration || Xilinx Virtex 5 || align="right"| 1523 slices || align="right"| 2245 Mbit/s || align="right"| 128.9 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 G function units per iteration || Altera Stratix III || align="right"| 3635 ALUTs || align="right"| 2072 Mbit/s || align="right"| 119.0 MHz

|- style="background:#ffdf9f;"
| BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1118 slices || align="right"| 835 Mbit/s || align="right"| 118.06 MHz
|- style="background:#ffdf9f;"
| BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1660 slices || align="right"| 1911 Mbit/s || align="right"| 115 MHz
|- style="background:#ffdf9f;"
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex-II Pro || align="right"| 11122 slices || align="right"| 1030 Mbit/s || align="right"| 17.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex 4 || align="right"| 11483 slices || align="right"| 1494 Mbit/s || align="right"| 25.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || Xilinx Virtex 5 || align="right"| 4329 slices || align="right"| 2090 Mbit/s || align="right"| 35.0 MHz
|- style="background:#ffdf9f;"
| BLAKE-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1718 slices || align="right"| 1137 Mbit/s || align="right"| 90.91 MHz

|- style="background:#ffdf9f;"
| BLAKE-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 G function units per iteration || Xilinx Virtex 5 || align="right"| 3064 slices || align="right"| 3080 Mbit/s || align="right"| 99.7 MHz
|- style="background:#ffdf9f;"
| BLAKE-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 G function units per iteration || Altera Stratix III || align="right"| 7086 ALUTs || align="right"| 2766 Mbit/s || align="right"| 89.5 MHz

|- style="background:#ffdf9f;"
| Grøstl-224/256 || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutation in parallel || Xilinx Spartan 3 || align="right"| 6136 slices || align="right"| 4520 Mbit/s || align="right"| 88.3 MHz
|- style="background:#ffdf9f;"
| Grøstl-224/256 || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutation in parallel || Xilinx Virtex 5 || align="right"| 1722 slices || align="right"| 10276 Mbit/s || align="right"| 200.7 MHz
|- style="background:#ffdf9f;"
| Grøstl-224/256 || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || P & Q permutation in parallel, S-box in BRAM || Xilinx Spartan 3 || align="right"| 4827 slices || align="right"| 3660 Mbit/s || align="right"| 71.53 MHz
|- style="background:#ffdf9f;"
| Grøstl-224/256 || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || P & Q permutation in parallel, S-box in BRAM || Xilinx Virtex 5 || align="right"| 4516 slices || align="right"| 7310 Mbit/s || align="right"| 142.87 MHz
|- style="background:#ffdf9f;"
| Grøstl-256 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 4057 slices || align="right"| 5171 Mbit/s || align="right"| 101 MHz

|- style="background:#ffdf9f;"
| Grøstl-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutations interleaved || Xilinx Virtex 5 || align="right"| 1597 slices || align="right"| 7885 Mbit/s || align="right"| 323.4 MHz
|- style="background:#ffdf9f;"
| Grøstl-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutations interleaved || Altera Stratix III || align="right"| 6350 ALUTs || align="right"| 5380 Mbit/s || align="right"| 220.7 MHz

|- style="background:#ffdf9f;"
| Grøstl-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 2391 slices || align="right"| 3242 Mbit/s || align="right"| 101.32 MHz
|- style="background:#ffdf9f;"
| Grøstl-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 2616 slices || align="right"| 7885 Mbit/s || align="right"| 154 MHz
|- style="background:#ffdf9f;"
| Grøstl-384/512 || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutation in parallel || Xilinx Spartan 3 || align="right"| 20233 slices || align="right"| 5901 Mbit/s || align="right"| 80.7 MHz
|- style="background:#ffdf9f;"
| Grøstl-384/512 || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || P & Q permutation parallel, S-box in LUTs || Xilinx Spartan 3 || align="right"| 17452 slices || align="right"| 3180 Mbit/s || align="right"| 79.61 MHz
|- style="background:#ffdf9f;"
| Grøstl-384/512 || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] [[#Ref004|[4]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || P & Q permutation parallel, S-box in LUTs || Xilinx Virtex 5 || align="right"| 19161 slices || align="right"| 6090 Mbit/s || align="right"| 83.33 MHz
|- style="background:#ffdf9f;"
| Grøstl-384/512 || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutation in parallel || Xilinx Virtex 5 || align="right"| 5419 slices || align="right"| 15395 Mbit/s || align="right"| 210.5 MHz
|- style="background:#ffdf9f;"
| Grøstl-384/512 || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Shared P & Q permutation || Xilinx Spartan 3 || align="right"| 8308 slices || align="right"| 3474 Mbit/s || align="right"| 95 MHz
|- style="background:#ffdf9f;"
| Grøstl-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 4845 slices || align="right"| 3619 Mbit/s || align="right"| 123.4 MHz

|- style="background:#ffdf9f;"
| Grøstl-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutations interleaved || Xilinx Virtex 5 || align="right"| 3138 slices || align="right"| 10314 Mbit/s || align="right"| 292.1 MHz
|- style="background:#ffdf9f;"
| Grøstl-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutations interleaved || Altera Stratix III || align="right"| 12355 ALUTs || align="right"| 7142 Mbit/s || align="right"| 202.3 MHz

|- style="background:#ffdf9f;"
| JH-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1018 slices || align="right"| 4578 Mbit/s || align="right"| 380.8 MHz
|- style="background:#ffdf9f;"
| JH-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Altera Stratix III || align="right"| 3525 ALUTs || align="right"| 4661 Mbit/s || align="right"| 387.8 MHz

|- style="background:#ffdf9f;"
| JH-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 2661 slices || align="right"| 2231 Mbit/s || align="right"| 201 MHz
|- style="background:#ffdf9f;"
| JH || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1291 slices || align="right"| 1641 Mbit/s || align="right"| 250.13 MHz

|- style="background:#ffdf9f;"
| JH-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1104 slices || align="right"| 4742 Mbit/s || align="right"| 394.5 MHz
|- style="background:#ffdf9f;"
| JH-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Altera Stratix III || align="right"| 3709 ALUTs || align="right"| 4696 Mbit/s || align="right"| 390.6 MHz

|-
| Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) & IO buffer || Altera Cyclone III || align="right"| 5776 LEs || align="right"| 7500 Mbit/s || align="right"| 133 MHz
|-
| Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) & IO buffer || Altera Stratix III || align="right"| 4713 ALUTs || align="right"| 12400 Mbit/s || align="right"| 218 MHz
|-
| Keccak || [http://www.strombergson.com/files/Keccak_in_FPGAs.pdf J. Strömbergson] [[#Ref009|[9]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) only || Xilinx Spartan 3A || align="right"| 3393 slices || align="right"| 4800 Mbit/s || align="right"| 85 MHz
|-
| Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) & IO buffer || Xilinx Virtex 5 || align="right"| 1412 slices || align="right"| 6900 Mbit/s || align="right"| 122 MHz
|-
| Keccak(-224) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1117 slices || align="right"| 5915 Mbit/s || align="right"| 189 MHz

|-
| Keccak(-256) || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1272 slices || align="right"| 12817 Mbit/s || align="right"| 282.7 MHz
|-
| Keccak(-256) || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Altera Stratix III || align="right"| 4213 ALUTs || align="right"| 12393 Mbit/s || align="right"| 273.4 MHz

|-
| Keccak(-256) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1117 slices || align="right"| 6263 Mbit/s || align="right"| 189 MHz
|-
| Keccak(-256) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1433 slices || align="right"| 8397 Mbit/s || align="right"| 205 MHz
|-
| Keccak(-384) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1117 slices || align="right"| 8190 Mbit/s || align="right"| 189 MHz
|-
| Keccak(-512) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1117 slices || align="right"| 8518 Mbit/s || align="right"| 189 MHz

|-
| Keccak(-512) || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1257 slices || align="right"| 6845 Mbit/s || align="right"| 285.2 MHz
|-
| Keccak(-512) || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Altera Stratix III || align="right"| 3979 ALUTs || align="right"| 7310 Mbit/s || align="right"| 304.6 MHz

|-
| Keccak || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One Keccak-f round per cycle || Xilinx Spartan 3 || align="right"| 2024 slices || align="right"| 3460 Mbit/s || align="right"| 81.4 MHz
|-
| Keccak || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One Keccak-f round per cycle || Xilinx Virtex-II || align="right"| 2024 slices || align="right"| 5810 Mbit/s || align="right"| 136.6 MHz
|-
| Keccak || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One Keccak-f round per cycle || Xilinx Virtex 4 || align="right"| 2024 slices || align="right"| 6070 Mbit/s || align="right"| 142.9 MHz

|-
| Skein-256-h || [http://www.skein-hash.info/sites/default/files/skein_fpga.pdf Men Long] [[#Ref011|[11]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || UBI component || Xilinx Virtex 5 || align="right"| 1001 slices || align="right"| 408.7 Mbit/s || align="right"| 114.9 MHz
|-
| Skein-256-256 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Xilinx Virtex 5 || align="right"| 937 slices || align="right"| 1751 Mbit/s || align="right"| 68.4 MHz
|-
| Skein-256-256 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Xilinx Spartan 3 || align="right"| 2421 slices || align="right"| 669 Mbit/s || align="right"| 26.14 MHz
|-
| Skein-256-256 || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] / [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 854 slices || align="right"| 1482 Mbit/s || align="right"| 115 MHz

|-
| Skein-512-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 Threefish rounds unrolled || Xilinx Virtex 5 || align="right"| 1621 slices || align="right"| 3178 Mbit/s || align="right"| 118.0 MHz
|-
| Skein-512-256 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 Threefish rounds unrolled || Altera Stratix III || align="right"| 4645 ALUTs || align="right"| 2503 Mbit/s || align="right"| 92.9 MHz

|-
| Skein-256-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS website] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 854 slices || align="right"| 1402 Mbit/s || align="right"| 115 MHz
|-
| Skein-512-h || [http://www.skein-hash.info/sites/default/files/skein_fpga.pdf Men Long] [[#Ref011|[11]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || UBI component || Xilinx Virtex 5 || align="right"| 1877 slices || align="right"| 817.4 Mbit/s || align="right"| 114.9 MHz
|-
| Skein-512-512 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Xilinx Virtex 5 || align="right"| 1632 slices || align="right"| 3535 Mbit/s || align="right"| 69.04 MHz
|-
| Skein-512-512 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Xilinx Spartan 3 || align="right"| 4273 slices || align="right"| 1365 Mbit/s || align="right"| 26.66 MHz
|-
| Skein-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] / [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || Xilinx Virtex 5 || align="right"| 1786 slices || align="right"| 1945 Mbit/s || align="right"| 83.65 MHz

|-
| Skein-512-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 Threefish rounds unrolled || Xilinx Virtex 5 || align="right"| 1716 slices || align="right"| 3209 Mbit/s || align="right"| 119.1 MHz
|-
| Skein-512-512 || [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] / [mailto:kgaj@gmu.edu On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 4 Threefish rounds unrolled || Altera Stratix III || align="right"| 4794 ALUTs || align="right"| 2434 Mbit/s || align="right"| 90.3 MHz

|}

<br />
<br />

+ | === Low-Area Implementations (FPGA) === | ||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="120"| Hash Function Name !! width="150"| Reference / HDL !! width="100"| Impl. Scope !! width="200"| Implementation Details !! width="100"| Technology !! width="80"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Spartan-3 || align="right"| 124 slices || align="right"| 82 Mbit/s || align="right"| 190.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex-4 || align="right"| 124 slices || align="right"| 154 Mbit/s || align="right"| 357.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex-5 || align="right"| 56 slices || align="right"| 161 Mbit/s || align="right"| 372.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Altera Cyclone III || align="right"| 285 LEs || align="right"| 83 Mbit/s || align="right"| 192.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex-II Pro || align="right"| 958 slices || align="right"| 265 Mbit/s || align="right"| 59.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex 4 || align="right"| 960 slices || align="right"| 307 Mbit/s || align="right"| 68.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex 5 || align="right"| 390 slices || align="right"| 411 Mbit/s || align="right"| 91.0 MHz | ||
+ | |||
|- | |- | ||
− | | | + | | BLAKE-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex 6 || align="right"| 117 slices || align="right"| 105 Mbit/s || align="right"| 274.0 MHz |
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Spartan-3 || align="right"| 229 slices || align="right"| 121 Mbit/s || align="right"| 158.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex-4 || align="right"| 230 slices || align="right"| 192 Mbit/s || align="right"| 250.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex-5 || align="right"| 108 slices || align="right"| 275 Mbit/s || align="right"| 358.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://eprint.iacr.org/2010/173.pdf Beuchat et al.] [[#Ref013|[13]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Altera Cyclone III || align="right"| 542 LEs || align="right"| 108 Mbit/s || align="right"| 140.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex-II Pro || align="right"| 1802 slices || align="right"| 285 Mbit/s || align="right"| 36.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex 4 || align="right"| 1856 slices || align="right"| 333 Mbit/s || align="right"| 42.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 1 G function unit || Xilinx Virtex 5 || align="right"| 939 slices || align="right"| 466 Mbit/s || align="right"| 59.0 MHz | ||
+ | |||
|- | |- | ||
− | | | + | | BLAKE-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Virtex 6 || align="right"| 192 slices || align="right"| 183 Mbit/s || align="right"| 240.0 MHz |
|- | |- | ||
− | | | + | | BLAKE-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rescheduled G function || Xilinx Spartan 6 || align="right"| 230 slices || align="right"| 103 Mbit/s || align="right"| 135.0 MHz |
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath, P & Q permutation in parallel || Xilinx Spartan 3 || align="right"| 2486 slices || align="right"| 404 Mbit/s || align="right"| 63.2 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || [http://eprint.iacr.org/2009/206.pdf Jungk et al.] [[#Ref006|[6]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath, P & Q permutation in parallel || Xilinx Virtex 2 Pro || align="right"| 2754 slices || align="right"| 512 Mbit/s || align="right"| 81.5 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Shared P & Q permutation, S-Box based on composite field arithmetic || Xilinx Spartan 3 || align="right"| 1276 slices || align="right"| 192 Mbit/s || align="right"| 60 MHz | ||
+ | |||
|- | |- | ||
− | | | + | | Grøstl-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Interleaved P & Q permutations || Xilinx Virtex 6 || align="right"| 260 slices || align="right"| 815 Mbit/s || align="right"| 280.0 MHz |
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-384/512 || [http://eprint.iacr.org/2010/260.pdf Jungk and Reith] [[#Ref022|[22]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Shared P & Q permutation, S-Box based on composite field arithmetic || Xilinx Spartan 3 || align="right"| 2110 slices || align="right"| 144 Mbit/s || align="right"| 63 MHz | ||
+ | |||
|- | |- | ||
− | | | + | | Grøstl-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Interleaved P & Q permutations || Xilinx Virtex 6 || align="right"| 260 slices || align="right"| 640 Mbit/s || align="right"| 280.0 MHz |
|- | |- | ||
− | | | + | | Grøstl-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Interleaved P & Q permutations || Xilinx Spartan 6 || align="right"| 343 slices || align="right"| 548 Mbit/s || align="right"| 240.0 MHz |
+ | |||
|- | |- | ||
− | | | + | | JH-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath & distributed RAMs || Xilinx Virtex 6 || align="right"| 240 slices || align="right"| 214 Mbit/s || align="right"| 288.0 MHz |
+ | |||
|- | |- | ||
− | | | + | | JH-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath & distributed RAMs || Xilinx Virtex 6 || align="right"| 240 slices || align="right"| 214 Mbit/s || align="right"| 288.0 MHz |
|- | |- | ||
− | | | + | | JH-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath & distributed RAMs || Xilinx Spartan 6 || align="right"| 260 slices || align="right"| 84 Mbit/s || align="right"| 113.0 MHz |
+ | |||
|- | |- | ||
− | | | + | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || Small core using system memory || Altera Stratix III || align="right"| 855 ALUTs || align="right"| 96.8 Mbit/s || align="right"| 366 MHz |
|- | |- | ||
− | | Skein-512 || [http:// | + | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || Small core using system memory || Altera Cyclone III || align="right"| 1559 LEs || align="right"| 47.8 Mbit/s || align="right"| 181 MHz |
+ | |- | ||
+ | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || Small core using system memory || Xilinx Virtex 5 || align="right"| 444 slices || align="right"| 70.1 Mbit/s || align="right"| 265 MHz | ||
+ | |||
+ | |- | ||
+ | | Keccak(-256) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rho transformation performed with a barrel rotator || Xilinx Virtex 6 || align="right"| 144 slices || align="right"| 128 Mbit/s || align="right"| 250.0 MHz | ||
+ | |||
+ | |- | ||
+ | | Keccak(-512) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rho transformation performed with a barrel rotator || Xilinx Virtex 6 || align="right"| 144 slices || align="right"| 68 Mbit/s || align="right"| 250.0 MHz | ||
+ | |- | ||
+ | | Keccak(-512) || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Rho transformation performed with a barrel rotator || Xilinx Spartan 6 || align="right"| 193 slices || align="right"| 45 Mbit/s || align="right"| 166.0 MHz | ||
+ | |||
+ | |- | ||
+ | | Skein-256-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One round of Threefish iterated || Altera Stratix III || align="right"| 1385 ALUTs || align="right"| 573.9 Mbit/s || align="right"| 161.42 MHz | ||
+ | |||
+ | |- | ||
+ | | Skein-512-256 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One round of Threefish iterated || Xilinx Virtex 6 || align="right"| 240 slices || align="right"| 179 Mbit/s || align="right"| 160.0 MHz | ||
+ | |||
+ | |- | ||
+ | | Skein-512-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One round of Threefish iterated || Xilinx Virtex 6 || align="right"| 240 slices || align="right"| 179 Mbit/s || align="right"| 160.0 MHz | ||
+ | |- | ||
+ | | Skein-512-512 || [http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf Kerckhof et al.] [[#Ref042|[42]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One round of Threefish iterated || Xilinx Spartan 6 || align="right"| 292 slices || align="right"| 102 Mbit/s || align="right"| 91.0 MHz | ||
+ | |||
|} | |} | ||
<br><br> | <br><br> | ||
− | == | + | === High-Speed Implementations (ASIC) === |
+ | |||
+ | <br /> | ||
+ | |||
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
|- style="background:#efefef;" | |- style="background:#efefef;" | ||
! width="120"| Hash Function Name !! width="150"| Reference / HDL !! width="100"| Impl. Scope !! width="200"| Implementation Details !! width="100"| Technology !! width="80"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ! width="120"| Hash Function Name !! width="150"| Reference / HDL !! width="100"| Impl. Scope !! width="200"| Implementation Details !! width="100"| Technology !! width="80"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || UMC 0.18 µm || align="right"| 58.30 kGates || align="right"| 3782 Mbit/s || align="right"| 114 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 4 G function units || UMC 0.18 µm || align="right"| 41.31 kGates || align="right"| 2966 Mbit/s || align="right"| 170 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units and I/O registers || STM 90 nm || align="right"| 53 kGates || align="right"| 3196 Mbit/s(*) || align="right"| 96.15 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units with CSAs || UMC 0.18 µm || align="right"| 45.64 kGates || align="right"| 2836 Mbit/s || align="right"| 170.64 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Four parallel G function modules || UMC 90 nm || align="right"| 47.5 kGates || align="right"| 6966 Mbit/s || align="right"| 400 MHz ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || UMC 0.13 µm || align="right"| 43.52 kGates || align="right"| 3318 Mbit/s || align="right"| 200 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 37 kGates || align="right"| 4763 Mbit/s || align="right"| 286.5 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 0.18 µm || align="right"| 79 kGates || align="right"| 4548 Mbit/s || align="right"| 137 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 0.18 µm || align="right"| 48 kGates || align="right"| 4176 Mbit/s || align="right"| 240 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 0.13 µm || align="right"| 67 kGates || align="right"| 6689 Mbit/s || align="right"| 201 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 0.13 µm || align="right"| 43 kGates || align="right"| 5748 Mbit/s || align="right"| 330 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 90 nm || align="right"| 65 kGates || align="right"| 12499 Mbit/s || align="right"| 376 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 90 nm || align="right"| 38 kGates || align="right"| 10816 Mbit/s || align="right"| 621 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 8 G function units || UMC 0.18 µm || align="right"| 132.47 kGates || align="right"| 5171 Mbit/s || align="right"| 87 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with 4 G function units || UMC 0.18 µm || align="right"| 82.73 kGates || align="right"| 4209 Mbit/s || align="right"| 136 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 0.18 µm || align="right"| 147 kGates || align="right"| 6314 Mbit/s || align="right"| 106 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 0.18 µm || align="right"| 98 kGates || align="right"| 6293 Mbit/s || align="right"| 204 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 0.13 µm || align="right"| 139 kGates || align="right"| 9452 Mbit/s || align="right"| 158 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 0.13 µm || align="right"| 92 kGates || align="right"| 8982 Mbit/s || align="right"| 291 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 8 G function units || UMC 90 nm || align="right"| 128 kGates || align="right"| 17777 Mbit/s || align="right"| 298 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Compression function with 4 G function units || UMC 90 nm || align="right"| 79 kGates || align="right"| 16434 Mbit/s || align="right"| 532 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One shared permutation for P & Q, one pipeline stage || UMC 0.18 µm || align="right"| 58.40 kGates || align="right"| 6290 Mbit/s || align="right"| 270.27 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P and Q permutation interleaved with one pipeline stage, S-box as LUT || UMC 90 nm || align="right"| 135 kGates || align="right"| 16254 Mbit/s || align="right"| 667 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || UMC 0.13 µm || align="right"| 110.11 kGates || align="right"| 9606 Mbit/s || align="right"| 188 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 139.1 kGates || align="right"| 17297 Mbit/s || align="right"| 337.8 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 120.8 kGates || align="right"| 16275 Mbit/s || align="right"| 349.7 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-384/512 || [http://www.groestl.info/Groestl.pdf Submission doc.] [[#Ref007|[7]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || P & Q permutation in parallel || UMC 0.18 µm || align="right"| 341 kGates || align="right"| 6225 Mbit/s || align="right"| 85.1 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 320 S-boxes, one round of R<sub>8</sub> per cycle || UMC 0.18 µm || align="right"| 58.83 kGates || align="right"| 4219 Mbit/s || align="right"| 380.22 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || S-boxes as LUTs, stored constants || UMC 90 nm || align="right"| 80 kGates || align="right"| 9134 Mbit/s || align="right"| 760 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || UMC 0.13 µm || align="right"| 62.42 kGates || align="right"| 4334 Mbit/s || align="right"| 391 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 54.6 kGates || align="right"| 8471 Mbit/s || align="right"| 763.4 MHz | ||
+ | |||
|- | |- | ||
− | | | + | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) & IO buffer || ST 0.13 µm || align="right"| 48 kGates || align="right"| 29900 Mbit/s || align="right"| 526 MHz |
|- | |- | ||
− | | | + | | Keccak || [http://keccak.noekeon.org/Keccak-specifications.pdf Submission doc.] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Core (round function, state register) only || ST 0.13 µm || align="right"| 40 kGates || align="right"| 15000 Mbit/s || align="right"| 500 MHz |
|- | |- | ||
− | | | + | | Keccak(-256) || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One instance of Keccak-f round || UMC 0.18 µm || align="right"| 56.32 kGates || align="right"| 21229 Mbit/s || align="right"| 487.80 MHz |
|- | |- | ||
− | | | + | | Keccak(-256) || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One round per cycle || UMC 90 nm || align="right"| 50 kGates || align="right"| 43011 Mbit/s || align="right"| 949 MHz |
|- | |- | ||
− | | | + | | Keccak(-256) || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || UMC 0.13 µm || align="right"| 47.43 kGates || align="right"| 15457 Mbit/s || align="right"| 377 MHz |
|- | |- | ||
− | | | + | | Keccak || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf Akin et al.] [[#Ref034|[34]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One Keccak-f round per cycle || Synopsys 90 nm || align="right"| 10.5 kGates || align="right"| 19320 Mbit/s || align="right"| 454.5 MHz |
|- | |- | ||
− | | | + | | Keccak(-256) || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 50.7 kGates || align="right"| 33333 Mbit/s || align="right"| 781.3 MHz |
+ | |||
|- | |- | ||
− | | | + | | Keccak(-256) || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 55.9 kGates || align="right"| 43986 Mbit/s || align="right"| 1030.9 MHz |
+ | |||
|- | |- | ||
− | | | + | | Skein-256-256 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] [[#Ref012|[12]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || UMC 0.18 µm || align="right"| 53.87 kGates || align="right"| 1762 Mbit/s || align="right"| 68.8 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || All 72 Threefish rounds unrolled || STM 90 nm || align="right"| 369 kGates || align="right"| 3126 Mbit/s(*) || align="right"| 12.21 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || UMC 0.18 µm || align="right"| 58.61 kGates || align="right"| 1882 Mbit/s || align="right"| 73.52 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] / [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Four unrolled Threefish rounds || UMC 90 nm || align="right"| 50 kGates || align="right"| 3558 Mbit/s || align="right"| 264 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] / [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || UMC 0.13 µm || align="right"| 40.9 kGates || align="right"| 1941 Mbit/s || align="right"| 159 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 43.1 kGates || align="right"| 3295 Mbit/s || align="right"| 270.3 MHz |
|- | |- | ||
− | | | + | | Skein-512-512 || [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] / [mailto:mfeldhof@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || UMC 0.18 µm || align="right"| 102.04 kGates || align="right"| 2502 Mbit/s || align="right"| 48.87 MHz |
|- | |- | ||
− | Grøstl-224/256 || [http://eprint.iacr.org/2009/ | + | Skein-512 || [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/WALKER_skein-intel-hwd.pdf Walker et al.] [[#Ref036|[36]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Intel 32 nm || align="right"| 57.93 kGates || align="right"| 32320 Mbit/s || align="right"| 631.31 MHz |
+ | |} | ||
+ | |||
+ | (*) Estimated peak throughput based on the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s. | |
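The (*) formula can be written as a small Python helper for cross-checking. A bit count divided by a delay in ns gives Gbit/s, so the factor 1000 converts the result to Mbit/s. This is a minimal sketch under two assumptions: the delay is the critical-path delay of one clock cycle, roughly 1000 / (clock frequency in MHz), and the function names are illustrative only.

```python
def peak_throughput_mbps(input_bits: int, delay_ns: float, cycles: int) -> float:
    """(*) estimate: 1000 * input_bits / (delay_ns * cycles), in Mbit/s.

    input_bits: message bits processed per compression-function call
    delay_ns:   assumed critical-path delay of one clock cycle, in ns
    cycles:     clock cycles per compression-function call
    """
    if delay_ns <= 0 or cycles <= 0:
        raise ValueError("delay_ns and cycles must be positive")
    # bits per ns equals Gbit/s; multiplying by 1000 gives Mbit/s
    return 1000.0 * input_bits / (delay_ns * cycles)


def delay_from_clock_mhz(clock_mhz: float) -> float:
    """Clock period in ns for a clock frequency given in MHz."""
    return 1000.0 / clock_mhz


# Skein-256-256 row (Namin and Hasan): 256-bit block, all 72 Threefish
# rounds unrolled (assumed here to mean one cycle per block), 12.21 MHz.
skein = peak_throughput_mbps(256, delay_from_clock_mhz(12.21), 1)
print(f"{skein:.0f} Mbit/s")  # prints: 3126 Mbit/s
```

Under these assumptions the helper reproduces the 3126 Mbit/s entry for the unrolled Skein-256-256 design. Rows with several cycles per block need the actual cycle count from the cited paper.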
+ | |||
+ | <br><br> | ||
+ | |||
+ | === Low-Area Implementations (ASIC) === | ||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="120"| Hash Function Name !! width="150"| Reference / HDL !! width="100"| Impl. Scope !! width="200"| Implementation Details !! width="100"| Technology !! width="80"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || One G function in 11 cycles || AMS 0.35 µm || align="right"| 25.57 kGates || align="right"| 11 Mbit/s || align="right"| 31.25 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with a single G function unit || UMC 0.18 µm || align="right"| 10.54 kGates || align="right"| 180.7 Mbit/s || align="right"| 40 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with a half G function unit || UMC 0.18 µm || align="right"| 9.89 kGates || align="right"| 90.7 Mbit/s || align="right"| 40 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 1 adder and 4-word latch array || UMC 0.18 µm || align="right"| 13.56 kGates || align="right"| 96.4 Mbit/s || align="right"| 215 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || [http://131002.net/data/papers/HAMP10.pdf Henzen et al.] [[#Ref040|[40]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || 1 adder and 4-word latch array || UMC 0.18 µm || align="right"| 8.60 kGates || align="right"| 44.3 Mbit/s || align="right"| 100 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with a single G function unit || UMC 0.18 µm || align="right"| 20.61 kGates || align="right"| 158.4 Mbit/s || align="right"| 20 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || [http://131002.net/blake/blake.pdf Submission doc.] [[#Ref001|[1]]] / [http://131002.net/blake/ Submission webpage] || [[#Implementation_of_Core_Functionality|Core functionality]] || Compression function with a half G function unit || UMC 0.18 µm || align="right"| 19.46 kGates || align="right"| 79.6 Mbit/s || align="right"| 20 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath, P & Q permutation shared || AMS 0.35 µm || align="right"| 14.62 kGates || align="right"| 145.9 Mbit/s || align="right"| 55.87 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || [http://www.groestl.info Grøstl website] [[#Ref019|[19]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath, P & Q permutation shared || UMC 0.18 µm || align="right"| 17 kGates || align="right"| 645 Mbit/s || align="right"| 246.9 MHz | ||
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] [[#Ref039|[39]]] / [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html RCIS webpage] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || || STM 90 nm || align="right"| 34.8 kGates || align="right"| 2478 Mbit/s || align="right"| 101.6 MHz | ||
+ | |||
|- | |- | ||
− | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated | + | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || Small core using system memory || ST 0.13 µm || align="right"| 6.5 kGates || align="right"| 176.4 Mbit/s(*) || align="right"| 666.7 MHz |
|- | |- | ||
− | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated | + | | Keccak || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated spec. (v1.2)] [[#Ref008|[8]]] / [http://keccak.noekeon.org/ Submission webpage] || [[#Implementation_with_External_Memory|Using external memory]] || Small core using system memory, clock freq. limited to 200 MHz || ST 0.13 µm || align="right"| 5 kGates || align="right"| 52.9 Mbit/s(**) || align="right"| 200 MHz |
+ | |||
|- | |- | ||
− | | | + | | Skein-256-256 || [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] / N/A || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 64-bit datapath || AMS 0.35 µm || align="right"| 12.89 kGates || align="right"| 19.8 Mbit/s || align="right"| 80 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || One round of Threefish iterated || STM 90 nm || align="right"| 21 kGates || align="right"| 1018.8 Mbit/s(***) || align="right"| 286.53 MHz |
+ | |} | ||
+ | |||
+ | (*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s | ||
+ | <br /> | ||
+ | (**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s | ||
+ | <br /> | ||
+ | (***) Estimated peak throughput based on the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s. | |
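The (*) and (**) estimates for the small Keccak core use a different relation. It multiplies the bits processed per permutation by the clock rate and divides by the cycles per permutation, including the memory accesses. A minimal Python sketch that reproduces both footnote values follows. The helper name is hypothetical, and the inputs are taken from the footnotes.

```python
def memory_core_throughput_mbps(bits_per_perm: int, clock_mhz: float,
                                cycles_per_perm: int) -> float:
    """Throughput in Mbit/s of a core needing cycles_per_perm clock cycles
    (including external-memory accesses) per permutation call."""
    # (bits/permutation) * (cycles/s) / (cycles/permutation) = bits/s;
    # with the clock given in MHz the result is directly in Mbit/s.
    return bits_per_perm * clock_mhz / cycles_per_perm


for clock in (666.7, 200.0):
    t = memory_core_throughput_mbps(1024, clock, 3870)
    print(f"{clock} MHz -> {t:.2f} Mbit/s")
# prints:
# 666.7 MHz -> 176.41 Mbit/s
# 200.0 MHz -> 52.92 Mbit/s
```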
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | == Comparative Studies == | ||
+ | |||
+ | This section summarizes the reported results of publications which examined more than one round-three candidate in a similar setup. | ||
+ | |||
+ | === BLAKE, Skein === | ||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] || N/A || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Implementation_of_Core_Functionality|Core functionality]] || Altera Stratix III | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || Compression function with 8 G function units and I/O registers || align="right"| 5435 ALUTs || align="right"| 1562 Mbit/s || align="right"| 46.97 MHz | ||
|- | |- | ||
− | | | + | | Skein-256-256 || All 72 Threefish rounds unrolled (device too small) || align="right"| N/A || align="right"| N/A || align="right"| N/A |
+ | |} | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] [[#Ref002|[2]]] || N/A || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]] || [[#Implementation_of_Core_Functionality|Core functionality]] || STM 90 nm | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || Compression function with 8 G function units and I/O registers || align="right"| 53 kGates || align="right"| 3196 Mbit/s(*) || align="right"| 96.15 MHz | ||
|- | |- | ||
− | + | | Skein-256-256 || All 72 Threefish rounds unrolled || align="right"| 369 kGates || align="right"| 3126 Mbit/s(*) || align="right"| 12.21 MHz | |
− | |||
− | | Skein-256-256 || | ||
|} | |} | ||
− | <br><br> | + | (*) Estimated peak throughput based on the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s. |
+ | |||
+ | <br /> | ||
+ | <br /> | ||
− | == | + | === BLAKE, Grøstl, Skein === |
− | + | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | |
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] [[#Ref003|[3]]] || [http://www.rcis.aist.go.jp/special/SASEBO/SHA3-en.html RCIS webpage] || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Xilinx Virtex 5 | ||
+ | |} | ||
− | + | ||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || || align="right"| 1660 slices || align="right"| 1911 Mbit/s || align="right"| 115 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || || align="right"| 4057 slices || align="right"| 5171 Mbit/s || align="right"| 101 MHz | ||
+ | |- | ||
+ | | Skein-256 || || align="right"| 854 slices || align="right"| 1482 Mbit/s || align="right"| 115 MHz | ||
+ | |} | ||
+ | <br /> | ||
<br /> | <br /> | ||
+ | |||
+ | === All 5 Round-Three Candidates === | ||
+ | |||
+ | Reported results are post-synthesis. An interactive graphical comparison of various area-performance tradeoffs of this study can be found [http://www.iaik.tugraz.at/content/research/vlsi/sha3hw/ here]. | ||
+ | |||
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
|- style="background:#efefef;" | |- style="background:#efefef;" | ||
− | ! width=" | + | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology |
− | |- | + | |- |
− | + | | [http://eprint.iacr.org/2009/510.pdf Tillich et al.] [[#Ref014|[14]]] || [mailto:mfeldhof@iaik.tugraz.at On request] || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || UMC 0.18 µm | |
− | + | |} | |
− | | | ||
− | | | ||
− | |||
− | |||
− | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || Compression function with 4 G function units with CSAs || align="right"| 45.64 kGates || align="right"| 2836 Mbit/s || align="right"| 170.64 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || One shared permutation for P & Q, one pipeline stage || align="right"| 58.40 kGates || align="right"| 6290 Mbit/s || align="right"| 270.27 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || 320 S-boxes, one round of R<sub>8</sub> per cycle || align="right"| 58.83 kGates || align="right"| 4219 Mbit/s || align="right"| 380.22 MHz | ||
|- | |- | ||
− | | | + | | Keccak(-256) || One instance of Keccak-f round || align="right"| 56.32 kGates || align="right"| 21229 Mbit/s || align="right"| 487.80 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || 8 Threefish rounds unrolled || align="right"| 58.61 kGates || align="right"| 1882 Mbit/s || align="right"| 73.52 MHz |
|- | |- | ||
− | | | + | | Skein-512-512 || 8 Threefish rounds unrolled || align="right"| 102.04 kGates || align="right"| 2502 Mbit/s || align="right"| 48.87 MHz |
+ | |} | ||
− | + | <br /> | |
− | + | <br /> | |
− | + | === BLAKE, Grøstl, Skein === | |
− | |||
− | |- | + | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" |
− | | | + | |- style="background:#efefef;" |
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://eprint.iacr.org/2009/349.pdf Tillich et al.] [[#Ref018|[18]]] || N/A || [[#Low-Area_Implementations_(ASIC)|Low-area ASIC]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || AMS 0.35 µm | ||
+ | |} | ||
− | |||
− | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || One G function in 11 cycles || align="right"| 25.57 kGates || align="right"| 11 Mbit/s || align="right"| 31.25 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-224/256 || 64-bit datapath, P & Q permutation shared || align="right"| 14.62 kGates || align="right"| 145.9 Mbit/s || align="right"| 55.87 MHz | ||
|- | |- | ||
− | | | + | | Skein-256-256 || 64-bit datapath || align="right"| 12.89 kGates || align="right"| 19.8 Mbit/s || align="right"| 80 MHz |
− | | | + | |} |
− | |||
− | + | <br /> | |
− | + | <br /> | |
− | + | === All 5 Round-Three Candidates === | |
− | |||
− + | Reported results of this study are post-place-and-route (post-P&R) figures for designs targeting high throughput. |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |- | + | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" |
− | | | + | |- style="background:#efefef;" |
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://www.springerlink.com/content/g0115v3272156r06/ Henzen et al.] [[#Ref029|[29]]] || [http://www.iis.ee.ethz.ch/~sha3/ ETH webpage] || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || UMC 90 nm | ||
+ | |} | ||
− | |||
− | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | BLAKE-256 || Four parallel G function modules || align="right"| 47.5 kGates || align="right"| 6966 Mbit/s || align="right"| 400 MHz | |
+ | |- style="background:#ffdf9f;" | ||
+ | Grøstl-256 || P and Q permutations interleaved with one pipeline stage, S-box as LUT || align="right"| 135 kGates || align="right"| 16254 Mbit/s || align="right"| 667 MHz | |
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || S-boxes as LUTs, stored constants || align="right"| 80 kGates || align="right"| 9134 Mbit/s || align="right"| 760 MHz | ||
|- | |- | ||
− | | Keccak | + | | Keccak(-256) || One round per cycle || align="right"| 50 kGates || align="right"| 43011 Mbit/s || align="right"| 949 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || Four unrolled Threefish rounds || align="right"| 50 kGates || align="right"| 3558 Mbit/s || align="right"| 264 MHz |
+ | |} | ||
− | + | <br /> | |
− | + | <br /> | |
− | + | === All 5 Round-Three Candidates === | |
− | |||
− | |||
− | |||
− + | The designs in this study are optimized for throughput-to-area ratio. The cited results are those for the Xilinx Virtex 5 and Altera Stratix III platforms (for both the 256-bit and the 512-bit versions of the candidates). Designs marked N/A did not fit into the largest device of the respective device family. For a full listing of all ATHENa results, refer to the [http://cryptography.gmu.edu/athena/ ATHENa webpage]. |
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |- style="background:# | + | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" |
− | | | + | |- style="background:#efefef;" |
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] || [mailto:kgaj@gmu.edu On request] || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Xilinx Virtex 5 | ||
+ | |} | ||
− | |||
− | |||
− | |- style="background:# | + | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" |
− | | | + | |- style="background:#efefef;" |
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || 4 G function units per iteration || align="right"| 1523 slices || align="right"| 2245 Mbit/s || align="right"| 128.9 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || 4 G function units per iteration || align="right"| 3064 slices || align="right"| 3080 Mbit/s || align="right"| 99.7 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || P & Q permutations interleaved || align="right"| 1597 slices || align="right"| 7885 Mbit/s || align="right"| 323.4 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-512 || P & Q permutations interleaved || align="right"| 3138 slices || align="right"| 10314 Mbit/s || align="right"| 292.1 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || || align="right"| 1018 slices || align="right"| 4578 Mbit/s || align="right"| 380.8 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-512 || || align="right"| 1104 slices || align="right"| 4742 Mbit/s || align="right"| 394.5 MHz | ||
|- | |- | ||
− | | | + | | Keccak(-256) || || align="right"| 1272 slices || align="right"| 12817 Mbit/s || align="right"| 282.7 MHz |
|- | |- | ||
− | | Skein- | + | | Keccak(-512) || || align="right"| 1257 slices || align="right"| 6845 Mbit/s || align="right"| 285.2 MHz |
+ | |- | ||
+ | | Skein-512-256 || 4 Threefish rounds unrolled || align="right"| 1621 slices || align="right"| 3178 Mbit/s || align="right"| 118.0 MHz | ||
+ | |- | ||
+ | | Skein-512-512 || 4 Threefish rounds unrolled || align="right"| 1716 slices || align="right"| 3209 Mbit/s || align="right"| 119.1 MHz | ||
− | | | + | |} |
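This study targets throughput-to-area ratio, so the Virtex 5 figures above can be ranked by Mbit/s per slice. The Python sketch below uses only the slice counts and throughputs from that table. Slice counts are only comparable within the Virtex 5 family and must not be mixed with the Stratix III ALUT counts. This simple metric also ignores any other FPGA resources a design might use.

```python
# (slices, throughput in Mbit/s) copied from the Virtex 5 table above
virtex5 = {
    "BLAKE-256": (1523, 2245),
    "BLAKE-512": (3064, 3080),
    "Grøstl-256": (1597, 7885),
    "Grøstl-512": (3138, 10314),
    "JH-256": (1018, 4578),
    "JH-512": (1104, 4742),
    "Keccak(-256)": (1272, 12817),
    "Keccak(-512)": (1257, 6845),
    "Skein-512-256": (1621, 3178),
    "Skein-512-512": (1716, 3209),
}


def ratio(entry: tuple) -> float:
    """Throughput-to-area ratio in Mbit/s per slice."""
    slices, mbps = entry
    return mbps / slices


# Highest throughput per slice first
ranking = sorted(virtex5, key=lambda name: ratio(virtex5[name]), reverse=True)
for name in ranking:
    print(f"{name:14s} {ratio(virtex5[name]):6.2f} Mbit/s per slice")
```

With these numbers, Keccak(-256) ranks first at about 10.08 Mbit/s per slice and BLAKE-512 last at about 1.01.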
− | |||
− | |||
− | |||
+ | ---- | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://eprint.iacr.org/2010/445.pdf Homsirikamol et al.] [[#Ref030|[30]]] || N/A || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Altera Stratix III | ||
|} | |} | ||
− | |||
− | |||
− | |||
− | |||
− | |||
{| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
|- style="background:#efefef;" | |- style="background:#efefef;" | ||
− | ! width=" | + | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency |
+ | |||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || 4 G function units per iteration || align="right"| 3635 ALUTs || align="right"| 2072 Mbit/s || align="right"| 119.0 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || 4 G function units per iteration || align="right"| 7086 ALUTs || align="right"| 2766 Mbit/s || align="right"| 89.5 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || P & Q permutations interleaved || align="right"| 6350 ALUTs || align="right"| 5380 Mbit/s || align="right"| 220.7 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-512 || P & Q permutations interleaved || align="right"| 12355 ALUTs || align="right"| 7142 Mbit/s || align="right"| 202.3 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || || align="right"| 3525 ALUTs || align="right"| 4661 Mbit/s || align="right"| 387.8 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-512 || || align="right"| 3709 ALUTs || align="right"| 4696 Mbit/s || align="right"| 390.6 MHz | ||
|- | |- | ||
− | | | + | | Keccak(-256) || || align="right"| 4213 ALUTs || align="right"| 12393 Mbit/s || align="right"| 273.4 MHz |
|- | |- | ||
− | | | + | | Keccak(-512) || || align="right"| 3979 ALUTs || align="right"| 7310 Mbit/s || align="right"| 304.6 MHz |
|- | |- | ||
− | | | + | | Skein-512-256 || 4 Threefish rounds unrolled || align="right"| 4645 ALUTs || align="right"| 2503 Mbit/s || align="right"| 92.9 MHz |
|- | |- | ||
− | | | + | | Skein-512-512 || 4 Threefish rounds unrolled || align="right"| 4794 ALUTs || align="right"| 2434 Mbit/s || align="right"| 90.3 MHz |
+ | |||
+ | |} | ||
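A related sanity check inverts the long-message relation throughput = block size * clock frequency / cycles per block to obtain the implied number of cycles per block. The block sizes used here are an assumption not stated on this page: the Keccak-256 and Keccak-512 rates of 1088 and 576 bits (capacity c = 2n). Under that assumption, both Stratix III Keccak rows above imply about 24 cycles per block. That would be consistent with one round per cycle of the 24-round Keccak-f[1600] permutation. The helper name is hypothetical.

```python
def implied_cycles_per_block(block_bits: int, clock_mhz: float,
                             throughput_mbps: float) -> float:
    """Solve throughput = block_bits * clock / cycles for cycles."""
    return block_bits * clock_mhz / throughput_mbps


# Stratix III rows above: (assumed block size in bits, MHz, Mbit/s)
rows = {
    "Keccak(-256)": (1088, 273.4, 12393),
    "Keccak(-512)": (576, 304.6, 7310),
}
for name, (bits, mhz, mbps) in rows.items():
    print(f"{name}: {implied_cycles_per_block(bits, mhz, mbps):.1f} cycles/block")
```

If the assumed block size is wrong for a given study, the implied cycle count will be off by the same factor. A non-integer result is therefore a hint to check the parameters in the cited paper.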
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | === All 5 Round-Three Candidates === | ||
+ | |||
+ | Results are for long messages and do not include the wrapper. | |
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf Baldwin et al.] [[#Ref031|[31]]] || [http://www.ucc.ie/en/crypto/SHA-3Hardware/ UCC webpage] || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Xilinx Virtex 5 | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || || align="right"| 1118 slices || align="right"| 835 Mbit/s || align="right"| 118.06 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-512 || || align="right"| 1718 slices || align="right"| 1137 Mbit/s || align="right"| 90.91 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || || align="right"| 2391 slices || align="right"| 3242 Mbit/s || align="right"| 101.32 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-512 || || align="right"| 4845 slices || align="right"| 3619 Mbit/s || align="right"| 123.4 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH || || align="right"| 1291 slices || align="right"| 1641 Mbit/s || align="right"| 250.13 MHz | ||
|- | |- | ||
− | | | + | | Keccak(-224) || || align="right"| 1117 slices || align="right"| 5915 Mbit/s || align="right"| 189 MHz |
|- | |- | ||
− | | | + | | Keccak(-256) || || align="right"| 1117 slices || align="right"| 6263 Mbit/s || align="right"| 189 MHz |
|- | |- | ||
− | | | + | | Keccak(-384) || || align="right"| 1117 slices || align="right"| 8190 Mbit/s || align="right"| 189 MHz |
|- | |- | ||
− | | | + | | Keccak(-512) || || align="right"| 1117 slices || align="right"| 8518 Mbit/s || align="right"| 189 MHz |
|- | |- | ||
− | | | + | | Skein-512 || || align="right"| 1786 slices || align="right"| 1945 Mbit/s || align="right"| 83.65 MHz |
+ | |} | ||
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | === All 5 Round-Three Candidates === | ||
+ | |||
+ | Reported throughputs exclude interface overhead. | |
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]] || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#High-Speed_Implementations_(FPGA)|High-speed FPGA]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || Xilinx Virtex 5 | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || || align="right"| 1660 slices || align="right"| 1911 Mbit/s || align="right"| 115 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || || align="right"| 2616 slices || align="right"| 7885 Mbit/s || align="right"| 154 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || || align="right"| 2661 slices || align="right"| 2231 Mbit/s || align="right"| 201 MHz | ||
|- | |- | ||
− | | Keccak | + | | Keccak(-256) || || align="right"| 1433 slices || align="right"| 8397 Mbit/s || align="right"| 205 MHz |
|- | |- | ||
− | | | + | | Skein-256-256 || || align="right"| 854 slices || align="right"| 1402 Mbit/s || align="right"| 115 MHz |
+ | |} | ||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | |||
+ | The same implementations as in [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf Matsuo et al.] [[#Ref033|[33]]], here implemented in STM 90 nm technology. | |
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] [[#Ref037|[37]]] || [http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html RCIS webpage] || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || STM 90 nm | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || || align="right"| 37 kGates || align="right"| 4763 Mbit/s || align="right"| 286.5 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || || align="right"| 139.1 kGates || align="right"| 17297 Mbit/s || align="right"| 337.8 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || || align="right"| 54.6 kGates || align="right"| 8471 Mbit/s || align="right"| 763.4 MHz | ||
|- | |- | ||
− | | | + | | Keccak(-256) || || align="right"| 50.7 kGates || align="right"| 33333 Mbit/s || align="right"| 781.3 MHz |
|- | |- | ||
− | + | | Skein-256-256 || || align="right"| 43.1 kGates || align="right"| 3295 Mbit/s || align="right"| 270.3 MHz | |
− | |||
− | |||
− | |||
− | | Skein-256-256 || | ||
− | |||
− | |||
|} | |} | ||
− | |||
<br /> | <br /> | ||
− | |||
<br /> | <br /> | ||
− | ( | + | |
+ | === All 5 Round-Three Candidates === | ||
+ | |||
+ | Results are post-P&R; reported throughputs exclude interface overhead. | |
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="200"| Reference !! width="120"| HDL !! width="120"| Category !! width="100"| Impl. Scope !! width="120"| Technology | ||
+ | |- | ||
+ | | [http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf Guo et al.] [[#Ref035|[35]]] || [http://rijndael.ece.vt.edu/sha3/ VT webpage] || [[#High-Speed_Implementations_(ASIC)|High-speed ASIC]] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || UMC 0.13 µm | ||
+ | |} | ||
+ | |||
+ | |||
+ | {| border="1" cellpadding="4" cellspacing="0" align="center" class="wikitable" | ||
+ | |- style="background:#efefef;" | ||
+ | ! width="140"| Hash Function Name !! width="270"| Impl. Details !! width="90"| Size !! width="80"| Throughput !! width="80"| Clock Frequency | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | BLAKE-256 || || align="right"| 43.52 kGates || align="right"| 3318 Mbit/s || align="right"| 200 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | Grøstl-256 || || align="right"| 110.11 kGates || align="right"| 9606 Mbit/s || align="right"| 188 MHz | ||
+ | |- style="background:#ffdf9f;" | ||
+ | | JH-256 || || align="right"| 62.42 kGates || align="right"| 4334 Mbit/s || align="right"| 391 MHz | ||
+ | |- | ||
+ | | Keccak(-256) || || align="right"| 47.43 kGates || align="right"| 15457 Mbit/s || align="right"| 377 MHz | ||
+ | |- | ||
+ | | Skein-256-256 || || align="right"| 40.9 kGates || align="right"| 1941 Mbit/s || align="right"| 159 MHz | ||
+ | |} | ||
<br /> | <br /> | ||
<div id="Ref014"> | <div id="Ref014"> | ||
− | [14] Stefan Tillich, Martin Feldhofer | + | [14] Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely. High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein. IACR Eprint report 2009/510. Available online at http://eprint.iacr.org/2009/510.pdf. |
</div> | </div> | ||
<div id="Ref019"> | <div id="Ref019"> | ||
[19] Grøstl website. http://www.groestl.info/. | [19] Grøstl website. http://www.groestl.info/. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref020"> | ||
+ | [20] Markus Bernet, Luca Henzen, Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Hardware Implementations of the SHA-3 Candidates Shabal and CubeHash. 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009. Available online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref021"> | ||
+ | [21] Michel Kinsy and Richard Uhler. SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec. Available online at http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref022"> | ||
+ | [22] Bernhard Jungk and Steffen Reith. On FPGA-based implementations of Grøstl. IACR Eprint report 2010/260. Available online at http://eprint.iacr.org/2010/260.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref023"> | ||
+ | [23] Jérémie Detrey, Pierre Gaudry, and Karim Khalfallah. A Low-Area yet Performant FPGA Implementation of Shabal. IACR Eprint report 2010/292. Available online at http://eprint.iacr.org/2010/292.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref024"> | ||
+ | [24] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. A Compact FPGA Implementation of the SHA-3 Candidate ECHO. IACR Eprint report 2010/364. Available online at http://eprint.iacr.org/2010/364.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref025"> | ||
+ | [25] Wim Ramakers and Hans Narinx. Implementation and evaluation of SHA-3 candidates on FPGA. Extended abstract of Master Thesis "Implementatie en Evaluatie van SHA-3-Kandidaten op FPGA" (Dutch). Extended abstract available online at http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf. Full thesis available online at http://ehash.iaik.tugraz.at/uploads/6/62/Ramakers_Narinx2010ECHO-Hamsi-Luffa_Thesis_DUTCH.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref026"> | ||
+ | [26] Julien Francq and Céline Thuillet. Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results. IACR Eprint report 2010/406. Available online at http://eprint.iacr.org/2010/406.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref027"> | ||
+ | [27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa (version 20101105). Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20101105.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref028"> | ||
+ | [28] Imed Mabrouk and Ryad Benadjila. ECHO webpage (hardware subpage). http://crypto.rd.francetelecom.com/ECHO/hard/. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref029"> | ||
+ | [29] Luca Henzen, Pietro Gendotti, Patrice Guillet, Enrico Pargaetzi, Martin Zoller, and Frank K. Gürkaynak. Developing a Hardware Evaluation Method for SHA-3 Candidates. 12th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2010. Available online at http://www.springerlink.com/content/g0115v3272156r06/. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref030"> | ||
+ | [30] Ekawat Homsirikamol, Marcin Rogawski, and Kris Gaj. Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. IACR Eprint report 2010/445. Available online at http://eprint.iacr.org/2010/445.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref031"> | ||
+ | [31] Brian Baldwin, Neil Hanley, Mark Hamilton, Liang Lu, Andrew Byrne, Maire O'Neill, and William P. Marnane. FPGA Implementations of the Round Two SHA-3 Candidates. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref032"> | ||
+ | [32] Mohamed El Hadedy, Martin Margala, Danilo Gligoroski, and Svein J. Knapskog. Resource-Efficient Implementation of Blue Midnight Wish-256 Hash Function on Xilinx FPGA Platform. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref033"> | ||
+ | [33] Shin'ichiro Matsuo, Miroslav Knezevic, Patrick Schaumont, Ingrid Verbauwhede, Akashi Satoh, Kazuo Sakiyama, and Kazuo Ota. How Can We Conduct "Fair and Consistent" Hardware Evaluation for SHA-3 Candidate? Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref034"> | ||
+ | [34] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel, and Erkay Savas. Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref035"> | ||
+ | [35] Xu Guo, Sinan Huang, Leyla Nazhandali, and Patrick Schaumont. Fair and Comprehensive Performance Evaluation of 14 Second Round SHA-3 ASIC Implementations. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref036"> | ||
+ | [36] Jesse Walker, Farhana Sheikh, Sanu K. Mathew, and Ram Krishnamurthy. A Skein-512 Hardware Implementation. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/WALKER_skein-intel-hwd.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref037"> | ||
+ | [37] RCIS webpage. http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref038"> | ||
+ | [38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref039"> | ||
+ | [39] RCIS webpage (Other ASIC Implementations). http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref040"> | ||
+ | [40] Luca Henzen, Jean-Philippe Aumasson, Willi Meier, and Raphael C.-W. Phan. VLSI Characterization of the Cryptographic Hash Function BLAKE. IEEE T VLSI, 2010. Available online at http://131002.net/data/papers/HAMP10.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref041"> | ||
+ | [41] Mohamed El Hadedy, Danilo Gligoroski, and Svein J. Knapskog. Single Core Implementation of Blue Midnight Wish Hash Function on VIRTEX 5 Platform. Available online at http://people.item.ntnu.no/~danilog/Hash/BMW-SecondRound/SmallSizeFPGA-BMWOct2010.pdf. | ||
+ | </div> | ||
+ | |||
+ | <div id="Ref042"> | ||
+ | [42] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale, and François-Xavier Standaert. Compact FPGA Implementations of the Five SHA-3 Finalists. Available online at http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf. |
</div> | </div> |
Latest revision as of 12:49, 23 May 2012
1 Call for Contributions
Implementers (both submitters and non-submitters): Do you have results that complement this site? Let us know at sha3zoo-hardware@iaik.tugraz.at. If you are making your HDL code available, please also provide us with the corresponding information.
2 Important Information
This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST (final round 3). This is work in progress. If you know of any implementations that should be mentioned on this page, refer to our call for contributions.
A list of hardware implementations of the round 1 candidates can be found here. A list of hardware implementations of the round 2 candidates is archived here. Please note that the pages for round 1 and 2 candidates are provided for reference and will not be updated.
The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more reasonable the comparison becomes.
The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimal gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
For suggestions regarding the structure of this site, let us know at sha3zoo-hardware@iaik.tugraz.at.
In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories:
2.1 Fully Autonomous Implementation
Such hardware implementations include the complete functionality of a SHA-3 candidate (or a specific version thereof). That means the input message can be loaded piecewise into the hardware module and it delivers the message digest as output. All hash calculations happen exclusively within the hardware module. If integrated in a system, the achievable throughput of a fully autonomous implementation depends on the speed of the hardware module itself and the speed of the (system dependent) data interface delivering the input message.
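As a rough first-order model (our assumption, not taken from any of the cited papers), the sustained throughput of a fully autonomous module inside a system is bounded by the slower of two things: the core itself and the interface that delivers the message. The hypothetical helper below expresses that bound; the 32-bit, 100 MHz bus in the example is invented for illustration.

```python
def effective_throughput_mbps(core_mbps: float, interface_mbps: float) -> float:
    """Upper bound on sustained throughput of a fully autonomous core.

    Hypothetical bottleneck model: the message cannot be hashed faster than
    the core processes it, nor faster than the system interface delivers it.
    Setup, padding, and handshake overheads are ignored.
    """
    if core_mbps <= 0 or interface_mbps <= 0:
        raise ValueError("throughputs must be positive")
    return min(core_mbps, interface_mbps)


# A core rated at 12817 Mbit/s behind a hypothetical 32-bit bus clocked at 100 MHz.
bus_mbps = 32 * 100  # 3200 Mbit/s
print(effective_throughput_mbps(12817, bus_mbps))  # interface-bound: 3200
```

In this example the interface, not the hash core, limits the achievable rate, which is why the listed figures describe the module alone.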
2.2 Implementation with External Memory
These implementations use external memory to hold intermediate values during the hashing of a message. The implemented hardware itself normally consists of the core logic of the hash function, some registers for short-lived temporary values, and possibly a memory controller for accessing the external memory. Such implementations can load the input message either over a dedicated interface (similar to a fully autonomous implementation) or from the external memory. To reach the maximal throughput of the hardware module, the external memory must be sufficiently fast.
2.3 Implementation of Core Functionality
Such implementations comprise only the most important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of full implementations.
3 Tweaks of Round Three Candidates over Round Two
The main tweaks for round three consist of adjusting the number of rounds for some of the candidates. For implementations of round 2 variants (cf. round two results), we extrapolated the performance of the round 3 variants. Extrapolated results are marked in orange. If the tweaks for an algorithm are expected to have a negligible impact on performance (e.g., just a change of constants), we include the results for the round 2 variant verbatim.
- BLAKE: The round three versions of BLAKE have been renamed to BLAKE-224, BLAKE-256, BLAKE-384, and BLAKE-512. The number of rounds has been increased from 10 to 14 for BLAKE-224 and BLAKE-256, and from 14 to 16 for BLAKE-384 and BLAKE-512. Thus, throughput for BLAKE-224 and BLAKE-256 is expected to drop to 10/14 of its round 2 value (a reduction of about 28.6%), and for BLAKE-384 and BLAKE-512 to 14/16 of its round 2 value (a reduction of 12.5%).
- Grøstl: The shift distances for the Q permutation have been changed and the round constants for both the P and Q permutations have been modified. The former is not expected to have an impact on hardware performance, whereas the latter is likely to increase overall hardware size and/or decrease throughput slightly.
- JH: The number of rounds has been increased from 35.5 to 42. Thus, throughput of JH is expected to drop to 35.5/42 of its round 2 value (a reduction of about 15.5%).
- Keccak: The padding rule has been simplified and some parameters have been redefined. No significant impact on hardware performance is expected.
- Skein: A single 64-bit constant has been changed. No significant impact on hardware performance is expected.
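The round-count scaling stated in the bullets above can be sketched as follows. This only encodes the proportional rule from the bullets (unchanged clock frequency, cycles per block growing linearly with the number of rounds); the page does not spell out any further details of how the orange entries were derived, and the function names are illustrative.

```python
from fractions import Fraction


def extrapolate_throughput(round2_mbps: float, old_rounds: float, new_rounds: float) -> float:
    """Scale a round-2 throughput figure to a new round count.

    Assumes the clock frequency is unchanged and the clock cycles per message
    block grow in proportion to the number of rounds, so throughput scales by
    old_rounds / new_rounds.
    """
    return round2_mbps * float(Fraction(old_rounds) / Fraction(new_rounds))


def reduction_percent(old_rounds: float, new_rounds: float) -> float:
    """Percentage drop in throughput implied by the round-count change."""
    return float(100 * (1 - Fraction(old_rounds) / Fraction(new_rounds)))


# Round-count changes from the round-three tweaks listed above.
tweaks = {
    "BLAKE-224/256": (10, 14),
    "BLAKE-384/512": (14, 16),
    "JH": (35.5, 42),
}
for name, (old, new) in tweaks.items():
    print(f"{name}: -{reduction_percent(old, new):.1f}%")
```

This prints reductions of 28.6%, 12.5%, and 15.5%. Under the same assumption, a hypothetical round-2 BLAKE-256 core at 1000 Mbit/s would be extrapolated by `extrapolate_throughput(1000, 10, 14)` to about 714 Mbit/s. Exact fractions are used so that JH's 35.5 rounds introduce no rounding error.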
4 Ongoing Hardware Benchmarking Efforts
To describe it in the words of the initiators and maintainers: "ATHENa: Automated Tool for Hardware EvaluatioN is a project started at George Mason University, aimed at fair, comprehensive, and automated evaluation of cryptographic cores developed using hardware description languages, such as VHDL and Verilog." More information about the project and the current results can be found on the ATHENa webpage. Note: As each hash module submitted to ATHENa is implemented on several FPGA platforms, the SHA-3 zoo pages do not replicate all results produced by the ATHENa project. Instead, please refer directly to the ATHENa webpage.
5 Summary of All Results
This section lists known published results in four categories of implementations (high-speed and low-area, each for FPGA and ASIC). If the HDL source code is available, a link is provided as well.
5.1 High-Speed Implementations (FPGA)
Important note: The size and functionality of slices varies between FPGA families. A direct comparison of the slice count of implementations on different FPGA families is therefore problematic.
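A common way to weigh size against speed is the throughput-to-area ratio. The sketch below computes it and, following the note above, refuses to compare rows from different device families, since slices, ALUTs, and LEs are different units. Even within one family the ratio is only indicative: slice counts do not capture block RAM or DSP usage (e.g., designs that place S-boxes in BRAM). The class and function names are hypothetical; the two example rows are the Virtex 5 entries of Homsirikamol et al. [30] from the table that follows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FpgaResult:
    name: str
    technology: str        # device family, e.g. "Xilinx Virtex 5"
    area: int              # slices (Xilinx) or ALUTs/LEs (Altera); units differ per family
    throughput_mbps: float


def throughput_per_area(r: FpgaResult) -> float:
    """Throughput-to-area ratio in Mbit/s per area unit (slice, ALUT, LE)."""
    return r.throughput_mbps / r.area


def compare_efficiency(a: FpgaResult, b: FpgaResult) -> float:
    """Ratio of a's throughput/area to b's; rejects cross-family comparisons."""
    if a.technology != b.technology:
        raise ValueError(f"area units not comparable: {a.technology} vs {b.technology}")
    return throughput_per_area(a) / throughput_per_area(b)


keccak = FpgaResult("Keccak(-256)", "Xilinx Virtex 5", 1272, 12817)
groestl = FpgaResult("Grøstl-256", "Xilinx Virtex 5", 1597, 7885)
print(f"{compare_efficiency(keccak, groestl):.2f}")
```

For these two rows the Keccak(-256) core delivers roughly twice the throughput per slice of the Grøstl-256 core.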
Hash Function Name | Reference / HDL | Impl. Scope | Impl. Details | Technology | Size | Throughput | Clock Frequency |
---|---|---|---|---|---|---|---|
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 3091 slices | 1231 Mbit/s | 37.0 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 3087 slices | 1596 Mbit/s | 48.0 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 1694 slices | 2216 Mbit/s | 67.0 MHz |
BLAKE-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with 8 G function units and I/O registers | Altera Stratix III | 5435 ALUTs | 1562 Mbit/s | 46.97 MHz |
BLAKE-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 1911 Mbit/s | 115 MHz |
BLAKE-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 1523 slices | 2245 Mbit/s | 128.9 MHz |
BLAKE-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 3635 ALUTs | 2072 Mbit/s | 119.0 MHz |
BLAKE-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1118 slices | 835 Mbit/s | 118.06 MHz |
BLAKE-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1660 slices | 1911 Mbit/s | 115 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex-II Pro | 11122 slices | 1030 Mbit/s | 17.0 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 4 | 11483 slices | 1494 Mbit/s | 25.0 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | Xilinx Virtex 5 | 4329 slices | 2090 Mbit/s | 35.0 MHz |
BLAKE-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1718 slices | 1137 Mbit/s | 90.91 MHz |
BLAKE-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Xilinx Virtex 5 | 3064 slices | 3080 Mbit/s | 99.7 MHz |
BLAKE-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 G function units per iteration | Altera Stratix III | 7086 ALUTs | 2766 Mbit/s | 89.5 MHz |
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 6136 slices | 4520 Mbit/s | 88.3 MHz |
Grøstl-224/256 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 1722 slices | 10276 Mbit/s | 200.7 MHz |
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Spartan 3 | 4827 slices | 3660 Mbit/s | 71.53 MHz |
Grøstl-224/256 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation in parallel, S-box in BRAM | Xilinx Virtex 5 | 4516 slices | 7310 Mbit/s | 142.87 MHz |
Grøstl-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 4057 slices | 5171 Mbit/s | 101 MHz |
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 1597 slices | 7885 Mbit/s | 323.4 MHz |
Grøstl-256 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 6350 ALUTs | 5380 Mbit/s | 220.7 MHz |
Grøstl-256 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 2391 slices | 3242 Mbit/s | 101.32 MHz |
Grøstl-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2616 slices | 7885 Mbit/s | 154 MHz |
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Spartan 3 | 20233 slices | 5901 Mbit/s | 80.7 MHz |
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Spartan 3 | 17452 slices | 3180 Mbit/s | 79.61 MHz |
Grøstl-384/512 | Baldwin et al. [4] / N/A | Core functionality | P & Q permutation parallel, S-box in LUTs | Xilinx Virtex 5 | 19161 slices | 6090 Mbit/s | 83.33 MHz |
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | Xilinx Virtex 5 | 5419 slices | 15395 Mbit/s | 210.5 MHz |
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation | Xilinx Spartan 3 | 8308 slices | 3474 Mbit/s | 95 MHz |
Grøstl-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 4845 slices | 3619 Mbit/s | 123.4 MHz |
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Xilinx Virtex 5 | 3138 slices | 10314 Mbit/s | 292.1 MHz |
Grøstl-512 | Homsirikamol et al. [30] / On request | Fully autonomous | P & Q permutations interleaved | Altera Stratix III | 12355 ALUTs | 7142 Mbit/s | 202.3 MHz |
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1018 slices | 4578 Mbit/s | 380.8 MHz |
JH-256 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3525 ALUTs | 4661 Mbit/s | 387.8 MHz |
JH-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 2661 slices | 2231 Mbit/s | 201 MHz |
JH | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1291 slices | 1641 Mbit/s | 250.13 MHz |
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1104 slices | 4742 Mbit/s | 394.5 MHz |
JH-512 | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3709 ALUTs | 4696 Mbit/s | 390.6 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Cyclone III | 5776 LEs | 7500 Mbit/s | 133 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Altera Stratix III | 4713 ALUTs | 12400 Mbit/s | 218 MHz |
Keccak | J. Strömbergson [9] / Submission webpage | Fully autonomous | Core (round function, state register) only | Xilinx Spartan 3A | 3393 slices | 4800 Mbit/s | 85 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | Xilinx Virtex 5 | 1412 slices | 6900 Mbit/s | 122 MHz |
Keccak(-224) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 5915 Mbit/s | 189 MHz |
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1272 slices | 12817 Mbit/s | 282.7 MHz |
Keccak(-256) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 4213 ALUTs | 12393 Mbit/s | 273.4 MHz |
Keccak(-256) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 6263 Mbit/s | 189 MHz |
Keccak(-256) | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 1433 slices | 8397 Mbit/s | 205 MHz |
Keccak(-384) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8190 Mbit/s | 189 MHz |
Keccak(-512) | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1117 slices | 8518 Mbit/s | 189 MHz |
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Xilinx Virtex 5 | 1257 slices | 6845 Mbit/s | 285.2 MHz |
Keccak(-512) | Homsirikamol et al. [30] / On request | Fully autonomous | | Altera Stratix III | 3979 ALUTs | 7310 Mbit/s | 304.6 MHz |
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Spartan 3 | 2024 slices | 3460 Mbit/s | 81.4 MHz |
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex-II | 2024 slices | 5810 Mbit/s | 136.6 MHz |
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Xilinx Virtex 4 | 2024 slices | 6070 Mbit/s | 142.9 MHz |
Skein-256-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1001 slices | 408.7 Mbit/s | 114.9 MHz |
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 937 slices | 1751 Mbit/s | 68.4 MHz |
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 2421 slices | 669 Mbit/s | 26.14 MHz |
Skein-256-256 | Kobayashi et al. [3] / RCIS webpage | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1482 Mbit/s | 115 MHz |
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1621 slices | 3178 Mbit/s | 118.0 MHz |
Skein-512-256 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4645 ALUTs | 2503 Mbit/s | 92.9 MHz |
Skein-256-256 | Matsuo et al. [33] / RCIS website | Fully autonomous | | Xilinx Virtex 5 | 854 slices | 1402 Mbit/s | 115 MHz |
Skein-512-h | Men Long [11] / N/A | Core functionality | UBI component | Xilinx Virtex 5 | 1877 slices | 817.4 Mbit/s | 114.9 MHz |
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Virtex 5 | 1632 slices | 3535 Mbit/s | 69.04 MHz |
Skein-512-512 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | Xilinx Spartan 3 | 4273 slices | 1365 Mbit/s | 26.66 MHz |
Skein-512 | Baldwin et al. [31] / UCC webpage | Fully autonomous | | Xilinx Virtex 5 | 1786 slices | 1945 Mbit/s | 83.65 MHz |
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Xilinx Virtex 5 | 1716 slices | 3209 Mbit/s | 119.1 MHz |
Skein-512-512 | Homsirikamol et al. [30] / On request | Fully autonomous | 4 Threefish rounds unrolled | Altera Stratix III | 4794 ALUTs | 2434 Mbit/s | 90.3 MHz |
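Throughput figures of iterative hash cores are typically long-message values obtained from throughput = block size × clock frequency / clock cycles per block, ignoring padding and finalization overhead. The sketch below inverts this relation to estimate the cycles per block behind a table row. The 1088-bit and 576-bit block sizes are the Keccak-256 and Keccak-512 rates; interpreting the resulting 24 cycles as one Keccak-f[1600] round per clock is our reading and is not stated in the table.

```python
def throughput_mbps(block_bits: int, clock_mhz: float, cycles_per_block: float) -> float:
    """Long-message throughput of an iterative core (Mbit/s)."""
    return block_bits * clock_mhz / cycles_per_block


def cycles_per_block(block_bits: int, clock_mhz: float, mbps: float) -> float:
    """Estimate the clock cycles spent per message block from a reported result."""
    return block_bits * clock_mhz / mbps


# Keccak(-256) on Virtex 5, Homsirikamol et al. [30]: 12817 Mbit/s at 282.7 MHz.
print(round(cycles_per_block(1088, 282.7, 12817)))  # 24

# Keccak(-512) on Virtex 5, same reference: 576-bit rate at 285.2 MHz, 24 cycles.
print(round(throughput_mbps(576, 285.2, 24)))
```

The second call reproduces the listed 6845 Mbit/s to within rounding, which suggests both Keccak rows share the same per-block cycle count. Rows that do not yield an integer cycle count may include interface overhead or extrapolation.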
5.2 Low-Area Implementations (FPGA)
Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency |
---|---|---|---|---|---|---|---|
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 124 slices | 82 Mbit/s | 190.0 MHz |
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 124 slices | 154 Mbit/s | 357.0 MHz |
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 56 slices | 161 Mbit/s | 372.0 MHz |
BLAKE-256 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 285 LEs | 83 Mbit/s | 192.0 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 958 slices | 265 Mbit/s | 59.0 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 960 slices | 307 Mbit/s | 68.0 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 390 slices | 411 Mbit/s | 91.0 MHz |
BLAKE-256 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex 6 | 117 slices | 105 Mbit/s | 274.0 MHz |
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan-3 | 229 slices | 121 Mbit/s | 158.0 MHz |
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-4 | 230 slices | 192 Mbit/s | 250.0 MHz |
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex-5 | 108 slices | 275 Mbit/s | 358.0 MHz |
BLAKE-512 | Beuchat et al. [13] / N/A | Fully autonomous | Rescheduled G function | Altera Cyclone III | 542 LEs | 108 Mbit/s | 140.0 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex-II Pro | 1802 slices | 285 Mbit/s | 36.0 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 4 | 1856 slices | 333 Mbit/s | 42.0 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 1 G function unit | Xilinx Virtex 5 | 939 slices | 466 Mbit/s | 59.0 MHz |
BLAKE-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Virtex 6 | 192 slices | 183 Mbit/s | 240.0 MHz |
BLAKE-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Rescheduled G function | Xilinx Spartan 6 | 230 slices | 103 Mbit/s | 135.0 MHz |
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Spartan 3 | 2486 slices | 404 Mbit/s | 63.2 MHz |
Grøstl-224/256 | Jungk et al. [6] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation in parallel | Xilinx Virtex 2 Pro | 2754 slices | 512 Mbit/s | 81.5 MHz |
Grøstl-224/256 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 1276 slices | 192 Mbit/s | 60 MHz |
Grøstl-256 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Virtex 6 | 260 slices | 815 Mbit/s | 280.0 MHz |
Grøstl-384/512 | Jungk and Reith [22] / N/A | Fully autonomous | Shared P & Q permutation, S-Box based on composite field arithmetic | Xilinx Spartan 3 | 2110 slices | 144 Mbit/s | 63 MHz |
Grøstl-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Virtex 6 | 260 slices | 640 Mbit/s | 280.0 MHz |
Grøstl-512 | Kerckhof et al. [42] / N/A | Fully autonomous | Interleaved P & Q permutations | Xilinx Spartan 6 | 343 slices | 548 Mbit/s | 240.0 MHz |
JH-256 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Virtex 6 | 240 slices | 214 Mbit/s | 288.0 MHz |
JH-512 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Virtex 6 | 240 slices | 214 Mbit/s | 288.0 MHz |
JH-512 | Kerckhof et al. [42] / N/A | Fully autonomous | 64-bit datapath & distributed RAMs | Xilinx Spartan 6 | 260 slices | 84 Mbit/s | 113.0 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Stratix III | 855 ALUTs | 96.8 Mbit/s | 366 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Altera Cyclone III | 1559 LEs | 47.8 Mbit/s | 181 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | Xilinx Virtex 5 | 444 slices | 70.1 Mbit/s | 265 MHz |
Keccak(-256) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Virtex 6 | 144 slices | 128 Mbit/s | 250.0 MHz |
Keccak(-512) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Virtex 6 | 144 slices | 68 Mbit/s | 250.0 MHz |
Keccak(-512) | Kerckhof et al. [42] / N/A | Fully autonomous | Rho transformation performed with a barrel rotator | Xilinx Spartan 6 | 193 slices | 45 Mbit/s | 166.0 MHz |
Skein-256-256 | Namin and Hasan [2] / N/A | Core functionality | One round of Threefish iterated | Altera Stratix III | 1385 ALUTs | 573.9 Mbit/s | 161.42 MHz |
Skein-512-256 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Virtex 6 | 240 slices | 179 Mbit/s | 160.0 MHz |
Skein-512-512 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Virtex 6 | 240 slices | 179 Mbit/s | 160.0 MHz |
Skein-512-512 | Kerckhof et al. [42] / N/A | Fully autonomous | One round of Threefish iterated | Xilinx Spartan 6 | 292 slices | 102 Mbit/s | 91.0 MHz |
5.3 High-Speed Implementations (ASIC)
Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency |
---|---|---|---|---|---|---|---|
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | UMC 0.18 µm | 58.30 kGates | 3782 Mbit/s | 114 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 4 G function units | UMC 0.18 µm | 41.31 kGates | 2966 Mbit/s | 170 MHz |
BLAKE-256 | Namin and Hasan [2] / N/A | Core functionality | Compression function with 8 G function units and I/O registers | STM 90 nm | 53 kGates | 3196 Mbit/s(*) | 96.15 MHz |
BLAKE-256 | Tillich et al. [14] / On request | Fully autonomous | Compression function with 4 G function units with CSAs | UMC 0.18 µm | 45.64 kGates | 2836 Mbit/s | 170.64 MHz |
BLAKE-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | Four parallel G function modules | UMC 90 nm | 47.5 kGates | 6966 Mbit/s | 400 MHz |
BLAKE-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 43.52 kGates | 3318 Mbit/s | 200 MHz |
BLAKE-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 37 kGates | 4763 Mbit/s | 286.5 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.18 µm | 79 kGates | 4548 Mbit/s | 137 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.18 µm | 48 kGates | 4176 Mbit/s | 240 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.13 µm | 67 kGates | 6689 Mbit/s | 201 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.13 µm | 43 kGates | 5748 Mbit/s | 330 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 90 nm | 65 kGates | 12499 Mbit/s | 376 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 90 nm | 38 kGates | 10816 Mbit/s | 621 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 8 G function units | UMC 0.18 µm | 132.47 kGates | 5171 Mbit/s | 87 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with 4 G function units | UMC 0.18 µm | 82.73 kGates | 4209 Mbit/s | 136 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.18 µm | 147 kGates | 6314 Mbit/s | 106 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.18 µm | 98 kGates | 6293 Mbit/s | 204 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 0.13 µm | 139 kGates | 9452 Mbit/s | 158 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 0.13 µm | 92 kGates | 8982 Mbit/s | 291 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 8 G function units | UMC 90 nm | 128 kGates | 17777 Mbit/s | 298 MHz |
BLAKE-512 | Henzen et al. [40] / Submission webpage | Fully autonomous | Compression function with 4 G function units | UMC 90 nm | 79 kGates | 16434 Mbit/s | 532 MHz |
Grøstl-256 | Tillich et al. [14] / On request | Fully autonomous | One shared permutation for P & Q, one pipeline stage | UMC 0.18 µm | 58.40 kGates | 6290 Mbit/s | 270.27 MHz |
Grøstl-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | P and Q permutation interleaved with one pipeline stage, S-box as LUT | UMC 90 nm | 135 kGates | 16254 Mbit/s | 667 MHz |
Grøstl-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 110.11 kGates | 9606 Mbit/s | 188 MHz |
Grøstl-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 139.1 kGates | 17297 Mbit/s | 337.8 MHz |
Grøstl-256 | RCIS webpage [39] / RCIS webpage | Fully autonomous | | STM 90 nm | 120.8 kGates | 16275 Mbit/s | 349.7 MHz |
Grøstl-384/512 | Submission doc. [7] / N/A | Fully autonomous | P & Q permutation in parallel | UMC 0.18 µm | 341 kGates | 6225 Mbit/s | 85.1 MHz |
JH-256 | Tillich et al. [14] / On request | Fully autonomous | 320 S-boxes, one round of R8 per cycle | UMC 0.18 µm | 58.83 kGates | 4219 Mbit/s | 380.22 MHz |
JH-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | S-boxes as LUTs, stored constants | UMC 90 nm | 80 kGates | 9134 Mbit/s | 760 MHz |
JH-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 62.42 kGates | 4334 Mbit/s | 391 MHz |
JH-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 54.6 kGates | 8471 Mbit/s | 763.4 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Fully autonomous | Core (round function, state register) & IO buffer | ST 0.13 µm | 48 kGates | 29900 Mbit/s | 526 MHz |
Keccak | Submission doc. [8] / Submission webpage | Fully autonomous | Core (round function, state register) only | ST 0.13 µm | 40 kGates | 15000 Mbit/s | 500 MHz |
Keccak(-256) | Tillich et al. [14] / On request | Fully autonomous | One instance of Keccak-f round | UMC 0.18 µm | 56.32 kGates | 21229 Mbit/s | 487.80 MHz |
Keccak(-256) | Henzen et al. [29] / ETH webpage | Fully autonomous | One round per cycle | UMC 90 nm | 50 kGates | 43011 Mbit/s | 949 MHz |
Keccak(-256) | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 47.43 kGates | 15457 Mbit/s | 377 MHz |
Keccak | Akin et al. [34] / N/A | Core functionality | One Keccak-f round per cycle | Synopsys 90 nm | 10.5 kGates | 19320 Mbit/s | 454.5 MHz |
Keccak(-256) | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 50.7 kGates | 33333 Mbit/s | 781.3 MHz |
Keccak(-256) | RCIS webpage [39] / RCIS webpage | Fully autonomous | | STM 90 nm | 55.9 kGates | 43986 Mbit/s | 1030.9 MHz |
Skein-256-256 | Stefan Tillich [12] / On request | Fully autonomous | 8 Threefish rounds unrolled | UMC 0.18 µm | 53.87 kGates | 1762 Mbit/s | 68.8 MHz |
Skein-256-256 | Namin and Hasan [2] / N/A | Core functionality | All 72 Threefish rounds unrolled | STM 90 nm | 369 kGates | 3126 Mbit/s(*) | 12.21 MHz |
Skein-256-256 | Tillich et al. [14] / On request | Fully autonomous | 8 Threefish rounds unrolled | UMC 0.18 µm | 58.61 kGates | 1882 Mbit/s | 73.52 MHz |
Skein-256-256 | Henzen et al. [29] / ETH webpage | Fully autonomous | Four unrolled Threefish rounds | UMC 90 nm | 50 kGates | 3558 Mbit/s | 264 MHz |
Skein-256-256 | Guo et al. [35] / VT webpage | Fully autonomous | | UMC 0.13 µm | 40.9 kGates | 1941 Mbit/s | 159 MHz |
Skein-256-256 | RCIS webpage [37] / RCIS webpage | Fully autonomous | | STM 90 nm | 43.1 kGates | 3295 Mbit/s | 270.3 MHz |
Skein-512-512 | Tillich et al. [14] / On request | Fully autonomous | 8 Threefish rounds unrolled | UMC 0.18 µm | 102.04 kGates | 2502 Mbit/s | 48.87 MHz |
Skein-512 | Walker et al. [36] / N/A | Fully autonomous | 8 Threefish rounds unrolled | Intel 32 nm | 57.93 kGates | 32320 Mbit/s | 631.31 MHz |
(*) Estimated peak throughput for the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
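Footnote (*) can be checked numerically. The sketch below is a minimal Python rendering of that formula; the function names are hypothetical, and it assumes that the compression-function delay equals one clock period (1000 / f ns for f in MHz), that Skein-256 consumes 256-bit message blocks, and that the unrolled and iterated Namin and Hasan designs need 1 and 72 cycles per block, respectively.

```python
def peak_throughput_mbps(input_bits: int, delay_ns: float, cycles: int) -> float:
    """Footnote (*): 1000 * input size / (delay * number of cycles).

    input_bits: message bits consumed per compression-function call
    delay_ns:   delay of one cycle of the compression function, in ns
    cycles:     clock cycles per compression-function call
    One bit per ns equals 1000 Mbit/s, hence the factor 1000.
    """
    return 1000 * input_bits / (delay_ns * cycles)


def period_ns(freq_mhz: float) -> float:
    """Clock period in ns; assumed here to equal the minimal delay."""
    return 1000 / freq_mhz


# Namin and Hasan [2], STM 90 nm: all 72 Threefish rounds unrolled,
# assumed to take one cycle per 256-bit block at 12.21 MHz.
unrolled = peak_throughput_mbps(256, period_ns(12.21), 1)

# Same authors, low-area variant: one Threefish round iterated,
# assumed to take 72 cycles per 256-bit block at 286.53 MHz.
iterated = peak_throughput_mbps(256, period_ns(286.53), 72)

print(f"unrolled: {unrolled:.0f} Mbit/s")
print(f"iterated: {iterated:.1f} Mbit/s")
```

Under these assumptions the two values round to 3126 Mbit/s and 1018.8 Mbit/s, matching the Skein-256-256 entries of Namin and Hasan in the high-speed and low-area ASIC tables.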
5.4 Low-Area Implementations (ASIC)
Hash Function Name | Reference / HDL | Impl. Scope | Implementation Details | Technology | Size | Throughput | Clock Frequency |
---|---|---|---|---|---|---|---|
BLAKE-256 | Tillich et al. [18] / N/A | Fully autonomous | One G function in 11 cycles | AMS 0.35 µm | 25.57 kGates | 11 Mbit/s | 31.25 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with a single G function unit | UMC 0.18 µm | 10.54 kGates | 180.7 Mbit/s | 40 MHz |
BLAKE-256 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with a half G function unit | UMC 0.18 µm | 9.89 kGates | 90.7 Mbit/s | 40 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Fully autonomous | 1 adder and 4-word latch array | UMC 0.18 µm | 13.56 kGates | 96.4 Mbit/s | 215 MHz |
BLAKE-256 | Henzen et al. [40] / Submission webpage | Using external memory | 1 adder and 4-word latch array | UMC 0.18 µm | 8.60 kGates | 44.3 Mbit/s | 100 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with a single G function unit | UMC 0.18 µm | 20.61 kGates | 158.4 Mbit/s | 20 MHz |
BLAKE-512 | Submission doc. [1] / Submission webpage | Core functionality | Compression function with a half G function unit | UMC 0.18 µm | 19.46 kGates | 79.6 Mbit/s | 20 MHz |
Grøstl-224/256 | Tillich et al. [18] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation shared | AMS 0.35 µm | 14.62 kGates | 145.9 Mbit/s | 55.87 MHz |
Grøstl-224/256 | Grøstl website [19] / N/A | Fully autonomous | 64-bit datapath, P & Q permutation shared | UMC 0.18 µm | 17 kGates | 645 Mbit/s | 246.9 MHz |
Grøstl-256 | RCIS webpage [39] / RCIS webpage | Fully autonomous | | STM 90 nm | 34.8 kGates | 2478 Mbit/s | 101.6 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory | ST 0.13 µm | 6.5 kGates | 176.4 Mbit/s(*) | 666.7 MHz |
Keccak | Updated spec. (v1.2) [8] / Submission webpage | Using external memory | Small core using system memory, clock freq. limited to 200 MHz | ST 0.13 µm | 5 kGates | 52.9 Mbit/s(**) | 200 MHz |
Skein-256-256 | Tillich et al. [18] / N/A | Fully autonomous | 64-bit datapath | AMS 0.35 µm | 12.89 kGates | 19.8 Mbit/s | 80 MHz |
Skein-256-256 | Namin and Hasan [2] / N/A | Core functionality | One round of Threefish iterated | STM 90 nm | 21 kGates | 1018.8 Mbit/s(***) | 286.53 MHz |
(*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s
(**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s
(***) Estimated peak throughput for the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
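The (*) and (**) estimates for the external-memory Keccak cores follow one pattern: bits processed per permutation times clock frequency, divided by cycles per permutation. A minimal sketch; the function name is hypothetical, while the 1024 bits per permutation and 3870 cycles per permutation are taken from the footnotes.

```python
def memory_bound_throughput_mbps(freq_mhz: float,
                                 bits_per_permutation: int = 1024,
                                 cycles_per_permutation: int = 3870) -> float:
    """Throughput in Mbit/s when each permutation call costs a fixed number
    of clock cycles (here, with a 64-bit memory interface as in the footnotes)."""
    return bits_per_permutation * freq_mhz / cycles_per_permutation


for freq in (666.7, 200.0):
    print(f"{freq:6.1f} MHz -> {memory_bound_throughput_mbps(freq):6.2f} Mbit/s")
```

This reproduces 176.41 and 52.92 Mbit/s. Because the cycle count per permutation is fixed, throughput scales linearly with the clock, so limiting the clock from 666.7 MHz to 200 MHz lowers throughput by the same factor of about 3.3.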
6 Comparative Studies
This section summarizes the reported results of publications which examined more than one round-three candidate in a similar setup.
6.1 BLAKE, Skein
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Namin and Hasan [2] | N/A | High-speed FPGA | Core functionality | Altera Stratix III |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | Compression function with 8 G function units and I/O registers | 5435 ALUTs | 1562 Mbit/s | 46.97 MHz |
Skein-256-256 | All 72 Threefish rounds unrolled (device too small) | N/A | N/A | N/A |
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Namin and Hasan [2] | N/A | High-speed ASIC | Core functionality | STM 90 nm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | Compression function with 8 G function units and I/O registers | 53 kGates | 3196 Mbit/s(*) | 96.15 MHz |
Skein-256-256 | All 72 Threefish rounds unrolled | 369 kGates | 3126 Mbit/s(*) | 12.21 MHz |
(*) Estimated peak throughput for the minimal delay of the compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s.
6.2 BLAKE, Grøstl, Skein
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Kobayashi et al. [3] | RCIS webpage | High-speed FPGA | Fully autonomous | Xilinx Virtex 5 |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | | 1660 slices | 1911 Mbit/s | 115 MHz |
Grøstl-256 | | 4057 slices | 5171 Mbit/s | 101 MHz |
Skein-256 | | 854 slices | 1482 Mbit/s | 115 MHz |
6.3 All 5 Round-Three Candidates
Reported results are post-synthesis. An interactive graphical comparison of the area-performance tradeoffs explored in this study is available online.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Tillich et al. [14] | On request | High-speed ASIC | Fully autonomous | UMC 0.18 µm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | Compression function with 4 G function units with CSAs | 45.64 kGates | 2836 Mbit/s | 170.64 MHz |
Grøstl-256 | One shared permutation for P & Q, one pipeline stage | 58.40 kGates | 6290 Mbit/s | 270.27 MHz |
JH-256 | 320 S-boxes, one round of R8 per cycle | 58.83 kGates | 4219 Mbit/s | 380.22 MHz |
Keccak(-256) | One instance of Keccak-f round | 56.32 kGates | 21229 Mbit/s | 487.80 MHz |
Skein-256-256 | 8 Threefish rounds unrolled | 58.61 kGates | 1882 Mbit/s | 73.52 MHz |
Skein-512-512 | 8 Threefish rounds unrolled | 102.04 kGates | 2502 Mbit/s | 48.87 MHz |
6.4 BLAKE, Grøstl, Skein
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Tillich et al. [18] | N/A | Low-area ASIC | Fully autonomous | AMS 0.35 µm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | One G function in 11 cycles | 25.57 kGates | 11 Mbit/s | 31.25 MHz |
Grøstl-224/256 | 64-bit datapath, P & Q permutation shared | 14.62 kGates | 145.9 Mbit/s | 55.87 MHz |
Skein-256-256 | 64-bit datapath | 12.89 kGates | 19.8 Mbit/s | 80 MHz |
6.5 All 5 Round-Three Candidates
Reported results of this study are post-place-and-route (post-P&R) figures for designs targeting high throughput.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Henzen et al. [29] | ETH webpage | High-speed ASIC | Fully autonomous | UMC 90 nm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | Four parallel G function modules | 47.5 kGates | 6966 Mbit/s | 400 MHz |
Grøstl-256 | P and Q permutation interleaved with one pipeline stage, S-box as LUT | 135 kGates | 16254 Mbit/s | 667 MHz |
JH-256 | S-boxes as LUTs, stored constants | 80 kGates | 9134 Mbit/s | 760 MHz |
Keccak(-256) | One round per cycle | 50 kGates | 43011 Mbit/s | 949 MHz |
Skein-256-256 | Four unrolled Threefish rounds | 50 kGates | 3558 Mbit/s | 264 MHz |
6.6 All 5 Round-Three Candidates
Designs are optimized for the throughput-to-area ratio. The cited results are those for the Xilinx Virtex 5 and Altera Stratix III platforms (for both the 256-bit and the 512-bit versions of the candidates). Results marked N/A did not fit into the largest device of the respective device family. For a full listing of all ATHENa results, refer to the ATHENa webpage.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Homsirikamol et al. [30] | On request | High-speed FPGA | Fully autonomous | Xilinx Virtex 5 |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | 4 G function units per iteration | 1523 slices | 2245 Mbit/s | 128.9 MHz |
BLAKE-512 | 4 G function units per iteration | 3064 slices | 3080 Mbit/s | 99.7 MHz |
Grøstl-256 | P & Q permutations interleaved | 1597 slices | 7885 Mbit/s | 323.4 MHz |
Grøstl-512 | P & Q permutations interleaved | 3138 slices | 10314 Mbit/s | 292.1 MHz |
JH-256 | | 1018 slices | 4578 Mbit/s | 380.8 MHz |
JH-512 | | 1104 slices | 4742 Mbit/s | 394.5 MHz |
Keccak(-256) | | 1272 slices | 12817 Mbit/s | 282.7 MHz |
Keccak(-512) | | 1257 slices | 6845 Mbit/s | 285.2 MHz |
Skein-512-256 | 4 Threefish rounds unrolled | 1621 slices | 3178 Mbit/s | 118.0 MHz |
Skein-512-512 | 4 Threefish rounds unrolled | 1716 slices | 3209 Mbit/s | 119.1 MHz |
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Homsirikamol et al. [30] | N/A | High-speed FPGA | Fully autonomous | Altera Stratix III |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | 4 G function units per iteration | 3635 ALUTs | 2072 Mbit/s | 119.0 MHz |
BLAKE-512 | 4 G function units per iteration | 7086 ALUTs | 2766 Mbit/s | 89.5 MHz |
Grøstl-256 | P & Q permutations interleaved | 6350 ALUTs | 5380 Mbit/s | 220.7 MHz |
Grøstl-512 | P & Q permutations interleaved | 12355 ALUTs | 7142 Mbit/s | 202.3 MHz |
JH-256 | | 3525 ALUTs | 4661 Mbit/s | 387.8 MHz |
JH-512 | | 3709 ALUTs | 4696 Mbit/s | 390.6 MHz |
Keccak(-256) | | 4213 ALUTs | 12393 Mbit/s | 273.4 MHz |
Keccak(-512) | | 3979 ALUTs | 7310 Mbit/s | 304.6 MHz |
Skein-512-256 | 4 Threefish rounds unrolled | 4645 ALUTs | 2503 Mbit/s | 92.9 MHz |
Skein-512-512 | 4 Threefish rounds unrolled | 4794 ALUTs | 2434 Mbit/s | 90.3 MHz |
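Since these designs target the throughput-to-area ratio, that ratio can be computed directly from the Virtex 5 table of Homsirikamol et al. [30]. The sketch below is only an illustrative calculation on the listed figures (Mbit/s per slice); it assumes slices are the only relevant resource, and the names `VIRTEX5_RESULTS` and `throughput_per_slice` are hypothetical.

```python
from __future__ import annotations

# Homsirikamol et al. [30], Xilinx Virtex 5:
# name -> (area in slices, throughput in Mbit/s), copied from the table.
# "Groestl" is used as an ASCII spelling of Grøstl.
VIRTEX5_RESULTS = {
    "BLAKE-256": (1523, 2245),
    "BLAKE-512": (3064, 3080),
    "Groestl-256": (1597, 7885),
    "Groestl-512": (3138, 10314),
    "JH-256": (1018, 4578),
    "JH-512": (1104, 4742),
    "Keccak(-256)": (1272, 12817),
    "Keccak(-512)": (1257, 6845),
    "Skein-512-256": (1621, 3178),
    "Skein-512-512": (1716, 3209),
}


def throughput_per_slice(results: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Return (name, Mbit/s per slice) pairs, best ratio first."""
    ratios = [(name, tput / slices) for name, (slices, tput) in results.items()]
    return sorted(ratios, key=lambda item: item[1], reverse=True)


for name, ratio in throughput_per_slice(VIRTEX5_RESULTS):
    print(f"{name:14s} {ratio:6.2f} Mbit/s per slice")
```

On these figures Keccak(-256) comes out highest (about 10.1 Mbit/s per slice) and BLAKE-512 lowest (about 1.0 Mbit/s per slice).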
6.7 All 5 Round-Three Candidates
Reported results exclude the wrapper and are given for long messages.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Baldwin et al. [31] | UCC webpage | High-speed FPGA | Fully autonomous | Xilinx Virtex 5 |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | | 1118 slices | 835 Mbit/s | 118.06 MHz |
BLAKE-512 | | 1718 slices | 1137 Mbit/s | 90.91 MHz |
Grøstl-256 | | 2391 slices | 3242 Mbit/s | 101.32 MHz |
Grøstl-512 | | 4845 slices | 3619 Mbit/s | 123.4 MHz |
JH | | 1291 slices | 1641 Mbit/s | 250.13 MHz |
Keccak(-224) | | 1117 slices | 5915 Mbit/s | 189 MHz |
Keccak(-256) | | 1117 slices | 6263 Mbit/s | 189 MHz |
Keccak(-384) | | 1117 slices | 8190 Mbit/s | 189 MHz |
Keccak(-512) | | 1117 slices | 8518 Mbit/s | 189 MHz |
Skein-512 | | 1786 slices | 1945 Mbit/s | 83.65 MHz |
6.8 All 5 Round-Three Candidates
The listed throughputs are those without interface overhead.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Matsuo et al. [33] | RCIS webpage | High-speed FPGA | Fully autonomous | Xilinx Virtex 5 |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | | 1660 slices | 1911 Mbit/s | 115 MHz |
Grøstl-256 | | 2616 slices | 7885 Mbit/s | 154 MHz |
JH-256 | | 2661 slices | 2231 Mbit/s | 201 MHz |
Keccak(-256) | | 1433 slices | 8397 Mbit/s | 205 MHz |
Skein-256-256 | | 854 slices | 1402 Mbit/s | 115 MHz |
The same implementations as in Matsuo et al. [33], implemented in STM 90 nm technology.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
RCIS webpage [37] | RCIS webpage | High-speed ASIC | Fully autonomous | STM 90 nm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | | 37 kGates | 4763 Mbit/s | 286.5 MHz |
Grøstl-256 | | 139.1 kGates | 17297 Mbit/s | 337.8 MHz |
JH-256 | | 54.6 kGates | 8471 Mbit/s | 763.4 MHz |
Keccak(-256) | | 50.7 kGates | 33333 Mbit/s | 781.3 MHz |
Skein-256-256 | | 43.1 kGates | 3295 Mbit/s | 270.3 MHz |
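Throughput in Mbit/s divided by clock frequency in MHz gives the average number of message bits processed per clock cycle, which is fixed by the architecture rather than by the target technology. Since the Virtex 5 designs of Matsuo et al. [33] and the STM 90 nm designs above are the same implementations, the two tables can be cross-checked this way. A small illustrative calculation; the dictionary and function names are hypothetical.

```python
# (throughput in Mbit/s, clock frequency in MHz), copied from the two tables:
# FPGA = Matsuo et al. [33] on Xilinx Virtex 5, ASIC = RCIS webpage [37] on STM 90 nm.
# "Groestl" is used as an ASCII spelling of Grøstl.
FPGA = {
    "BLAKE-256": (1911, 115.0),
    "Groestl-256": (7885, 154.0),
    "JH-256": (2231, 201.0),
    "Keccak(-256)": (8397, 205.0),
    "Skein-256-256": (1402, 115.0),
}
ASIC = {
    "BLAKE-256": (4763, 286.5),
    "Groestl-256": (17297, 337.8),
    "JH-256": (8471, 763.4),
    "Keccak(-256)": (33333, 781.3),
    "Skein-256-256": (3295, 270.3),
}


def bits_per_cycle(throughput_mbps: float, freq_mhz: float) -> float:
    """Mbit/s divided by MHz = average message bits processed per clock cycle."""
    return throughput_mbps / freq_mhz


for name in FPGA:
    fpga = bits_per_cycle(*FPGA[name])
    asic = bits_per_cycle(*ASIC[name])
    print(f"{name:14s} FPGA {fpga:6.2f}  ASIC {asic:6.2f} bits/cycle")
```

For BLAKE, Grøstl, JH, and Skein the two values agree to within 0.1%; for Keccak they differ by about 4% (roughly 41.0 vs. 42.7 bits per cycle).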
6.9 All 5 Round-Three Candidates
Results are post-P&R; the listed throughputs are those without interface overhead.
Reference | HDL | Category | Impl. Scope | Technology |
---|---|---|---|---|
Guo et al. [35] | VT webpage | High-speed ASIC | Fully autonomous | UMC 0.13 µm |
Hash Function Name | Impl. Details | Size | Throughput | Clock Frequency |
---|---|---|---|---|
BLAKE-256 | | 43.52 kGates | 3318 Mbit/s | 200 MHz |
Grøstl-256 | | 110.11 kGates | 9606 Mbit/s | 188 MHz |
JH-256 | | 62.42 kGates | 4334 Mbit/s | 391 MHz |
Keccak(-256) | | 47.43 kGates | 15457 Mbit/s | 377 MHz |
Skein-256-256 | | 40.9 kGates | 1941 Mbit/s | 159 MHz |
7 References
[1] Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and Raphael C.-W. Phan. SHA-3 proposal BLAKE (version 1.3). Available online at http://131002.net/blake/blake.pdf.
[2] A. H. Namin and M. A. Hasan. Hardware Implementation of the Compression Function for Selected SHA-3 Candidates. Available online at http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html.
[3] Kazuyuki Kobayashi, Jun Ikegami, Shin'ichiro Matsuo, Kazuo Sakiyama, and Kazuo Ohta. Evaluation of Hardware Performance for the SHA-3 Candidates Using SASEBO-GII. IACR Eprint report 2010/010. Available online at http://eprint.iacr.org/2010/010.pdf.
[4] Brian Baldwin, Andrew Byrne, Mark Hamilton, Neil Hanley, Robert P. McEvoy, Weibo Pan, and William P. Marnane. FPGA Implementations of SHA-3 Candidates: CubeHash, Grøstl, LANE, Shabal and Spectral Hash. IACR Eprint report 2009/342. Available online at http://eprint.iacr.org/2009/342.pdf.
[5] Liang Lu, Maire O'Neill, and Earl Swartzlander. Hardware Evaluation of SHA-3 Hash Function Candidate ECHO. Presentation at the Claude Shannon Institute Workshop on Coding and Cryptography 2009. Slides available online at http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf.
[6] Bernhard Jungk, Steffen Reith, and Jürgen Apfelbeck. On Optimized FPGA Implementations of the SHA-3 Candidate Grøstl. IACR Eprint report 2009/206. Available online at http://eprint.iacr.org/2009/206.pdf.
[7] Praveen Gauravaram, Lars R. Knudsen, Krystian Matusiewicz, Florian Mendel, Christian Rechberger, Martin Schläffer, and Søren S. Thomsen. Grøstl - a SHA-3 candidate (October 31, 2008). Available online at http://www.groestl.info/Groestl.pdf.
[8] Guido Bertoni, Joan Daemen, Michaël Peeters, and Gilles van Assche. KECCAK sponge function family main document (Version 1.2, April 23, 2009). Available online at http://keccak.noekeon.org/Keccak-main-1.2.pdf.
[9] Joachim Strömbergson. Implementation of the Keccak Hash Function in FPGA Devices. Available online at http://www.strombergson.com/files/Keccak_in_FPGAs.pdf.
[10] Romain Feron and Julien Francq. FPGA Implementation of Shabal: Our First Results (Version 2.0, February 19, 2010). Available online at http://www.shabal.com/wp-content/uploads/2010/03/FPGA-Implementation-of-Shabal-First-ResultsV2.0.pdf.
[11] Men Long. Implementing Skein Hash Function on Xilinx Virtex-5 FPGA Platform (Version 0.7, February 2, 2009). Available online at http://www.skein-hash.info/sites/default/files/skein_fpga.pdf.
[12] Stefan Tillich. Hardware Implementation of the SHA-3 Candidate Skein. IACR Eprint report 2009/159. Available online at http://eprint.iacr.org/2009/159.pdf.
[13] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA. IACR Eprint report 2010/173. Available online at http://eprint.iacr.org/2010/173.pdf.
[14] Stefan Tillich, Martin Feldhofer, Mario Kirschbaum, Thomas Plos, Jörn-Marc Schmidt, and Alexander Szekely. High-Speed Hardware Implementations of BLAKE, Blue Midnight Wish, CubeHash, ECHO, Fugue, Grøstl, Hamsi, JH, Keccak, Luffa, Shabal, SHAvite-3, SIMD, and Skein. IACR Eprint report 2009/510. Available online at http://eprint.iacr.org/2009/510.pdf.
[15] Shai Halevi, William E. Hall, and Charanjit S. Jutla. The Hash Function Fugue (October 30, 2008). Available online at http://domino.research.ibm.com/comm/research_projects.nsf/pages/fugue.index.html/$FILE/NIST-submission-Oct08-fugue.pdf.
[16] Junfeng Fan. Hardware Evaluation of The Hash Function Hamsi. Available online at http://homes.esat.kuleuven.be/~okucuk/hamsi/implementations.html.
[17] Miroslav Knezevic and Ingrid Verbauwhede. Hardware Evaluation of the Luffa Hash Family. 4th Workshop on Embedded Systems Security 2009. Available online at http://www.cosic.esat.kuleuven.be/publications/article-1282.pdf.
[18] Stefan Tillich, Martin Feldhofer, Wolfgang Issovits, Thomas Kern, Hermann Kureck, Michael Mühlberghuber, Georg Neubauer, Andreas Reiter, Armin Köfler, and Mathias Mayrhofer. Compact Hardware Implementations of the SHA-3 Candidates ARIRANG, BLAKE, Grøstl, and Skein. IACR Eprint report 2009/349. Available online at http://eprint.iacr.org/2009/349.pdf.
[19] Grøstl website. http://www.groestl.info/.
[20] Markus Bernet, Luca Henzen, Hubert Kaeslin, Norbert Felber, and Wolfgang Fichtner. Hardware Implementations of the SHA-3 Candidates Shabal and CubeHash. 52nd IEEE International Midwest Symposium on Circuits and Systems, 2009. Available online at http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5236043.
[21] Michel Kinsy and Richard Uhler. SHA-3: FPGA Implementation of ESSENCE and ECHO Hash Algorithm Candidates Using Bluespec. Available online at http://csg.csail.mit.edu/6.375/6_375_2009_www/projects/group1_report.pdf.
[22] Bernhard Jungk and Steffen Reith. On FPGA-based implementations of Grøstl. IACR Eprint report 2010/260. Available online at http://eprint.iacr.org/2010/260.pdf.
[23] Jérémie Detrey, Pierre Gaudry, and Karim Khalfallah. A Low-Area yet Performant FPGA Implementation of Shabal. IACR Eprint report 2010/292. Available online at http://eprint.iacr.org/2010/292.pdf.
[24] Jean-Luc Beuchat, Eiji Okamoto, and Teppei Yamazaki. A Compact FPGA Implementation of the SHA-3 Candidate ECHO. IACR Eprint report 2010/364. Available online at http://eprint.iacr.org/2010/364.pdf.
[25] Wim Ramakers and Hans Narinx. Implementation and evaluation of SHA-3 candidates on FPGA. Extended abstract of Master Thesis "Implementatie en Evaluatie van SHA-3-Kandidaten op FPGA" (Dutch). Extended abstract available online at http://ehash.iaik.tugraz.at/uploads/1/12/Ramakers_Narinx2010ECHO-Hamsi-Luffa_ExtendedAbstract_ENGLISH.pdf. Full thesis available online at http://ehash.iaik.tugraz.at/uploads/6/62/Ramakers_Narinx2010ECHO-Hamsi-Luffa_Thesis_DUTCH.pdf.
[26] Julien Francq and Céline Thuillet. Unfolding Method for Shabal on Virtex-5 FPGAs: Concrete Results. IACR Eprint report 2010/406. Available online at http://eprint.iacr.org/2010/406.pdf.
[27] Shugo Mikami, Nagamasa Mizushima, Setsuko Nakamura, and Dai Watanabe. A Compact Hardware Implementation of SHA-3 Candidate Luffa (version 20101105). Available online at http://www.sdl.hitachi.co.jp/crypto/luffa/ACompactHardwareImplementationOfSHA-3CandidateLuffa_20101105.pdf.
[28] Imed Mabrouk and Ryad Benadjila. ECHO webpage (hardware subpage). http://crypto.rd.francetelecom.com/ECHO/hard/.
[29] Luca Henzen, Pietro Gendotti, Patrice Guillet, Enrico Pargaetzi, Martin Zoller, and Frank K. Gürkaynak. Developing a Hardware Evaluation Method for SHA-3 Candidates. 12th International Workshop on Cryptographic Hardware and Embedded Systems (CHES), 2010. Available online at http://www.springerlink.com/content/g0115v3272156r06/.
[30] Ekawat Homsirikamol, Marcin Rogawski, and Kris Gaj. Comparing Hardware Performance of Fourteen Round Two SHA-3 Candidates Using FPGAs. IACR Eprint report 2010/445. Available online at http://eprint.iacr.org/2010/445.pdf.
[31] Brian Baldwin, Neil Hanley, Mark Hamilton, Liang Lu, Andrew Byrne, Maire O'Neill, and William P. Marnane. FPGA Implementations of the Round Two SHA-3 Candidates. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/BALDWIN_FPGA_SHA3.pdf.
[32] Mohamed El Hadedy, Martin Margala, Danilo Gligoroski, and Svein J. Knapskog. Resource-Efficient Implementation of Blue Midnight Wish-256 Hash Function on Xilinx FPGA Platform. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/El-Hadedy_SmallSizeFPGA-BMW256.pdf.
[33] Shin'ichiro Matsuo, Miroslav Knezevic, Patrick Schaumont, Ingrid Verbauwhede, Akashi Satoh, Kazuo Sakiyama, and Kazuo Ohta. How Can We Conduct "Fair and Consistent" Hardware Evaluation for SHA-3 Candidate? Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/MATSUO_SHA-3_Criteria_Hardware_revised.pdf.
[34] Abdulkadir Akin, Aydin Aysu, Onur Can Ulusel, and Erkay Savas. Efficient Hardware Implementations of High Throughput SHA-3 Candidates Keccak, Luffa and Blue Midnight Wish for Single- and Multi-Message Hashing. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SAVAS_SHA3_NIST_final.pdf.
[35] Xu Guo, Sinan Huang, Leyla Nazhandali, and Patrick Schaumont. Fair and Comprehensive Performance Evaluation of 14 Second Round SHA-3 ASIC Implementations. Second SHA-3 Candidate Conference, 2010. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/SCHAUMONT_SHA3.pdf.
[36] Jesse Walker, Farhana Sheikh, Sanu K. Mathew, and Ram Krishnamurthy. A Skein-512 Hardware Implementation. Available online at http://csrc.nist.gov/groups/ST/hash/sha-3/Round2/Aug2010/documents/papers/WALKER_skein-intel-hwd.pdf.
[37] RCIS webpage. http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/implement.html.
[38] Akashi Satoh, Toshihiro Katashita, Takeshi Sugawara, Naofumi Homma, and Takafumi Aoki. Hardware Implementations of Hash Function Luffa. IEEE International Symposium on Hardware-Oriented Security and Trust (HOST), 2010. Available online at http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5513102&tag=1.
[39] RCIS webpage (Other ASIC Implementations). http://staff.aist.go.jp/akashi.satoh/SASEBO/en/sha3/others.html.
[40] Luca Henzen, Jean-Philippe Aumasson, Willi Meier, and Raphael C.-W. Phan. VLSI Characterization of the Cryptographic Hash Function BLAKE. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2010. Available online at http://131002.net/data/papers/HAMP10.pdf.
[41] Mohamed El Hadedy, Danilo Gligoroski, and Svein J. Knapskog. Single Core Implementation of Blue Midnight Wish Hash Function on VIRTEX 5 Platform. Available online at http://people.item.ntnu.no/~danilog/Hash/BMW-SecondRound/SmallSizeFPGA-BMWOct2010.pdf.
[42] Stéphanie Kerckhof, François Durvaux, Nicolas Veyrat-Charvillon, Francesco Regazzoni, Guerric Meurice de Dormale, and François-Xavier Standaert. Compact FPGA Implementations of the Five SHA-3 Finalists. Available online at http://perso.uclouvain.be/fstandae/PUBLIS/99.pdf.