Difference between revisions of "SHA-3 Hardware Implementations"

Revision as of 14:44, 17 February 2010

1 Important Information

This page summarizes key properties of reported hardware implementations of the SHA-3 candidates that are currently under consideration by NIST. This is a work in progress. If you know of any implementations that should be mentioned on this page, please refer to our call for contributions.

A list of hardware implementations of the round 1 candidates can be found here. Please note that the page for round 1 candidates is provided for reference and will not be updated.

The implementations are categorized into FPGA and standard-cell ASIC implementations. Note that the diversity of implementation scope, target technologies, and synthesis tools makes direct comparisons between different hardware implementations difficult. The more of these parameters agree, the more meaningful the comparison becomes.

The target technology should be as similar as possible. For FPGA implementations, it is desirable to compare implementations on the same target device (or at least on devices of the same FPGA family). For standard-cell ASIC implementations, at least the minimum gate length of the process (e.g., 0.13 µm) should agree. Ideally, the implementations use the same standard-cell library (which implies the use of the same process technology).
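As a minimal sketch of this guideline, the following Python snippet groups a few rows by target technology before listing them, so that only implementations on the same device family end up side by side. The rows are copied from the high-speed FPGA table on this page; the function name `group_by_technology` and the tuple layout are illustrative assumptions, not part of any tool used by the site.

```python
from collections import defaultdict

# (hash function, technology, size, throughput in Mbit/s), copied from the
# high-speed FPGA table on this page. The tuple layout is an assumption.
ROWS = [
    ("BLAKE-32", "Xilinx Virtex 5", "1694 slices", 3103.0),
    ("BLAKE-32", "Altera Stratix III", "5435 ALUTs", 2186.2),
    ("Keccak", "Xilinx Virtex 5", "1412 slices", 6900.0),
    ("Keccak", "Altera Stratix III", "4713 ALUTs", 12400.0),
    ("Luffa-256", "Xilinx Virtex 5", "1048 slices", 6343.0),
]


def group_by_technology(rows):
    """Return a dict mapping each target technology to its rows."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    return dict(groups)


groups = group_by_technology(ROWS)
for technology, members in sorted(groups.items()):
    print(technology)
    # Within one technology, list the fastest implementation first.
    for name, _, size, mbit in sorted(members, key=lambda r: r[3], reverse=True):
        print(f"  {name:10s} {size:12s} {mbit:8.1f} Mbit/s")
```

Even within one group, the rows may differ in implementation scope (core functionality versus fully autonomous), so such a listing is only a starting point for a comparison.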

For suggestions regarding the structure of this site, let us know at sha3zoo-hardware@iaik.tugraz.at.

In order to facilitate the comparison of hardware modules with different implementation scopes, we classify them into three categories:

1.1 Fully Autonomous Implementation

Such hardware implementations include the complete functionality of a SHA-3 candidate (or a specific version thereof). The input message can be loaded piecewise into the hardware module, which delivers the message digest as output. All hash calculations happen exclusively within the hardware module. If integrated in a system, the achievable throughput of a fully autonomous implementation depends on the speed of the hardware module itself and on the speed of the (system-dependent) data interface delivering the input message.
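The dependence on both the module and the interface can be illustrated with a simple bottleneck model. This is an assumption made for illustration, not a formula given on this page: it treats the slower of the two rates as an upper bound on system throughput. The function name `system_throughput_mbit` and the interface rate are hypothetical; the core figure is the Keccak entry for Xilinx Virtex 5 from the high-speed FPGA table.

```python
def system_throughput_mbit(core_mbit, interface_mbit):
    """Upper bound on system throughput under a simple bottleneck model.

    Assumption: the slower of the hash core and the data interface
    limits how fast message bits can be processed.
    """
    return min(core_mbit, interface_mbit)


# Core throughput from the table (Keccak, Xilinx Virtex 5: 6900 Mbit/s).
# The 1000 Mbit/s interface rate is a hypothetical value.
print(system_throughput_mbit(6900.0, 1000.0))
```

In this hypothetical setting the interface, not the hash core, would limit the system.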

1.2 Implementation with External Memory

These implementations use external memory to hold intermediate values during the hashing of a message. The implemented hardware itself normally consists of the core logic functionality of the hash function, some registers for short-lived temporary values, and possibly a memory controller for access to the external memory. Such implementations can load the input message either over a dedicated interface (similar to a fully autonomous implementation) or from the external memory. In order to reach the maximal throughput of the hardware module, the external memory must be sufficiently fast.

1.3 Implementation of Core Functionality

Such implementations comprise only important parts of the hash function (e.g., the compression function), which normally allows a first-order estimate of the performance figures of a full implementation.

2 High-Speed Implementations (FPGA)

Important note: The size and functionality of slices vary between FPGA families. A direct comparison of the slice counts of implementations on different FPGA families is therefore problematic.

Hash Function Name	Reference / HDL	Impl. Scope	Implementation Details	Technology	Size	Throughput	Clock Frequency
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex-II Pro	3091 slices	1724 Mbit/s	37.0 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex 4	3087 slices	2235 Mbit/s	48.0 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex 5	1694 slices	3103 Mbit/s	67.0 MHz
BLAKE-32	Namin and Hasan / N/A	Core functionality	Compression function with 8 G function units and I/O registers	Altera Stratix III	5435 ALUTs	2186.2 Mbit/s	46.97 MHz
BLAKE-32	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	1660 slices	2676 Mbit/s	115 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex-II Pro	11122 slices	1177 Mbit/s	17.0 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex 4	11483 slices	1707 Mbit/s	25.0 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	Xilinx Virtex 5	4329 slices	2389 Mbit/s	35.0 MHz
Blue Midnight Wish-256	Namin and Hasan / N/A	Core functionality	Compression function with f0, f1, and f2 unrolled in sequence and I/O registers	Altera Stratix III	12917 ALUTs	4889.6 Mbit/s	9.55 MHz
CubeHash8/1-256	Baldwin et al. / N/A	Core functionality	2 compression functions unrolled	Xilinx Spartan 3	3268 slices	70 Mbit/s	37.9 MHz
CubeHash8/1-256	Baldwin et al. / N/A	Core functionality	1 iterated compression function	Xilinx Virtex 5	1178 slices	160 Mbit/s	166.8 MHz
CubeHash16/32-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	590 slices	2960 Mbit/s	185 MHz
ECHO-224/256	Lu et al. / N/A	Fully autonomous		Xilinx Virtex 5	9333 slices	14860 Mbit/s	87.1 MHz
ECHO-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	3556 slices	1614 Mbit/s	104 MHz
ECHO-384/512	Lu et al. / N/A	Fully autonomous		Xilinx Virtex 5	9097 slices	7810 Mbit/s	83.9 MHz
Grøstl-224/256	Jungk et al. / N/A	Fully autonomous	P & Q permutation in parallel	Xilinx Spartan 3	6136 slices	4520 Mbit/s	88.3 MHz
Grøstl-224/256	Submission document / N/A	Fully autonomous	P & Q permutation in parallel	Xilinx Virtex 5	1722 slices	10276 Mbit/s	200.7 MHz
Grøstl-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	4057 slices	5171 Mbit/s	101 MHz
Grøstl-384/512	Submission document / N/A	Fully autonomous	P & Q permutation in parallel	Xilinx Spartan 3	20233 slices	5901 Mbit/s	80.7 MHz
Grøstl-384/512	Baldwin et al. / N/A	Core functionality	P & Q permutation interleaved, S-box in BRAM	Xilinx Spartan 3	6313 slices	2910 Mbit/s	79.61 MHz
Grøstl-384/512	Submission document / N/A	Fully autonomous	P & Q permutation in parallel	Xilinx Virtex 5	5419 slices	15395 Mbit/s	210.5 MHz
Hamsi-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	718 slices	1680 Mbit/s	210 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Fully autonomous	Core (round function, state register) & IO buffer	Altera Cyclone III	5776 LEs	7500 Mbit/s	133 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Fully autonomous	Core (round function, state register) & IO buffer	Altera Stratix III	4713 ALUTs	12400 Mbit/s	218 MHz
Keccak	Joachim Strömbergson / Submission webpage	Fully autonomous	Core (round function, state register) only	Xilinx Spartan 3A	3393 slices	4800 Mbit/s	85 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Fully autonomous	Core (round function, state register) & IO buffer	Xilinx Virtex 5	1412 slices	6900 Mbit/s	122 MHz
Luffa-256	Namin and Hasan / N/A	Core functionality	Compression function (1 cycle latency) and I/O registers	Altera Stratix III	16552 ALUTs	12042.2 Mbit/s	47.04 MHz
Luffa-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	1048 slices	6343 Mbit/s	223 MHz
Shabal	Feron and Francq / N/A	Fully autonomous	36 adders in permutation	Xilinx Virtex 5	1171 slices	2588 Mbit/s	126 MHz
Shabal	Baldwin et al. / N/A	Core functionality	36 adders in permutation	Xilinx Spartan 3	2223 slices	740 Mbit/s	71.48 MHz
Shabal	Baldwin et al. / N/A	Core functionality	36 adders in permutation	Xilinx Virtex 5	2768 slices	1450 Mbit/s	138.87 MHz
Shabal-256	Namin and Hasan / N/A	Core functionality	Compression function with I/O registers (latency of 16 clock cycles)	Altera Stratix III	1440 ALUTs	3125.6 Mbit/s	195.35 MHz
Shabal-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	1251 slices	1739 Mbit/s	214 MHz
Skein-256	Men Long / N/A	Core functionality	UBI component	Xilinx Virtex 5	1001 slices	408.7 Mbit/s	114.9 MHz
Skein-256	Stefan Tillich / On request	Fully autonomous	8 Threefish rounds unrolled	Xilinx Virtex 5	937 slices	1751 Mbit/s	68.4 MHz
Skein-256	Stefan Tillich / On request	Fully autonomous	8 Threefish rounds unrolled	Xilinx Spartan 3	2421 slices	669 Mbit/s	26.14 MHz
Skein-256	Kobayashi et al. / N/A	Fully autonomous		Xilinx Virtex 5	854 slices	1482 Mbit/s	115 MHz
Skein-512	Men Long / N/A	Core functionality	UBI component	Xilinx Virtex 5	1877 slices	817.4 Mbit/s	114.9 MHz
Skein-512	Stefan Tillich / On request	Fully autonomous	8 Threefish rounds unrolled	Xilinx Virtex 5	1632 slices	3535 Mbit/s	69.04 MHz
Skein-512	Stefan Tillich / On request	Fully autonomous	8 Threefish rounds unrolled	Xilinx Spartan 3	4273 slices	1365 Mbit/s	26.66 MHz

3 Low-Area Implementations (FPGA)

Hash Function Name	Reference / HDL	Impl. Scope	Implementation Details	Technology	Size	Throughput	Clock Frequency
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex-II Pro	958 slices	371 Mbit/s	59.0 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex 4	960 slices	430 Mbit/s	68.0 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex 5	390 slices	575 Mbit/s	91.0 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex-II Pro	1802 slices	326 Mbit/s	36.0 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex 4	1856 slices	381 Mbit/s	42.0 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 1 G function unit	Xilinx Virtex 5	939 slices	533 Mbit/s	59.0 MHz
Grøstl-224/256	Jungk et al. / N/A	Fully autonomous	64-bit datapath, P & Q permutation in parallel	Xilinx Spartan 3	2486 slices	404 Mbit/s	63.2 MHz
Grøstl-224/256	Jungk et al. / N/A	Fully autonomous	64-bit datapath, P & Q permutation in parallel	Xilinx Virtex 2 Pro	2754 slices	512 Mbit/s	81.5 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Using external memory	Small core using system memory	Altera Stratix III	855 ALUTs	96.8 Mbit/s	366 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Using external memory	Small core using system memory	Altera Cyclone III	1559 LEs	47.8 Mbit/s	181 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Using external memory	Small core using system memory	Xilinx Virtex 5	444 slices	70.1 Mbit/s	265 MHz
Shabal	Feron and Francq / N/A	Fully autonomous	36 adders in permutation	Xilinx Virtex 5	596 slices	1142 Mbit/s	109 MHz
Shabal	Baldwin et al. / N/A	Core functionality	1 adder in permutation	Xilinx Spartan 3	1933 slices	540 Mbit/s	89.71 MHz
Shabal	Baldwin et al. / N/A	Core functionality	1 adder in permutation	Xilinx Virtex 5	2307 slices	1330 Mbit/s	222.22 MHz
Skein-256-256	Namin and Hasan / N/A	Core functionality	One round of Threefish iterated	Altera Stratix III	1385 ALUTs	573.9 Mbit/s	161.42 MHz

4 High-Speed Implementations (ASIC)

A comparison of implementations of all 14 round 2 candidates has been presented informally at IAIK (Graz University of Technology) on Sept. 16, 2009. The updated presentation slides can be found here.

The rows shaded in gray are results of a benchmarking of implementations of all 14 candidates on the same technology. Details on this benchmarking can be found here. An interactive graphical comparison of various area-performance tradeoffs can be found here.

Hash Function Name	Reference / HDL	Impl. Scope	Implementation Details	Technology	Size	Throughput	Clock Frequency
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	UMC 0.18 µm	58.30 kGates	5295 Mbit/s	114 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with 4 G function units	UMC 0.18 µm	41.31 kGates	4153 Mbit/s	170 MHz
BLAKE-32	Namin and Hasan / N/A	Core functionality	Compression function with 8 G function units and I/O registers	STM 90 nm	53 kGates	4475 Mbit/s(*)	96.15 MHz
BLAKE-32	Tillich et al. / On request	Fully autonomous	Compression function with 4 G function units with CSAs	UMC 0.18 µm	45.64 kGates	3971 Mbit/s	170.64 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 8 G function units	UMC 0.18 µm	132.47 kGates	5910 Mbit/s	87 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with 4 G function units	UMC 0.18 µm	82.73 kGates	4810 Mbit/s	136 MHz
Blue Midnight Wish-256	Namin and Hasan / N/A	Core functionality	Compression function with f0, f1, and f2 unrolled in sequence and I/O registers	STM 90 nm	164 kGates	26665 Mbit/s(*)	52.08 MHz
Blue Midnight Wish-256	Tillich et al. / On request	Fully autonomous	40 32-bit adders shared by f0, f1, and f2, two temporary 512-bit states	UMC 0.18 µm	122.09 kGates	1586 Mbit/s	164.20 MHz
CubeHash16/32-h	Tillich et al. / On request	Fully autonomous	Dynamically reconfigurable r and b parameters, two rounds unrolled	UMC 0.18 µm	58.87 kGates	4665 Mbit/s	145.77 MHz
ECHO-224/256	Lu et al. / N/A	Fully autonomous		0.13 µm	521.1 kGates	14850 Mbit/s	87.1 MHz
ECHO-256	Tillich et al. / On request	Fully autonomous	Four parallel AES rounds, 16 AES MixColumns 32-bit column multipliers	UMC 0.18 µm	141.49 kGates	2246 Mbit/s	141.84 MHz
ECHO-384/512	Lu et al. / N/A	Fully autonomous		0.13 µm	516.8 kGates	7750 Mbit/s	83.3 MHz
Fugue-256	Submission document / N/A	Fully autonomous	Four columns of SMIX transformation in parallel (SUPER4_P)	IBM 90 nm	109.85 kGates	13913 Mbit/s	869.5 MHz
Fugue-256	Tillich et al. / On request	Fully autonomous	Four columns of SMIX transformation in parallel	UMC 0.18 µm	46.26 kGates	4092 Mbit/s	255.75 MHz
Grøstl-256	Tillich et al. / On request	Fully autonomous	One shared permutation for P & Q, one pipeline stage	UMC 0.18 µm	58.40 kGates	6290 Mbit/s	270.27 MHz
Grøstl-384/512	Submission document / N/A	Fully autonomous	P & Q permutation in parallel	UMC 0.18 µm	341 kGates	6225 Mbit/s	85.1 MHz
Hamsi-256	Junfeng Fan (Hamsi website) / N/A	Fully autonomous		0.13 µm	22 kGates	4940 Mbit/s	1080 MHz
Hamsi-256	Tillich et al. / On request	Fully autonomous	Three instances of P/Pf function unrolled	UMC 0.18 µm	58.66 kGates	5565 Mbit/s	173.91 MHz
Hamsi-512	Junfeng Fan (Hamsi website) / N/A	Fully autonomous		0.13 µm	50 kGates	3970 Mbit/s	820 MHz
JH-256	Tillich et al. / On request	Fully autonomous	320 S-boxes, one round of R₈ per cycle	UMC 0.18 µm	58.83 kGates	4991 Mbit/s	380.22 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Fully autonomous	Core (round function, state register) & IO buffer	ST 0.13 µm	48 kGates	29900 Mbit/s	526 MHz
Keccak	Submission document / Submission webpage	Fully autonomous	Core (round function, state register) only	ST 0.13 µm	40 kGates	15000 Mbit/s	500 MHz
Keccak(-256)	Tillich et al. / On request	Fully autonomous	One instance of Keccak-f round	UMC 0.18 µm	56.32 kGates	21229 Mbit/s	487.80 MHz
Luffa-224/256	Knežević and Verbauwhede / Author's webpage	Fully autonomous	Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)	UMC 0.13 µm	30.83 kGates	31960 Mbit/s	1124 MHz
Luffa-256	Namin and Hasan / N/A	Core functionality	Compression function (1 cycle latency) and I/O registers	STM 90 nm	122 kGates	25702 Mbit/s(*)	100.4 MHz
Luffa-224/256	Tillich et al. / On request	Fully autonomous	Three permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)	UMC 0.18 µm	44.97 kGates	13741 Mbit/s	483.09 MHz
Luffa-384	Knežević and Verbauwhede / Author's webpage	Fully autonomous	Four permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)	UMC 0.13 µm	50.07 kGates	23126 Mbit/s	813 MHz
Luffa-512	Knežević and Verbauwhede / Author's webpage	Fully autonomous	Five permutation blocks in parallel (64 S-boxes, 4 MixWord blocks each)	UMC 0.13 µm	65.1 kGates	19617 Mbit/s	690 MHz
Shabal-256	Namin and Hasan / N/A	Core functionality	Compression function with I/O registers (latency of 16 clock cycles)	STM 90 nm	20 kGates	4408 Mbit/s(*)	413.22 MHz
Shabal-256	Tillich et al. / On request	Fully autonomous	One word rotation per cycle, 50 cycles per block	UMC 0.18 µm	54.19 kGates	3282 Mbit/s	320.51 MHz
SHAvite-3₂₅₆	Tillich et al. / On request	Fully autonomous	Four AES rounds (two for compression, two for message expansion)	UMC 0.18 µm	58.83 kGates	2387 Mbit/s	88.57 MHz
SIMD-256(**)	Tillich et al. / On request	Fully autonomous	Two FFT-64 with two FFT-8 and 16 multipliers (8x8 bit) each	UMC 0.18 µm	104.17 kGates	924 Mbit/s	64.93 MHz
Skein-256-256	Stefan Tillich / On request	Fully autonomous	8 Threefish rounds unrolled	UMC 0.18 µm	53.87 kGates	1762 Mbit/s	68.8 MHz
Skein-256-256	Namin and Hasan / N/A	Core functionality	All 72 Threefish rounds unrolled	STM 90 nm	369 kGates	3126 Mbit/s(*)	12.21 MHz
Skein-256-256	Tillich et al. / On request	Fully autonomous	8 Threefish rounds unrolled	UMC 0.18 µm	58.61 kGates	1882 Mbit/s	73.52 MHz
Skein-512-512	Tillich et al. / On request	Fully autonomous	8 Threefish rounds unrolled	UMC 0.18 µm	102.04 kGates	2502 Mbit/s	48.87 MHz

(*) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s
(**) Implementation of round-one variant.
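The estimate in footnote (*) can be written as a short Python sketch. The function name `peak_throughput_mbit` is hypothetical. The example inputs for the Luffa-256 row of Namin and Hasan are assumptions: a 256-bit input block, one cycle (the table states a latency of 1 cycle), and a compression function delay taken as the clock period at the listed 100.4 MHz.

```python
def peak_throughput_mbit(input_bits, delay_ns, cycles):
    """Footnote (*) estimate: 1000 * bits / (delay in ns * cycles) in Mbit/s."""
    return 1000.0 * input_bits / (delay_ns * cycles)


# Assumed inputs for the Luffa-256 row (STM 90 nm, 100.4 MHz):
# a 256-bit block, 1 cycle, delay = one clock period in nanoseconds.
delay_ns = 1000.0 / 100.4
print(round(peak_throughput_mbit(256, delay_ns, 1), 1))
```

Under these assumptions the result is about 25702 Mbit/s, in line with the value listed for that row.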

5 Low-Area Implementations (ASIC)

Hash Function Name	Reference / HDL	Impl. Scope	Implementation Details	Technology	Size	Throughput	Clock Frequency
BLAKE-32	Tillich et al. / N/A	Fully autonomous	One G function in 11 cycles	AMS 0.35 µm	25.57 kGates	15.4 Mbit/s	31.25 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with a single G function unit	UMC 0.18 µm	10.54 kGates	253 Mbit/s	40 MHz
BLAKE-32	Submission document / Submission webpage	Core functionality	Compression function with a half G function unit	UMC 0.18 µm	9.89 kGates	127 Mbit/s	40 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with a single G function unit	UMC 0.18 µm	20.61 kGates	181 Mbit/s	20 MHz
BLAKE-64	Submission document / Submission webpage	Core functionality	Compression function with a half G function unit	UMC 0.18 µm	19.46 kGates	91 Mbit/s	20 MHz
ECHO-224/256	Lu et al. / N/A	Fully autonomous		0.13 µm	82.8 kGates	373 Mbit/s	66.6 MHz
Fugue-256	Submission document / N/A	Fully autonomous	One SMIX transformation (SUPER1_L)	IBM 90 nm	59.22 kGates	2000 Mbit/s	500 MHz
Grøstl-224/256	Tillich et al. / N/A	Fully autonomous	64-bit datapath, P & Q permutation shared	AMS 0.35 µm	14.62 kGates	145.9 Mbit/s	55.87 MHz
Grøstl-224/256	Grøstl website / N/A	Fully autonomous	64-bit datapath, P & Q permutation shared	UMC 0.18 µm	17 kGates	645 Mbit/s	246.9 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Using external memory	Small core using system memory	ST 0.13 µm	6.5 kGates	176.4 Mbit/s(*)	666.7 MHz
Keccak	Updated specification (v1.2) / Submission webpage	Using external memory	Small core using system memory, clock freq. limited to 200 MHz	ST 0.13 µm	5 kGates	52.9 Mbit/s(**)	200 MHz
Luffa-224/256	Knežević and Verbauwhede / Author's webpage	Fully autonomous	One permutation block (64 S-boxes, 4 MixWord blocks)	UMC 0.13 µm	18.26 kGates	2461 Mbit/s	250 MHz
Luffa-384	Knežević and Verbauwhede / Author's webpage	Fully autonomous	One permutation block (64 S-boxes, 4 MixWord blocks)	UMC 0.13 µm	27.13 kGates	1882 Mbit/s	250 MHz
Luffa-512	Knežević and Verbauwhede / Author's webpage	Fully autonomous	One permutation block (64 S-boxes, 4 MixWord blocks)	UMC 0.13 µm	37.35 kGates	1524 Mbit/s	250 MHz
Skein-256-256	Tillich et al. / N/A	Fully autonomous	64-bit datapath	AMS 0.35 µm	12.89 kGates	19.8 Mbit/s	80 MHz
Skein-256-256	Namin and Hasan / N/A	Core functionality	One round of Threefish iterated	STM 90 nm	21 kGates	1018.8 Mbit/s(***)	286.53 MHz

(*) Estimation for 64-bit memory interface: (1024 bits/permutation) * (666.7 * 10^6 cycles/s) / (3870 cycles/permutation) = 176.41 * 10^6 bits/s
(**) Estimation for 64-bit memory interface: (1024 bits/permutation) * (200 * 10^6 cycles/s) / (3870 cycles/permutation) = 52.92 * 10^6 bits/s
(***) Estimated peak throughput for the minimal delay of compression function: 1000 * (Input Size in bits) / [(Compression Function Delay in ns) * (Number of Cycles)] = Throughput in Mbit/s
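The two memory-interface estimates in footnotes (*) and (**) follow the same arithmetic, which can be checked with a short Python sketch. The function name `memory_bound_throughput_mbit` is hypothetical; the inputs (1024 bits per permutation, 3870 cycles per permutation, and the two clock frequencies) are taken from the footnotes above.

```python
def memory_bound_throughput_mbit(bits_per_permutation, clock_mhz, cycles_per_permutation):
    """Throughput in Mbit/s: bits per permutation * clock (MHz) / cycles per permutation."""
    return bits_per_permutation * clock_mhz / cycles_per_permutation


# Footnote (*): 666.7 MHz clock
print(round(memory_bound_throughput_mbit(1024, 666.7, 3870), 2))
# Footnote (**): clock limited to 200 MHz
print(round(memory_bound_throughput_mbit(1024, 200.0, 3870), 2))
```

Because the clock is given in MHz, the result comes out directly in Mbit/s, matching the 176.41 and 52.92 figures of the footnotes.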

6 Call for Contributions

Implementers (both submitters and non-submitters): Do you have results that complement this site? Let us know at sha3zoo-hardware@iaik.tugraz.at. If you are making your HDL code available, please also provide us with the corresponding information.

@@ Line 52: / Line 52: @@
 |-
 | BLAKE-32  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with 8 G function units and I/O registers  || Altera Stratix III  || align="right"| 5435 ALUTs  || align="right"| 2186.2 Mbit/s  || align="right"| 46.97 MHz
+|-
+| BLAKE-32  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1660 slices  || align="right"| 2676 Mbit/s  || align="right"| 115 MHz
 |-
 | BLAKE-64  || [http://131002.net/blake/blake.pdf Submission document] / [http://131002.net/blake/ Submission webpage]  || [[#Implementation_of_Core_Functionality|Core functionality]]  ||  Compression function with 8 G function units  || Xilinx Virtex-II Pro  || align="right"| 11122 slices  || align="right"| 1177 Mbit/s  || align="right"| 17.0 MHz
@@ Line 64: / Line 66: @@
 |-
 | CubeHash8/1-256 || [http://eprint.iacr.org/2009/342.pdf Baldwin et al.] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || 1 iterated compression function || Xilinx Virtex 5 || align="right"| 1178 slices  || align="right"| 160 Mbit/s  || align="right"| 166.8 MHz
+|-
+| CubeHash16/32-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 590 slices  || align="right"| 2960 Mbit/s  || align="right"| 185 MHz
 |-
 | ECHO-224/256  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 9333 slices  || align="right"| 14860 Mbit/s  || align="right"| 87.1 MHz
+|-
+| ECHO-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 3556 slices  || align="right"| 1614 Mbit/s  || align="right"| 104 MHz
 |-
 | ECHO-384/512  || [http://www.ucc.ie/en/crypto/CodingandCryptographyWorkshop/TheClaudeShannonWorkshoponCodingCryptography2009/DocumentFile,75649,en.pdf Lu et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||   || Xilinx Virtex 5  || align="right"| 9097 slices  || align="right"| 7810 Mbit/s  || align="right"| 83.9 MHz
@@ Line 72: / Line 78: @@
 |-
 | Grøstl-224/256  || [http://www.groestl.info/Groestl.pdf Submission document] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Virtex 5  || align="right"| 1722 slices  || align="right"| 10276 Mbit/s  || align="right"| 200.7 MHz
+|-
+| Grøstl-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 4057 slices  || align="right"| 5171 Mbit/s  || align="right"| 101 MHz
 |-
 | Grøstl-384/512  || [http://www.groestl.info/Groestl.pdf Submission document] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Spartan 3  || align="right"| 20233 slices  || align="right"| 5901 Mbit/s  || align="right"| 80.7 MHz
@@ Line 78: / Line 86: @@
 |-
 | Grøstl-384/512  || [http://www.groestl.info/Groestl.pdf Submission document] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || P & Q permutation in parallel  || Xilinx Virtex 5  || align="right"| 5419 slices  || align="right"| 15395 Mbit/s  || align="right"| 210.5 MHz
+|-
+| Hamsi-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 718 slices  || align="right"| 1680 Mbit/s  || align="right"| 210 MHz
 |-
 | Keccak  || [http://keccak.noekeon.org/Keccak-main-1.2.pdf Updated specification (v1.2)] / [http://keccak.noekeon.org/ Submission webpage]  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || Core (round function, state register) & IO buffer || Altera Cyclone III || align="right"| 5776 LEs  || align="right"| 7500 Mbit/s || align="right"| 133 MHz
@@ Line 88: / Line 98: @@
 |-
 | Luffa-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function (1 cycle latency) and I/O registers  || Altera Stratix III  || align="right"| 16552 ALUTs  || align="right"| 12042.2 Mbit/s  || align="right"| 47.04 MHz
+|-
+| Luffa-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1048 slices  || align="right"| 6343 Mbit/s  || align="right"| 223 MHz
 |-
 | Shabal  || [http://ehash.iaik.tugraz.at/uploads/d/d4/FPGA_Implementation_of_Shabal_-_First_Results.pdf Feron and Francq] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  || 36 adders in permutation  || Xilinx Virtex 5  || align="right"| 1171 slices  || align="right"| 2588 Mbit/s  || align="right"| 126 MHz
@@ Line 96: / Line 108: @@
 |-
 | Shabal-256  || [http://www.vlsi.uwaterloo.ca/~ahasan/hasan_report.html Namin and Hasan] / N/A  || [[#Implementation_of_Core_Functionality|Core functionality]]  || Compression function with I/O registers (latency of 16 clock cycles)  || Altera Stratix III  || align="right"| 1440 ALUTs  || align="right"| 3125.6 Mbit/s  || align="right"| 195.35 MHz
+|-
+| Shabal-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 1251 slices  || align="right"| 1739 Mbit/s  || align="right"| 214 MHz
 |-
 | Skein-256 || [http://www.skein-hash.info/sites/default/files/skein_fpga.pdf Men Long] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || UBI component || Xilinx Virtex 5 || align="right"| 1001 slices  || align="right"| 408.7 Mbit/s || align="right"| 114.9 MHz
@@ Line 102: / Line 116: @@
 |-
 | Skein-256 || [http://eprint.iacr.org/2009/159.pdf Stefan Tillich] / [mailto:stillich@iaik.tugraz.at On request] || [[#Fully_Autonomous_Implementation|Fully autonomous]] || 8 Threefish rounds unrolled || Xilinx Spartan 3 || align="right"| 2421 slices  || align="right"| 669 Mbit/s || align="right"| 26.14 MHz
+|-
+| Skein-256  || [http://eprint.iacr.org/2010/010.pdf Kobayashi et al.] / N/A  || [[#Fully_Autonomous_Implementation|Fully autonomous]]  ||    || Xilinx Virtex 5  || align="right"| 854 slices  || align="right"| 1482 Mbit/s  || align="right"| 115 MHz
 |-
 | Skein-512 || [http://www.skein-hash.info/sites/default/files/skein_fpga.pdf Men Long] / N/A || [[#Implementation_of_Core_Functionality|Core functionality]] || UBI component || Xilinx Virtex 5 || align="right"| 1877 slices  || align="right"| 817.4 Mbit/s || align="right"| 114.9 MHz
