@manastasova manastasova commented Aug 15, 2025

Description of changes:

This PR prototypes the x86 Keccak code as part of the third-party module. Once the code and its proof are merged into s2n-bignum, the s2n-bignum importer script will be used to integrate the implementation.

Testing:

ninja && ./crypto/crypto_test
./tool/bssl speed -filter {SHA3-224, ...}

SHA3 Performance: Assembly vs C Implementation Tables

SHA3-224

| Input Size | Assembly (MB/s) | C (MB/s) | Speedup | Improvement (%) |
|---|---|---|---|---|
| 16 bytes | 54.4 | 45.2 | 1.20x | 20.4% |
| 256 bytes | 445.5 | 390.9 | 1.14x | 14.0% |
| 1350 bytes | 492.0 | 428.3 | 1.15x | 14.9% |
| 8192 bytes | 527.2 | 449.3 | 1.17x | 17.3% |
| 16384 bytes | 533.5 | 451.1 | 1.18x | 18.3% |

SHA3-256

| Input Size | Assembly (MB/s) | C (MB/s) | Speedup | Improvement (%) |
|---|---|---|---|---|
| 16 bytes | 54.4 | 45.9 | 1.19x | 18.5% |
| 256 bytes | 446.6 | 400.5 | 1.12x | 11.5% |
| 1350 bytes | 493.1 | 433.9 | 1.14x | 13.6% |
| 8192 bytes | 496.8 | 428.6 | 1.16x | 15.9% |
| 16384 bytes | 498.1 | 431.1 | 1.16x | 15.5% |

SHA3-384

| Input Size | Assembly (MB/s) | C (MB/s) | Speedup | Improvement (%) |
|---|---|---|---|---|
| 16 bytes | 54.3 | 48.0 | 1.13x | 13.1% |
| 256 bytes | 305.8 | 283.6 | 1.08x | 7.8% |
| 1350 bytes | 384.4 | 333.7 | 1.15x | 15.2% |
| 8192 bytes | 384.5 | 328.8 | 1.17x | 17.0% |
| 16384 bytes | 385.7 | 329.4 | 1.17x | 17.1% |

SHA3-512

| Input Size | Assembly (MB/s) | C (MB/s) | Speedup | Improvement (%) |
|---|---|---|---|---|
| 16 bytes | 54.0 | 46.4 | 1.16x | 16.4% |
| 256 bytes | 234.5 | 210.5 | 1.11x | 11.4% |
| 1350 bytes | 266.8 | 228.2 | 1.17x | 16.9% |
| 8192 bytes | 272.3 | 229.2 | 1.19x | 18.8% |
| 16384 bytes | 271.1 | 231.6 | 1.17x | 17.1% |
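
For readers who want to re-derive the Speedup and Improvement columns, the arithmetic can be sketched as below. This is an illustrative helper, not part of the PR; the function names `speedup` and `improvement_percent` are hypothetical, and the sketch assumes both columns are simply the ratio of the assembly rate to the C rate.

```c
#include <assert.h>

/* Hypothetical helper: ratio of the assembly rate to the C rate.
 * Both rates must use the same unit (MB/s or ops/sec). */
double speedup(double asm_rate, double c_rate) {
  assert(c_rate > 0.0);
  return asm_rate / c_rate;
}

/* Hypothetical helper: the same ratio expressed as a percentage gain. */
double improvement_percent(double asm_rate, double c_rate) {
  return (speedup(asm_rate, c_rate) - 1.0) * 100.0;
}
```

For the SHA3-224 16-byte row, `speedup(54.4, 45.2)` is about 1.20 and `improvement_percent(54.4, 45.2)` is about 20.4. Feeding in the rounded MB/s figures can differ in the last digit from a value derived from the raw ops/sec numbers.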

SHA3 Performance: Details

ASM Implementation

./tool/bssl speed -filter SHA3-224
Did 3399750 SHA3-224 (16 bytes) operations in 1000077us (3399488.2 ops/sec): 54.4 MB/s
Did 1741000 SHA3-224 (256 bytes) operations in 1000447us (1740222.1 ops/sec): 445.5 MB/s
Did 365000 SHA3-224 (1350 bytes) operations in 1001528us (364443.1 ops/sec): 492.0 MB/s
Did 65000 SHA3-224 (8192 bytes) operations in 1009980us (64357.7 ops/sec): 527.2 MB/s
Did 33000 SHA3-224 (16384 bytes) operations in 1013407us (32563.4 ops/sec): 533.5 MB/s
./tool/bssl speed -filter SHA3-256
Did 3403250 SHA3-256 (16 bytes) operations in 1000045us (3403096.9 ops/sec): 54.4 MB/s
Did 1744750 SHA3-256 (256 bytes) operations in 1000020us (1744715.1 ops/sec): 446.6 MB/s
Did 366000 SHA3-256 (1350 bytes) operations in 1001951us (365287.3 ops/sec): 493.1 MB/s
Did 61000 SHA3-256 (8192 bytes) operations in 1005814us (60647.4 ops/sec): 496.8 MB/s
Did 31000 SHA3-256 (16384 bytes) operations in 1019776us (30398.8 ops/sec): 498.1 MB/s
./tool/bssl speed -filter SHA3-384
Did 3395000 SHA3-384 (16 bytes) operations in 1000226us (3394232.9 ops/sec): 54.3 MB/s
Did 1195000 SHA3-384 (256 bytes) operations in 1000425us (1194492.3 ops/sec): 305.8 MB/s
Did 285000 SHA3-384 (1350 bytes) operations in 1000955us (284728.1 ops/sec): 384.4 MB/s
Did 47000 SHA3-384 (8192 bytes) operations in 1001271us (46940.3 ops/sec): 384.5 MB/s
Did 24000 SHA3-384 (16384 bytes) operations in 1019448us (23542.2 ops/sec): 385.7 MB/s
./tool/bssl speed -filter SHA3-512
Did 3377000 SHA3-512 (16 bytes) operations in 1000075us (3376746.7 ops/sec): 54.0 MB/s
Did 917000 SHA3-512 (256 bytes) operations in 1000998us (916085.7 ops/sec): 234.5 MB/s
Did 198000 SHA3-512 (1350 bytes) operations in 1001963us (197612.1 ops/sec): 266.8 MB/s
Did 34000 SHA3-512 (8192 bytes) operations in 1022690us (33245.7 ops/sec): 272.3 MB/s
Did 17000 SHA3-512 (16384 bytes) operations in 1027485us (16545.3 ops/sec): 271.1 MB/s

C Implementation

./tool/bssl speed -filter SHA3-224
Did 2827000 SHA3-224 (16 bytes) operations in 1000051us (2826855.8 ops/sec): 45.2 MB/s
Did 1528000 SHA3-224 (256 bytes) operations in 1000630us (1527038.0 ops/sec): 390.9 MB/s
Did 318000 SHA3-224 (1350 bytes) operations in 1002449us (317223.1 ops/sec): 428.3 MB/s
Did 55000 SHA3-224 (8192 bytes) operations in 1002827us (54845.0 ops/sec): 449.3 MB/s
Did 28000 SHA3-224 (16384 bytes) operations in 1016880us (27535.2 ops/sec): 451.1 MB/s
./tool/bssl speed -filter SHA3-256
Did 2867500 SHA3-256 (16 bytes) operations in 1000073us (2867290.7 ops/sec): 45.9 MB/s
Did 1564750 SHA3-256 (256 bytes) operations in 1000143us (1564526.3 ops/sec): 400.5 MB/s
Did 322000 SHA3-256 (1350 bytes) operations in 1001788us (321425.3 ops/sec): 433.9 MB/s
Did 53000 SHA3-256 (8192 bytes) operations in 1012940us (52322.9 ops/sec): 428.6 MB/s
Did 27000 SHA3-256 (16384 bytes) operations in 1026228us (26309.9 ops/sec): 431.1 MB/s
./tool/bssl speed -filter SHA3-384
Did 3000000 SHA3-384 (16 bytes) operations in 1000366us (2998902.4 ops/sec): 48.0 MB/s
Did 1108000 SHA3-384 (256 bytes) operations in 1000065us (1107928.0 ops/sec): 283.6 MB/s
Did 248000 SHA3-384 (1350 bytes) operations in 1003285us (247188.0 ops/sec): 333.7 MB/s
Did 41000 SHA3-384 (8192 bytes) operations in 1021609us (40132.8 ops/sec): 328.8 MB/s
Did 21000 SHA3-384 (16384 bytes) operations in 1044409us (20107.1 ops/sec): 329.4 MB/s
./tool/bssl speed -filter SHA3-512
Did 2902000 SHA3-512 (16 bytes) operations in 1000219us (2901364.6 ops/sec): 46.4 MB/s
Did 823000 SHA3-512 (256 bytes) operations in 1000766us (822370.1 ops/sec): 210.5 MB/s
Did 170000 SHA3-512 (1350 bytes) operations in 1005799us (169019.9 ops/sec): 228.2 MB/s
Did 28000 SHA3-512 (8192 bytes) operations in 1000970us (27972.9 ops/sec): 229.2 MB/s
Did 15000 SHA3-512 (16384 bytes) operations in 1061183us (14135.2 ops/sec): 231.6 MB/s
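
The throughput figures in the logs above can be recomputed from the operation count, input size, and elapsed microseconds on each line. The following is a minimal sketch, not part of `bssl` itself: the `speed_sample` type and the function names are hypothetical, and it assumes 1 MB = 10^6 bytes, which is consistent with the numbers printed above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one hash line of `bssl speed` output. */
typedef struct {
  char name[32];             /* e.g. "SHA3-224" */
  unsigned long long ops;    /* operations completed */
  unsigned long long bytes;  /* input size per operation */
  unsigned long long micros; /* elapsed time in microseconds */
} speed_sample;

/* Parses lines shaped like
 *   "Did 3399750 SHA3-224 (16 bytes) operations in 1000077us (...)".
 * Returns 1 on success and 0 otherwise. Lines without a "(N bytes)"
 * field, such as the ML-KEM lines, are rejected. */
int parse_speed_line(const char *line, speed_sample *out) {
  assert(line != NULL && out != NULL);
  int matched = sscanf(line, "Did %llu %31s (%llu bytes) operations in %lluus",
                       &out->ops, out->name, &out->bytes, &out->micros);
  return matched == 4;
}

/* Operations per second, derived from the count and elapsed time. */
double ops_per_sec(const speed_sample *s) {
  return (double)s->ops * 1e6 / (double)s->micros;
}

/* Throughput in MB/s, assuming 1 MB = 10^6 bytes. */
double megabytes_per_sec(const speed_sample *s) {
  return ops_per_sec(s) * (double)s->bytes / 1e6;
}
```

Applied to the first SHA3-224 assembly line, this yields roughly 3399488 ops/sec and 54.4 MB/s, in line with the logged values.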

ML-KEM Performance: Assembly vs C SHA3 Implementation

ML-KEM-512

| Operation | Assembly (ops/sec) | C (ops/sec) | Speedup | Improvement (%) |
|---|---|---|---|---|
| Keygen | 59902.7 | 57511.2 | 1.04x | 4.2% |
| Encaps | 55389.8 | 52286.3 | 1.06x | 5.9% |
| Decaps | 45758.1 | 42746.4 | 1.07x | 7.0% |

ML-KEM-768

| Operation | Assembly (ops/sec) | C (ops/sec) | Speedup | Improvement (%) |
|---|---|---|---|---|
| Keygen | 35753.8 | 34474.1 | 1.04x | 3.7% |
| Encaps | 36155.6 | 34309.8 | 1.05x | 5.4% |
| Decaps | 30390.9 | 28625.7 | 1.06x | 6.2% |

ML-KEM-1024

| Operation | Assembly (ops/sec) | C (ops/sec) | Speedup | Improvement (%) |
|---|---|---|---|---|
| Keygen | 23652.7 | 22711.6 | 1.04x | 4.1% |
| Encaps | 25448.2 | 23889.1 | 1.07x | 6.5% |
| Decaps | 21344.2 | 19922.1 | 1.07x | 7.1% |

ASM Implementation

./tool/bssl speed -filter ML-KEM-512
Did 60000 ML-KEM-512 keygen operations in 1001625us (59902.7 ops/sec)
Did 56000 ML-KEM-512 encaps operations in 1011016us (55389.8 ops/sec)
Did 46000 ML-KEM-512 decaps operations in 1005287us (45758.1 ops/sec)
./tool/bssl speed -filter ML-KEM-768
Did 36000 ML-KEM-768 keygen operations in 1006886us (35753.8 ops/sec)
Did 37000 ML-KEM-768 encaps operations in 1023355us (36155.6 ops/sec)
Did 31000 ML-KEM-768 decaps operations in 1020042us (30390.9 ops/sec)
./tool/bssl speed -filter ML-KEM-1024
Did 24000 ML-KEM-1024 keygen operations in 1014683us (23652.7 ops/sec)
Did 26000 ML-KEM-1024 encaps operations in 1021685us (25448.2 ops/sec)
Did 22000 ML-KEM-1024 decaps operations in 1030726us (21344.2 ops/sec)

C Implementation

./tool/bssl speed -filter ML-KEM-512
Did 58000 ML-KEM-512 keygen operations in 1008500us (57511.2 ops/sec)
Did 53000 ML-KEM-512 encaps operations in 1013649us (52286.3 ops/sec)
Did 43000 ML-KEM-512 decaps operations in 1005932us (42746.4 ops/sec)
./tool/bssl speed -filter ML-KEM-768
Did 35000 ML-KEM-768 keygen operations in 1015254us (34474.1 ops/sec)
Did 35000 ML-KEM-768 encaps operations in 1020117us (34309.8 ops/sec)
Did 29000 ML-KEM-768 decaps operations in 1013076us (28625.7 ops/sec)
./tool/bssl speed -filter ML-KEM-1024
Did 23000 ML-KEM-1024 keygen operations in 1012698us (22711.6 ops/sec)
Did 24000 ML-KEM-1024 encaps operations in 1004641us (23889.1 ops/sec)
Did 20000 ML-KEM-1024 decaps operations in 1003908us (19922.1 ops/sec)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

codecov-commenter commented Aug 16, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.72%. Comparing base (04875db) to head (c7f9542).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2619      +/-   ##
==========================================
- Coverage   78.72%   78.72%   -0.01%     
==========================================
  Files         645      646       +1     
  Lines      111086   111216     +130     
  Branches    15690    15711      +21     
==========================================
+ Hits        87453    87550      +97     
- Misses      22941    22975      +34     
+ Partials      692      691       -1     


andrewhop pushed a commit that referenced this pull request Aug 25, 2025
Bit Interleave is used for performance optimizations on 32-bit
platforms. Bit Interleave adds unnecessary complexity.

### Issues:
Some Windows compilers, e.g., old versions of Microsoft Visual C++
(MSVC), do not support certain preprocessor directives and expressions,
for example:

```
// Double-check that bit-interleaving is not used on AArch64
#if BIT_INTERLEAVE != 0
#error Bit-interleaving of Keccak1600 states should be disabled for AArch64
#endif
```

in
https://github.com/aws/aws-lc/blob/d781046a99638d1466ec912cf0191d0564de2084/crypto/fipsmodule/sha/keccak1600.c#L422

A solution could be:

```
#if defined(BIT_INTERLEAVE) && BIT_INTERLEAVE
  #error Bit-interleaving of Keccak1600 states should be disabled for AArch64
#endif
```
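
As a hedged illustration of the guard above, the sketch below shows how the `defined(...) &&` form behaves when the macro is absent. The names `KECCAK_BIT_INTERLEAVE_ENABLED` and `keccak_bit_interleave_enabled` are hypothetical and exist only for this example. It assumes a build where `BIT_INTERLEAVE` is left undefined; the `defined()` test then short-circuits to false without the preprocessor having to evaluate the bare identifier.

```c
/* Assumption for this sketch: BIT_INTERLEAVE is left undefined by the
 * build, so the defined() check fails and the #else branch is taken. */
#if defined(BIT_INTERLEAVE) && BIT_INTERLEAVE
#define KECCAK_BIT_INTERLEAVE_ENABLED 1 /* hypothetical name */
#else
#define KECCAK_BIT_INTERLEAVE_ENABLED 0 /* hypothetical name */
#endif

/* Hypothetical helper exposing the compile-time decision at run time. */
int keccak_bit_interleave_enabled(void) {
  return KECCAK_BIT_INTERLEAVE_ENABLED;
}
```

Compiling with `-DBIT_INTERLEAVE=1` would flip the helper's result to 1, while `-DBIT_INTERLEAVE=0` and leaving the macro undefined both yield 0.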

However, BIT_INTERLEAVE is intended only for optimizing 32-bit
platforms, so it adds unnecessary complexity to the code without
providing much benefit.

Therefore, removing BIT_INTERLEAVE support is the better solution for
clarity and maintainability.
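
For context on what is being removed, the general bit-interleaving technique can be sketched as follows. This is not the aws-lc code; the function names are hypothetical and the loops favor clarity over speed. The idea is to store the even-indexed and odd-indexed bits of each 64-bit Keccak lane in two separate 32-bit words, so that a 64-bit rotation reduces to two 32-bit rotations, which suits CPUs without 64-bit rotate instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Split a 64-bit lane: even-indexed bits go to *even, odd-indexed to *odd. */
void interleave64(uint64_t lane, uint32_t *even, uint32_t *odd) {
  assert(even != NULL && odd != NULL);
  uint32_t e = 0, o = 0;
  for (int j = 0; j < 32; j++) {
    e |= (uint32_t)((lane >> (2 * j)) & 1u) << j;
    o |= (uint32_t)((lane >> (2 * j + 1)) & 1u) << j;
  }
  *even = e;
  *odd = o;
}

/* Inverse of interleave64: merge the two halves back into one lane. */
uint64_t deinterleave64(uint32_t even, uint32_t odd) {
  uint64_t lane = 0;
  for (int j = 0; j < 32; j++) {
    lane |= (uint64_t)((even >> j) & 1u) << (2 * j);
    lane |= (uint64_t)((odd >> j) & 1u) << (2 * j + 1);
  }
  return lane;
}

/* 32-bit left rotation; the amount is reduced modulo 32. */
static uint32_t rotl32(uint32_t x, unsigned r) {
  r &= 31u;
  return r == 0 ? x : (x << r) | (x >> (32u - r));
}

/* Rotate an interleaved lane left by n bits using only 32-bit rotations.
 * Even n keeps each half in place; odd n swaps the roles of the halves. */
void rotl_interleaved(uint32_t *even, uint32_t *odd, unsigned n) {
  assert(even != NULL && odd != NULL);
  n &= 63u;
  unsigned k = n / 2;
  if (n % 2 == 0) {
    *even = rotl32(*even, k);
    *odd = rotl32(*odd, k);
  } else {
    uint32_t old_even = *even;
    *even = rotl32(*odd, k + 1);
    *odd = rotl32(old_even, k);
  }
}

/* Plain 64-bit left rotation, used as the reference for comparison. */
uint64_t rotl64(uint64_t x, unsigned n) {
  n &= 63u;
  return n == 0 ? x : (x << n) | (x >> (64u - n));
}
```

Interleaving a lane, calling `rotl_interleaved`, and deinterleaving again gives the same value as `rotl64` on the original lane. On 64-bit targets the native rotate makes this extra representation unnecessary, which is the complexity the commit removes.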


### Description of changes: 
Remove all support for BIT_INTERLEAVE.

### Call-outs:
This change is motivated by the integration of x86 Keccak into
aws-lc (#2619), which fails when running
on the x86 Windows platform.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 license and the ISC license.