perf(levm): add AVX256 implementation of BLAKE2 #3590

Merged: 13 commits, Jul 17, 2025

Contributor

@iovoid iovoid commented Jul 10, 2025

Motivation

To improve BLAKE2 performance.

Description

Why AVX256 (256-bit AVX2 vectors) instead of AVX512? Mainly because the AVX512 intrinsics in Rust (rust-lang/rust#111137) are still experimental.

Creates a common/crypto module to house blake2. We should consider moving other cryptographic operations currently inside precompiles.rs here as well.

If avx2 is available, a permute-with-gather implementation is used.
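As a rough illustration of what "permute-with-gather" can mean here, the sketch below loads four permuted 64-bit message words with a single AVX2 gather. It is not the code from this PR: `permuted_words` and `gather4_avx2` are hypothetical names, and the index row used in the usage note is the first four entries of SIGMA[1] from RFC 7693.

```rust
/// Hypothetical helper: returns [m[idx[0]], m[idx[1]], m[idx[2]], m[idx[3]]].
/// Uses an AVX2 gather when the CPU supports it, a scalar loop otherwise.
pub fn permuted_words(m: &[u64; 16], idx: [usize; 4]) -> [u64; 4] {
    // Validate the indices up front so the gather can never read outside `m`.
    assert!(idx.iter().all(|&i| i < 16), "index out of range");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected at runtime and all indices are < 16.
            return unsafe { gather4_avx2(m, idx) };
        }
    }
    idx.map(|i| m[i])
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather4_avx2(m: &[u64; 16], idx: [usize; 4]) -> [u64; 4] {
    use std::arch::x86_64::*;
    let mut out = [0u64; 4];
    unsafe {
        // Four 32-bit element indices; the first argument is the lowest lane.
        let offsets = _mm_setr_epi32(
            idx[0] as i32,
            idx[1] as i32,
            idx[2] as i32,
            idx[3] as i32,
        );
        // SCALE = 8: each index is multiplied by 8 bytes (one u64).
        let v = _mm256_i32gather_epi64::<8>(m.as_ptr() as *const i64, offsets);
        // Unaligned store: `out` is only guaranteed 8-byte alignment.
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    }
    out
}
```

Calling `permuted_words(&m, [14, 10, 4, 8])` picks the words that the second BLAKE2b round mixes first, without a separate shuffle step.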

Usage of unsafe is required for SIMD loads and stores. Reviewers should check that alignment requirements are satisfied and that no out-of-bounds accesses are possible.

Note that, besides the obvious intrinsics with "load" or "store" in their names, gather intrinsics also perform a series of memory loads.
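To make the review points about alignment and bounds concrete, here is a small sketch (hypothetical function names, not taken from this PR) that XORs one slice into another four words at a time. It uses the unaligned `loadu`/`storeu` intrinsics, so only the bounds need a manual argument, and the SAFETY comment spells that argument out.

```rust
/// XORs `src` into `dst` element-wise over their common length.
pub fn xor_into(dst: &mut [u64], src: &[u64]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was checked just above.
            unsafe { xor_into_avx2(dst, src) };
            return;
        }
    }
    for (d, s) in dst.iter_mut().zip(src) {
        *d ^= *s;
    }
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn xor_into_avx2(dst: &mut [u64], src: &[u64]) {
    use std::arch::x86_64::*;
    let n = dst.len().min(src.len());
    let full = n - n % 4; // largest multiple of 4 that fits in both slices
    let mut i = 0;
    while i < full {
        // SAFETY: i + 4 <= full <= n, so 32 bytes starting at element `i`
        // are in bounds for both slices. `loadu`/`storeu` have no alignment
        // requirement, so the 8-byte alignment of u64 is enough.
        unsafe {
            let d = dst.as_mut_ptr().add(i) as *mut __m256i;
            let s = src.as_ptr().add(i) as *const __m256i;
            let v = _mm256_xor_si256(
                _mm256_loadu_si256(d as *const __m256i),
                _mm256_loadu_si256(s),
            );
            _mm256_storeu_si256(d, v);
        }
        i += 4;
    }
    // Scalar tail: fewer than four elements remain.
    for j in full..n {
        dst[j] ^= src[j];
    }
}
```

The aligned variants (`_mm256_load_si256`, `_mm256_store_si256`) would additionally require 32-byte alignment, which a plain `[u64]` does not guarantee.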

Unsafe is also required to call the first avx2-enabled function, since we must first ensure avx2 is actually available on the target CPU.
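The usual shape of that entry point is sketched below, assuming a toy operation (wrapping addition of two rows) in place of the real compression function. The names `add_rows`, `add_rows_avx2` and `add_rows_portable` are made up for the example; `is_x86_feature_detected!` and `#[target_feature]` are the standard Rust mechanisms.

```rust
/// Portable fallback: lane-wise wrapping addition of four 64-bit words.
fn add_rows_portable(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    [
        a[0].wrapping_add(b[0]),
        a[1].wrapping_add(b[1]),
        a[2].wrapping_add(b[2]),
        a[3].wrapping_add(b[3]),
    ]
}

/// AVX2 version. It is `unsafe` to call because executing AVX2
/// instructions on a CPU without AVX2 is undefined behaviour.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_rows_avx2(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    use std::arch::x86_64::*;
    let mut out = [0u64; 4];
    unsafe {
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        // _mm256_add_epi64 wraps on overflow, matching `wrapping_add`.
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi64(va, vb));
    }
    out
}

/// Safe entry point: checks the CPU once per call, then dispatches.
pub fn add_rows(a: [u64; 4], b: [u64; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the runtime check above guarantees AVX2 is present.
            return unsafe { add_rows_avx2(a, b) };
        }
    }
    add_rows_portable(a, b)
}
```

Only the first call into `#[target_feature(enable = "avx2")]` code needs this check; functions carrying the same attribute can call each other without repeating it.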

**Benchmarks**

PR

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|----|----|----|----|----|----|
| Blake1MRounds | 120.19 | 93.97 | 93.38 | 99.85 | 91.54 |
| Blake1Round | 226.42 | 175.09 | 170.08 | 166.83 | 166.82 |
| Blake1KRounds | 122.36 | 97.28 | 96.09 | 100.90 | 95.87 |
| Blake10MRounds | 174.36 | 110.78 | 104.15 | 124.33 | 103.89 |

Main

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|----|----|----|----|----|----|
| Blake1MRounds | 80.79 | 63.04 | 62.57 | 67.80 | 62.50 |
| Blake1Round | 223.59 | 174.93 | 168.21 | 159.38 | 159.33 |
| Blake1KRounds | 83.75 | 66.59 | 65.88 | 68.37 | 64.76 |
| Blake10MRounds | 117.79 | 77.21 | 69.63 | 83.19 | 69.05 |
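
For a quick read of these numbers, the snippet below divides the PR p50 throughput by the main p50 throughput for each benchmark. The values are copied from the two tables above; the constant and function names are only illustrative.

```rust
/// (benchmark, p50 on the PR branch, p50 on main), both in MGas/s.
const P50: [(&str, f64, f64); 4] = [
    ("Blake1MRounds", 93.97, 63.04),
    ("Blake1Round", 175.09, 174.93),
    ("Blake1KRounds", 97.28, 66.59),
    ("Blake10MRounds", 110.78, 77.21),
];

/// Relative p50 throughput of the PR branch versus main (1.0 = unchanged).
pub fn speedups() -> Vec<(&'static str, f64)> {
    P50.iter()
        .map(|&(name, pr, main)| (name, pr / main))
        .collect()
}
```

The ratios come out at roughly 1.49 for Blake1MRounds, 1.46 for Blake1KRounds and 1.43 for Blake10MRounds, while Blake1Round stays at about 1.00, which is consistent with the per-round work being what the SIMD path speeds up.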

github-actions bot commented Jul 10, 2025

Lines of code report

Total lines added: 297
Total lines removed: 79
Total lines changed: 376

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/avx.rs      | 169   | +169 |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/mod.rs      | 21    | +21  |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/portable.rs | 106   | +106 |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/lib.rs              | 1     | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/precompiles.rs        | 1007  | -79  |
+-------------------------------------------------+-------+------+

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.190 ± 0.023 3.168 3.239 1.00
main_levm_BubbleSort 4.477 ± 0.081 4.405 4.597 1.40 ± 0.03
pr_revm_BubbleSort 3.224 ± 0.017 3.202 3.252 1.01 ± 0.01
pr_levm_BubbleSort 4.489 ± 0.032 4.466 4.575 1.41 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.036 ± 0.008 1.025 1.050 1.01 ± 0.01
main_levm_ERC20Approval 1.536 ± 0.017 1.516 1.581 1.49 ± 0.02
pr_revm_ERC20Approval 1.028 ± 0.011 1.017 1.048 1.00
pr_levm_ERC20Approval 1.536 ± 0.005 1.528 1.543 1.49 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 136.9 ± 0.7 136.2 138.2 1.00 ± 0.01
main_levm_ERC20Mint 257.0 ± 19.7 249.0 313.0 1.88 ± 0.14
pr_revm_ERC20Mint 136.8 ± 0.6 135.9 138.0 1.00
pr_levm_ERC20Mint 256.8 ± 2.9 254.1 264.4 1.88 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 246.0 ± 11.0 239.2 270.6 1.03 ± 0.05
main_levm_ERC20Transfer 394.3 ± 2.5 392.1 400.6 1.65 ± 0.01
pr_revm_ERC20Transfer 238.8 ± 1.1 237.8 241.2 1.00
pr_levm_ERC20Transfer 405.6 ± 3.4 402.1 414.0 1.70 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 232.0 ± 0.6 231.2 233.2 1.00
main_levm_Factorial 486.8 ± 24.5 473.8 534.8 2.10 ± 0.11
pr_revm_Factorial 234.3 ± 0.3 233.6 234.6 1.01 ± 0.00
pr_levm_Factorial 477.4 ± 0.7 476.3 478.3 2.06 ± 0.01

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.648 ± 0.019 1.618 1.672 1.01 ± 0.04
main_levm_FactorialRecursive 2.879 ± 0.060 2.828 3.012 1.77 ± 0.07
pr_revm_FactorialRecursive 1.625 ± 0.058 1.465 1.663 1.00
pr_levm_FactorialRecursive 2.783 ± 0.025 2.740 2.834 1.71 ± 0.06

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 206.2 ± 1.5 202.8 207.9 1.00
main_levm_Fibonacci 465.1 ± 2.7 462.7 472.1 2.26 ± 0.02
pr_revm_Fibonacci 209.0 ± 0.7 207.3 209.9 1.01 ± 0.01
pr_levm_Fibonacci 467.2 ± 5.2 463.9 480.5 2.27 ± 0.03

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 887.4 ± 15.0 868.0 916.8 1.00 ± 0.02
main_levm_FibonacciRecursive 1513.5 ± 34.5 1481.0 1564.6 1.71 ± 0.05
pr_revm_FibonacciRecursive 883.7 ± 15.4 848.8 907.2 1.00
pr_levm_FibonacciRecursive 1442.7 ± 40.6 1416.9 1555.9 1.63 ± 0.05

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.7 ± 0.1 8.6 8.8 1.00
main_levm_ManyHashes 13.3 ± 0.1 13.2 13.4 1.53 ± 0.01
pr_revm_ManyHashes 8.8 ± 0.1 8.7 8.9 1.01 ± 0.01
pr_levm_ManyHashes 13.4 ± 0.1 13.2 13.6 1.53 ± 0.02

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 268.3 ± 3.0 265.9 276.0 1.00 ± 0.01
main_levm_MstoreBench 941.3 ± 3.7 936.4 948.4 3.51 ± 0.02
pr_revm_MstoreBench 268.1 ± 1.5 266.3 270.6 1.00
pr_levm_MstoreBench 936.1 ± 4.0 931.4 942.8 3.49 ± 0.02

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 297.9 ± 1.7 295.9 300.6 1.00
main_levm_Push 1058.5 ± 8.7 1054.1 1082.9 3.55 ± 0.04
pr_revm_Push 302.6 ± 3.0 300.4 310.7 1.02 ± 0.01
pr_levm_Push 1045.4 ± 3.2 1040.7 1052.4 3.51 ± 0.02

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.180 ± 0.019 3.155 3.207 1.00
main_levm_BubbleSort 4.552 ± 0.145 4.401 4.868 1.43 ± 0.05
pr_revm_BubbleSort 3.242 ± 0.018 3.220 3.277 1.02 ± 0.01
pr_levm_BubbleSort 4.454 ± 0.027 4.432 4.509 1.40 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.020 ± 0.004 1.016 1.030 1.00
main_levm_ERC20Approval 1.532 ± 0.010 1.519 1.547 1.50 ± 0.01
pr_revm_ERC20Approval 1.058 ± 0.010 1.049 1.076 1.04 ± 0.01
pr_levm_ERC20Approval 1.541 ± 0.020 1.523 1.585 1.51 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 136.9 ± 1.3 135.8 140.3 1.00
main_levm_ERC20Mint 251.5 ± 7.6 246.1 267.7 1.84 ± 0.06
pr_revm_ERC20Mint 140.4 ± 0.7 139.5 142.0 1.03 ± 0.01
pr_levm_ERC20Mint 249.6 ± 2.6 246.9 254.9 1.82 ± 0.03

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 240.1 ± 1.3 238.7 242.5 1.00
main_levm_ERC20Transfer 394.6 ± 2.0 392.0 397.7 1.64 ± 0.01
pr_revm_ERC20Transfer 246.9 ± 2.2 245.1 252.3 1.03 ± 0.01
pr_levm_ERC20Transfer 395.6 ± 4.4 391.5 403.6 1.65 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 230.8 ± 1.2 229.9 234.0 1.00
main_levm_Factorial 489.2 ± 23.4 476.0 534.5 2.12 ± 0.10
pr_revm_Factorial 233.1 ± 2.5 230.6 238.8 1.01 ± 0.01
pr_levm_Factorial 478.4 ± 3.9 475.0 488.8 2.07 ± 0.02

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.620 ± 0.020 1.595 1.649 1.01 ± 0.02
main_levm_FactorialRecursive 2.862 ± 0.064 2.775 2.942 1.78 ± 0.05
pr_revm_FactorialRecursive 1.604 ± 0.025 1.562 1.646 1.00
pr_levm_FactorialRecursive 2.781 ± 0.031 2.724 2.826 1.73 ± 0.03

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 205.5 ± 0.7 204.7 206.8 1.00
main_levm_Fibonacci 469.6 ± 6.5 463.6 481.4 2.29 ± 0.03
pr_revm_Fibonacci 207.4 ± 1.1 205.7 209.0 1.01 ± 0.01
pr_levm_Fibonacci 468.1 ± 6.2 462.3 483.9 2.28 ± 0.03

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 860.3 ± 14.1 830.9 880.6 1.00
main_levm_FibonacciRecursive 1468.7 ± 25.9 1448.2 1539.3 1.71 ± 0.04
pr_revm_FibonacciRecursive 869.8 ± 13.8 848.5 888.2 1.01 ± 0.02
pr_levm_FibonacciRecursive 1451.8 ± 9.2 1443.1 1468.8 1.69 ± 0.03

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.6 ± 0.1 8.6 8.7 1.00
main_levm_ManyHashes 13.7 ± 0.1 13.5 13.8 1.59 ± 0.02
pr_revm_ManyHashes 8.8 ± 0.0 8.8 8.8 1.02 ± 0.01
pr_levm_ManyHashes 13.4 ± 0.1 13.3 13.5 1.55 ± 0.01

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 274.3 ± 4.1 268.7 280.8 1.00
main_levm_MstoreBench 939.1 ± 5.7 931.5 950.4 3.42 ± 0.06
pr_revm_MstoreBench 282.9 ± 3.5 276.7 287.2 1.03 ± 0.02
pr_levm_MstoreBench 941.4 ± 4.8 936.3 951.0 3.43 ± 0.05

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 300.5 ± 2.4 297.7 303.6 1.01 ± 0.01
main_levm_Push 1053.6 ± 7.4 1047.6 1073.5 3.54 ± 0.03
pr_revm_Push 297.5 ± 1.2 296.0 300.4 1.00
pr_levm_Push 1061.8 ± 6.9 1049.2 1071.4 3.57 ± 0.03

Benchmark for ea7c34b

Test Base PR %
block payload building bench 0.2±0.00ns 0.2±0.00ns 0.00%

@iovoid iovoid changed the title from "perf(levm): AVX256 implementation of blake2" to "perf(levm): add AVX256 implementation of BLAKE2" Jul 10, 2025
@iovoid iovoid added the levm Lambda EVM implementation label Jul 10, 2025
@iovoid iovoid marked this pull request as ready for review July 10, 2025 18:45
@iovoid iovoid requested a review from a team as a code owner July 10, 2025 18:45

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.192 ± 0.023 3.166 3.242 1.00
main_levm_BubbleSort 4.427 ± 0.050 4.396 4.567 1.39 ± 0.02
pr_revm_BubbleSort 3.233 ± 0.017 3.194 3.262 1.01 ± 0.01
pr_levm_BubbleSort 4.502 ± 0.169 4.422 4.980 1.41 ± 0.05

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.035 ± 0.008 1.029 1.056 1.00
main_levm_ERC20Approval 1.528 ± 0.038 1.506 1.630 1.48 ± 0.04
pr_revm_ERC20Approval 1.048 ± 0.009 1.038 1.069 1.01 ± 0.01
pr_levm_ERC20Approval 1.524 ± 0.016 1.511 1.556 1.47 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 137.7 ± 1.1 136.9 140.6 1.00
main_levm_ERC20Mint 249.4 ± 5.8 243.9 263.1 1.81 ± 0.04
pr_revm_ERC20Mint 138.4 ± 0.5 137.8 139.3 1.00 ± 0.01
pr_levm_ERC20Mint 247.8 ± 0.6 247.0 248.4 1.80 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 243.7 ± 2.6 242.2 250.8 1.00
main_levm_ERC20Transfer 391.8 ± 3.7 389.0 400.5 1.61 ± 0.02
pr_revm_ERC20Transfer 244.0 ± 3.9 241.1 254.7 1.00 ± 0.02
pr_levm_ERC20Transfer 392.1 ± 3.1 387.6 398.2 1.61 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 235.2 ± 5.2 232.3 249.5 1.00 ± 0.02
main_levm_Factorial 497.1 ± 27.4 474.1 529.4 2.12 ± 0.12
pr_revm_Factorial 234.6 ± 1.4 233.4 236.8 1.00
pr_levm_Factorial 478.4 ± 7.6 475.1 499.9 2.04 ± 0.03

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.601 ± 0.025 1.568 1.657 1.00 ± 0.03
main_levm_FactorialRecursive 2.773 ± 0.060 2.714 2.889 1.73 ± 0.05
pr_revm_FactorialRecursive 1.601 ± 0.033 1.542 1.651 1.00
pr_levm_FactorialRecursive 2.753 ± 0.018 2.731 2.786 1.72 ± 0.04

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 207.1 ± 1.3 206.1 210.8 1.00
main_levm_Fibonacci 474.3 ± 36.5 461.3 578.1 2.29 ± 0.18
pr_revm_Fibonacci 216.0 ± 2.8 214.0 223.2 1.04 ± 0.01
pr_levm_Fibonacci 468.4 ± 6.9 463.5 481.4 2.26 ± 0.04

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 866.2 ± 5.7 858.7 874.5 1.01 ± 0.01
main_levm_FibonacciRecursive 1465.6 ± 36.5 1433.8 1534.2 1.70 ± 0.05
pr_revm_FibonacciRecursive 861.0 ± 10.4 844.8 875.2 1.00
pr_levm_FibonacciRecursive 1450.4 ± 7.2 1439.4 1465.9 1.68 ± 0.02

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.7 ± 0.1 8.6 8.8 1.00 ± 0.01
main_levm_ManyHashes 13.3 ± 0.1 13.2 13.5 1.53 ± 0.01
pr_revm_ManyHashes 8.7 ± 0.0 8.7 8.8 1.00
pr_levm_ManyHashes 13.4 ± 0.1 13.3 13.8 1.54 ± 0.02

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 311.8 ± 132.1 266.7 687.6 1.16 ± 0.49
main_levm_MstoreBench 939.0 ± 3.6 933.8 947.3 3.50 ± 0.02
pr_revm_MstoreBench 268.1 ± 1.2 266.9 270.2 1.00
pr_levm_MstoreBench 943.2 ± 6.5 933.3 955.0 3.52 ± 0.03

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 294.4 ± 0.9 292.6 296.0 1.00 ± 0.00
main_levm_Push 1072.8 ± 77.8 1044.1 1293.8 3.65 ± 0.26
pr_revm_Push 293.9 ± 0.8 292.3 294.8 1.00
pr_levm_Push 1053.8 ± 4.3 1048.2 1062.2 3.59 ± 0.02

@iovoid iovoid moved this to In Review in ethrex_l1 Jul 10, 2025
@iovoid iovoid moved this from Todo to In review in ethrex_performance Jul 10, 2025

Benchmark for 0544811

Test Base PR %
block payload building bench 0.2±0.00ns 0.2±0.00ns 0.00%

github-actions bot commented Jul 10, 2025

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 212.212 ± 0.969 210.866 214.028 1.00 ± 0.01
head 212.043 ± 0.993 210.946 213.913 1.00

github-actions bot commented Jul 16, 2025

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.257 ± 0.016 3.230 3.285 1.01 ± 0.01
main_levm_BubbleSort 4.413 ± 0.024 4.387 4.471 1.37 ± 0.01
pr_revm_BubbleSort 3.230 ± 0.012 3.220 3.254 1.00
pr_levm_BubbleSort 4.418 ± 0.022 4.393 4.475 1.37 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.046 ± 0.006 1.036 1.056 1.01 ± 0.01
main_levm_ERC20Approval 1.514 ± 0.018 1.499 1.552 1.47 ± 0.02
pr_revm_ERC20Approval 1.033 ± 0.003 1.028 1.038 1.00
pr_levm_ERC20Approval 1.520 ± 0.010 1.506 1.538 1.47 ± 0.01

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 139.6 ± 1.4 138.3 143.2 1.01 ± 0.01
main_levm_ERC20Mint 244.2 ± 2.8 240.5 247.4 1.77 ± 0.02
pr_revm_ERC20Mint 138.2 ± 0.8 137.0 139.2 1.00
pr_levm_ERC20Mint 248.7 ± 3.1 245.4 256.2 1.80 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 245.4 ± 0.8 244.1 246.8 1.00
main_levm_ERC20Transfer 389.1 ± 2.7 386.1 393.4 1.59 ± 0.01
pr_revm_ERC20Transfer 245.8 ± 3.5 242.5 254.5 1.00 ± 0.01
pr_levm_ERC20Transfer 392.2 ± 2.6 389.1 395.4 1.60 ± 0.01

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 238.6 ± 1.1 237.2 241.1 1.00
main_levm_Factorial 458.8 ± 1.1 457.0 460.5 1.92 ± 0.01
pr_revm_Factorial 240.9 ± 2.4 238.7 247.0 1.01 ± 0.01
pr_levm_Factorial 461.8 ± 2.9 459.5 469.4 1.94 ± 0.01

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.572 ± 0.093 1.316 1.642 1.00
main_levm_FactorialRecursive 2.677 ± 0.021 2.646 2.708 1.70 ± 0.10
pr_revm_FactorialRecursive 1.604 ± 0.026 1.552 1.644 1.02 ± 0.06
pr_levm_FactorialRecursive 2.764 ± 0.020 2.738 2.805 1.76 ± 0.10

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 212.2 ± 0.5 211.5 213.0 1.00
main_levm_Fibonacci 446.6 ± 1.4 444.7 449.4 2.10 ± 0.01
pr_revm_Fibonacci 212.7 ± 0.7 211.5 214.0 1.00 ± 0.00
pr_levm_Fibonacci 450.7 ± 9.3 446.1 477.0 2.12 ± 0.04

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 853.7 ± 12.4 837.2 875.0 1.00
main_levm_FibonacciRecursive 1404.7 ± 10.5 1390.2 1422.3 1.65 ± 0.03
pr_revm_FibonacciRecursive 857.6 ± 22.0 832.4 911.9 1.00 ± 0.03
pr_levm_FibonacciRecursive 1461.6 ± 8.1 1451.6 1477.6 1.71 ± 0.03

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.8 ± 0.1 8.7 9.0 1.01 ± 0.01
main_levm_ManyHashes 13.1 ± 0.1 13.0 13.4 1.49 ± 0.01
pr_revm_ManyHashes 8.7 ± 0.0 8.7 8.8 1.00
pr_levm_ManyHashes 14.0 ± 0.8 13.6 16.2 1.60 ± 0.09

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 276.5 ± 2.1 274.5 280.2 1.00
main_levm_MstoreBench 936.2 ± 2.3 932.3 939.9 3.39 ± 0.03
pr_revm_MstoreBench 276.7 ± 1.1 275.6 278.9 1.00 ± 0.01
pr_levm_MstoreBench 943.1 ± 6.4 936.3 958.9 3.41 ± 0.03

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 293.8 ± 1.4 292.2 296.7 1.00
main_levm_Push 1052.0 ± 3.8 1046.8 1057.9 3.58 ± 0.02
pr_revm_Push 294.8 ± 2.3 292.9 300.7 1.00 ± 0.01
pr_levm_Push 1060.6 ± 5.9 1056.3 1076.1 3.61 ± 0.03

Benchmark Results: SstoreBench_no_opt

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_SstoreBench_no_opt 158.8 ± 0.7 157.6 159.7 1.00
main_levm_SstoreBench_no_opt 170.5 ± 2.1 168.8 174.0 1.07 ± 0.01
pr_revm_SstoreBench_no_opt 161.2 ± 5.2 157.5 170.8 1.02 ± 0.03
pr_levm_SstoreBench_no_opt 171.1 ± 1.9 168.5 173.9 1.08 ± 0.01

Collaborator

@Arkenan Arkenan left a comment

LGTM

Contributor

edg-l commented Jul 17, 2025

Looks good from my side. Additionally, I added some more tests and ran this code under Miri to check whether it would find anything; it ran OK.

RUSTFLAGS="-Zrandomize-layout -C target-cpu=x86-64-v4 -C target-feature=+avx2" cargo miri test

@iovoid iovoid added this pull request to the merge queue Jul 17, 2025
Merged via the queue into main with commit 165b94c Jul 17, 2025
49 checks passed
@iovoid iovoid deleted the perf/blake2-avx256 branch July 17, 2025 14:49
@github-project-automation github-project-automation bot moved this from In review to Done in ethrex_performance Jul 17, 2025
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l1 Jul 17, 2025
d-roak pushed a commit to 1sixtech/ethrex that referenced this pull request Jul 17, 2025
Labels
levm Lambda EVM implementation performance
Projects
Status: Done
Status: Done
3 participants