Contributor

@iovoid iovoid commented Jul 10, 2025

Motivation

To improve BLAKE2 performance.

Description

Why AVX256 (256-bit AVX2) instead of AVX-512? Mainly because the AVX-512 intrinsics (rust-lang/rust#111137) are still experimental.

Creates a common/crypto module to house blake2. We should consider moving other cryptographic operations that currently live in precompiles.rs into this module.

If avx2 is available, a permute-with-gather implementation is used.

Using unsafe is required for SIMD loads and stores. Reviewers should verify that alignment requirements are satisfied and that no out-of-bounds accesses are possible.

Note that, besides the obvious intrinsics with "load" or "store" in their names, gather intrinsics also perform a series of memory loads and need the same review.
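To illustrate why a gather deserves the same scrutiny as a load, here is a minimal sketch (not the code from this PR). The names `gather_words`, `gather_words_avx2` and `gather_words_scalar` are hypothetical, and the 16-word `u64` message layout is an assumption based on BLAKE2b. Each of the four indices becomes a separate 8-byte read, so every index has to be proven in bounds before the intrinsic runs; here that is done by masking to 0..=15.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar reference: pick four message words by index (indices masked to 0..=15).
fn gather_words_scalar(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
    idx.map(|i| m[usize::from(i & 15)])
}

/// AVX2 variant (hypothetical helper). The gather performs four independent
/// 8-byte loads from `m`, one per index, so it is a memory read like any load.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn gather_words_avx2(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
    // Masking to 0..=15 keeps every gathered address inside the 16-word array.
    let i = idx.map(|i| i32::from(i & 15));
    let mut out = [0u64; 4];
    unsafe {
        let offsets = _mm_setr_epi32(i[0], i[1], i[2], i[3]);
        // SCALE = 8: each index is multiplied by the size of a u64 in bytes.
        let v = _mm256_i32gather_epi64::<8>(m.as_ptr() as *const i64, offsets);
        // storeu has no alignment requirement; `out` is exactly 32 bytes.
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
    }
    out
}

/// Uses the gather when AVX2 is detected at runtime, else the scalar path.
pub fn gather_words(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 was detected, and indices are masked in the callee.
            return unsafe { gather_words_avx2(m, idx) };
        }
    }
    gather_words_scalar(m, idx)
}
```

Both paths return the same words for the same indices, which makes the scalar version a convenient oracle in tests.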

Unsafe is also required to call the first avx2-enabled function, since we must first ensure avx2 is actually available on the target CPU.
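The safety points above can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `mix`, `mix_avx2` and `mix_portable` are hypothetical names, and the operation is only loosely modelled on the first add/xor/rotate step of BLAKE2b's G function. It shows unaligned loads and stores on fixed-size arrays (so bounds are guaranteed by the type) and a runtime AVX2 check guarding the unsafe call.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Portable fallback: a = a + b; d = (d ^ a) rotated right by 32, per lane.
fn mix_portable(a: &mut [u64; 4], b: &[u64; 4], d: &mut [u64; 4]) {
    for i in 0..4 {
        a[i] = a[i].wrapping_add(b[i]);
        d[i] = (d[i] ^ a[i]).rotate_right(32);
    }
}

/// AVX2 version of the same step (hypothetical helper).
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn mix_avx2(a: &mut [u64; 4], b: &[u64; 4], d: &mut [u64; 4]) {
    unsafe {
        // loadu/storeu carry no alignment requirement, and a [u64; 4] is
        // exactly 32 bytes, so each 256-bit access stays in bounds.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let vd = _mm256_loadu_si256(d.as_ptr() as *const __m256i);
        let va = _mm256_add_epi64(va, vb);
        let x = _mm256_xor_si256(vd, va);
        // Rotate each 64-bit lane right by 32 using two shifts and an OR.
        let vd = _mm256_or_si256(_mm256_srli_epi64::<32>(x), _mm256_slli_epi64::<32>(x));
        _mm256_storeu_si256(a.as_mut_ptr() as *mut __m256i, va);
        _mm256_storeu_si256(d.as_mut_ptr() as *mut __m256i, vd);
    }
}

/// Safe entry point: checks for AVX2 at runtime before the unsafe call.
pub fn mix(a: &mut [u64; 4], b: &[u64; 4], d: &mut [u64; 4]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was confirmed on this CPU just above.
            unsafe { mix_avx2(a, b, d) };
            return;
        }
    }
    mix_portable(a, b, d);
}
```

Keeping the only unsafe call behind the safe `mix` wrapper means callers cannot reach the AVX2 path on a CPU that lacks the feature.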

Benchmarks

PR

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|---|---|---|---|---|---|
| Blake1MRounds | 120.19 | 93.97 | 93.38 | 99.85 | 91.54 |
| Blake1Round | 226.42 | 175.09 | 170.08 | 166.83 | 166.82 |
| Blake1KRounds | 122.36 | 97.28 | 96.09 | 100.90 | 95.87 |
| Blake10MRounds | 174.36 | 110.78 | 104.15 | 124.33 | 103.89 |

Main

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|---|---|---|---|---|---|
| Blake1MRounds | 80.79 | 63.04 | 62.57 | 67.80 | 62.50 |
| Blake1Round | 223.59 | 174.93 | 168.21 | 159.38 | 159.33 |
| Blake1KRounds | 83.75 | 66.59 | 65.88 | 68.37 | 64.76 |
| Blake10MRounds | 117.79 | 77.21 | 69.63 | 83.19 | 69.05 |
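For a quick read of the two tables, the p50 speedup is the PR throughput divided by the main throughput. The sketch below only redoes that arithmetic on the p50 values reported above; `speedup` and `p50_speedups` are hypothetical helper names.

```rust
/// Ratio of PR throughput to main throughput (both in MGas/s).
fn speedup(pr: f64, main: f64) -> f64 {
    pr / main
}

/// p50 speedups computed from the PR and Main benchmark tables.
fn p50_speedups() -> [(&'static str, f64); 4] {
    [
        ("Blake1MRounds", speedup(93.97, 63.04)),
        ("Blake1Round", speedup(175.09, 174.93)),
        ("Blake1KRounds", speedup(97.28, 66.59)),
        ("Blake10MRounds", speedup(110.78, 77.21)),
    ]
}
```

On these numbers the multi-round cases come out at roughly 1.43x to 1.49x, while the single-round case is essentially unchanged at about 1.00x.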


github-actions bot commented Jul 10, 2025

Lines of code report

Total lines added: 297
Total lines removed: 79
Total lines changed: 376

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/avx.rs      | 169   | +169 |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/mod.rs      | 21    | +21  |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/blake2f/portable.rs | 106   | +106 |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/crypto/lib.rs              | 1     | +1   |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/precompiles.rs        | 1007  | -79  |
+-------------------------------------------------+-------+------+


No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.190 ± 0.023 3.168 3.239 1.00
main_levm_BubbleSort 4.477 ± 0.081 4.405 4.597 1.40 ± 0.03
pr_revm_BubbleSort 3.224 ± 0.017 3.202 3.252 1.01 ± 0.01
pr_levm_BubbleSort 4.489 ± 0.032 4.466 4.575 1.41 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.036 ± 0.008 1.025 1.050 1.01 ± 0.01
main_levm_ERC20Approval 1.536 ± 0.017 1.516 1.581 1.49 ± 0.02
pr_revm_ERC20Approval 1.028 ± 0.011 1.017 1.048 1.00
pr_levm_ERC20Approval 1.536 ± 0.005 1.528 1.543 1.49 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 136.9 ± 0.7 136.2 138.2 1.00 ± 0.01
main_levm_ERC20Mint 257.0 ± 19.7 249.0 313.0 1.88 ± 0.14
pr_revm_ERC20Mint 136.8 ± 0.6 135.9 138.0 1.00
pr_levm_ERC20Mint 256.8 ± 2.9 254.1 264.4 1.88 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 246.0 ± 11.0 239.2 270.6 1.03 ± 0.05
main_levm_ERC20Transfer 394.3 ± 2.5 392.1 400.6 1.65 ± 0.01
pr_revm_ERC20Transfer 238.8 ± 1.1 237.8 241.2 1.00
pr_levm_ERC20Transfer 405.6 ± 3.4 402.1 414.0 1.70 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 232.0 ± 0.6 231.2 233.2 1.00
main_levm_Factorial 486.8 ± 24.5 473.8 534.8 2.10 ± 0.11
pr_revm_Factorial 234.3 ± 0.3 233.6 234.6 1.01 ± 0.00
pr_levm_Factorial 477.4 ± 0.7 476.3 478.3 2.06 ± 0.01

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.648 ± 0.019 1.618 1.672 1.01 ± 0.04
main_levm_FactorialRecursive 2.879 ± 0.060 2.828 3.012 1.77 ± 0.07
pr_revm_FactorialRecursive 1.625 ± 0.058 1.465 1.663 1.00
pr_levm_FactorialRecursive 2.783 ± 0.025 2.740 2.834 1.71 ± 0.06

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 206.2 ± 1.5 202.8 207.9 1.00
main_levm_Fibonacci 465.1 ± 2.7 462.7 472.1 2.26 ± 0.02
pr_revm_Fibonacci 209.0 ± 0.7 207.3 209.9 1.01 ± 0.01
pr_levm_Fibonacci 467.2 ± 5.2 463.9 480.5 2.27 ± 0.03

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 887.4 ± 15.0 868.0 916.8 1.00 ± 0.02
main_levm_FibonacciRecursive 1513.5 ± 34.5 1481.0 1564.6 1.71 ± 0.05
pr_revm_FibonacciRecursive 883.7 ± 15.4 848.8 907.2 1.00
pr_levm_FibonacciRecursive 1442.7 ± 40.6 1416.9 1555.9 1.63 ± 0.05

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.7 ± 0.1 8.6 8.8 1.00
main_levm_ManyHashes 13.3 ± 0.1 13.2 13.4 1.53 ± 0.01
pr_revm_ManyHashes 8.8 ± 0.1 8.7 8.9 1.01 ± 0.01
pr_levm_ManyHashes 13.4 ± 0.1 13.2 13.6 1.53 ± 0.02

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 268.3 ± 3.0 265.9 276.0 1.00 ± 0.01
main_levm_MstoreBench 941.3 ± 3.7 936.4 948.4 3.51 ± 0.02
pr_revm_MstoreBench 268.1 ± 1.5 266.3 270.6 1.00
pr_levm_MstoreBench 936.1 ± 4.0 931.4 942.8 3.49 ± 0.02

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 297.9 ± 1.7 295.9 300.6 1.00
main_levm_Push 1058.5 ± 8.7 1054.1 1082.9 3.55 ± 0.04
pr_revm_Push 302.6 ± 3.0 300.4 310.7 1.02 ± 0.01
pr_levm_Push 1045.4 ± 3.2 1040.7 1052.4 3.51 ± 0.02


No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.180 ± 0.019 3.155 3.207 1.00
main_levm_BubbleSort 4.552 ± 0.145 4.401 4.868 1.43 ± 0.05
pr_revm_BubbleSort 3.242 ± 0.018 3.220 3.277 1.02 ± 0.01
pr_levm_BubbleSort 4.454 ± 0.027 4.432 4.509 1.40 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.020 ± 0.004 1.016 1.030 1.00
main_levm_ERC20Approval 1.532 ± 0.010 1.519 1.547 1.50 ± 0.01
pr_revm_ERC20Approval 1.058 ± 0.010 1.049 1.076 1.04 ± 0.01
pr_levm_ERC20Approval 1.541 ± 0.020 1.523 1.585 1.51 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 136.9 ± 1.3 135.8 140.3 1.00
main_levm_ERC20Mint 251.5 ± 7.6 246.1 267.7 1.84 ± 0.06
pr_revm_ERC20Mint 140.4 ± 0.7 139.5 142.0 1.03 ± 0.01
pr_levm_ERC20Mint 249.6 ± 2.6 246.9 254.9 1.82 ± 0.03

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 240.1 ± 1.3 238.7 242.5 1.00
main_levm_ERC20Transfer 394.6 ± 2.0 392.0 397.7 1.64 ± 0.01
pr_revm_ERC20Transfer 246.9 ± 2.2 245.1 252.3 1.03 ± 0.01
pr_levm_ERC20Transfer 395.6 ± 4.4 391.5 403.6 1.65 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 230.8 ± 1.2 229.9 234.0 1.00
main_levm_Factorial 489.2 ± 23.4 476.0 534.5 2.12 ± 0.10
pr_revm_Factorial 233.1 ± 2.5 230.6 238.8 1.01 ± 0.01
pr_levm_Factorial 478.4 ± 3.9 475.0 488.8 2.07 ± 0.02

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.620 ± 0.020 1.595 1.649 1.01 ± 0.02
main_levm_FactorialRecursive 2.862 ± 0.064 2.775 2.942 1.78 ± 0.05
pr_revm_FactorialRecursive 1.604 ± 0.025 1.562 1.646 1.00
pr_levm_FactorialRecursive 2.781 ± 0.031 2.724 2.826 1.73 ± 0.03

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 205.5 ± 0.7 204.7 206.8 1.00
main_levm_Fibonacci 469.6 ± 6.5 463.6 481.4 2.29 ± 0.03
pr_revm_Fibonacci 207.4 ± 1.1 205.7 209.0 1.01 ± 0.01
pr_levm_Fibonacci 468.1 ± 6.2 462.3 483.9 2.28 ± 0.03

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 860.3 ± 14.1 830.9 880.6 1.00
main_levm_FibonacciRecursive 1468.7 ± 25.9 1448.2 1539.3 1.71 ± 0.04
pr_revm_FibonacciRecursive 869.8 ± 13.8 848.5 888.2 1.01 ± 0.02
pr_levm_FibonacciRecursive 1451.8 ± 9.2 1443.1 1468.8 1.69 ± 0.03

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.6 ± 0.1 8.6 8.7 1.00
main_levm_ManyHashes 13.7 ± 0.1 13.5 13.8 1.59 ± 0.02
pr_revm_ManyHashes 8.8 ± 0.0 8.8 8.8 1.02 ± 0.01
pr_levm_ManyHashes 13.4 ± 0.1 13.3 13.5 1.55 ± 0.01

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 274.3 ± 4.1 268.7 280.8 1.00
main_levm_MstoreBench 939.1 ± 5.7 931.5 950.4 3.42 ± 0.06
pr_revm_MstoreBench 282.9 ± 3.5 276.7 287.2 1.03 ± 0.02
pr_levm_MstoreBench 941.4 ± 4.8 936.3 951.0 3.43 ± 0.05

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 300.5 ± 2.4 297.7 303.6 1.01 ± 0.01
main_levm_Push 1053.6 ± 7.4 1047.6 1073.5 3.54 ± 0.03
pr_revm_Push 297.5 ± 1.2 296.0 300.4 1.00
pr_levm_Push 1061.8 ± 6.9 1049.2 1071.4 3.57 ± 0.03


Benchmark for ea7c34b

Test Base PR %
block payload building bench 0.2±0.00ns 0.2±0.00ns 0.00%

@iovoid iovoid changed the title from "perf(levm): AVX256 implementation of blake2" to "perf(levm): add AVX256 implementation of BLAKE2" Jul 10, 2025
@iovoid iovoid added the levm Lambda EVM implementation label Jul 10, 2025
@iovoid iovoid marked this pull request as ready for review July 10, 2025 18:45
@iovoid iovoid requested a review from a team as a code owner July 10, 2025 18:45

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.192 ± 0.023 3.166 3.242 1.00
main_levm_BubbleSort 4.427 ± 0.050 4.396 4.567 1.39 ± 0.02
pr_revm_BubbleSort 3.233 ± 0.017 3.194 3.262 1.01 ± 0.01
pr_levm_BubbleSort 4.502 ± 0.169 4.422 4.980 1.41 ± 0.05

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.035 ± 0.008 1.029 1.056 1.00
main_levm_ERC20Approval 1.528 ± 0.038 1.506 1.630 1.48 ± 0.04
pr_revm_ERC20Approval 1.048 ± 0.009 1.038 1.069 1.01 ± 0.01
pr_levm_ERC20Approval 1.524 ± 0.016 1.511 1.556 1.47 ± 0.02

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 137.7 ± 1.1 136.9 140.6 1.00
main_levm_ERC20Mint 249.4 ± 5.8 243.9 263.1 1.81 ± 0.04
pr_revm_ERC20Mint 138.4 ± 0.5 137.8 139.3 1.00 ± 0.01
pr_levm_ERC20Mint 247.8 ± 0.6 247.0 248.4 1.80 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 243.7 ± 2.6 242.2 250.8 1.00
main_levm_ERC20Transfer 391.8 ± 3.7 389.0 400.5 1.61 ± 0.02
pr_revm_ERC20Transfer 244.0 ± 3.9 241.1 254.7 1.00 ± 0.02
pr_levm_ERC20Transfer 392.1 ± 3.1 387.6 398.2 1.61 ± 0.02

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 235.2 ± 5.2 232.3 249.5 1.00 ± 0.02
main_levm_Factorial 497.1 ± 27.4 474.1 529.4 2.12 ± 0.12
pr_revm_Factorial 234.6 ± 1.4 233.4 236.8 1.00
pr_levm_Factorial 478.4 ± 7.6 475.1 499.9 2.04 ± 0.03

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.601 ± 0.025 1.568 1.657 1.00 ± 0.03
main_levm_FactorialRecursive 2.773 ± 0.060 2.714 2.889 1.73 ± 0.05
pr_revm_FactorialRecursive 1.601 ± 0.033 1.542 1.651 1.00
pr_levm_FactorialRecursive 2.753 ± 0.018 2.731 2.786 1.72 ± 0.04

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 207.1 ± 1.3 206.1 210.8 1.00
main_levm_Fibonacci 474.3 ± 36.5 461.3 578.1 2.29 ± 0.18
pr_revm_Fibonacci 216.0 ± 2.8 214.0 223.2 1.04 ± 0.01
pr_levm_Fibonacci 468.4 ± 6.9 463.5 481.4 2.26 ± 0.04

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 866.2 ± 5.7 858.7 874.5 1.01 ± 0.01
main_levm_FibonacciRecursive 1465.6 ± 36.5 1433.8 1534.2 1.70 ± 0.05
pr_revm_FibonacciRecursive 861.0 ± 10.4 844.8 875.2 1.00
pr_levm_FibonacciRecursive 1450.4 ± 7.2 1439.4 1465.9 1.68 ± 0.02

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.7 ± 0.1 8.6 8.8 1.00 ± 0.01
main_levm_ManyHashes 13.3 ± 0.1 13.2 13.5 1.53 ± 0.01
pr_revm_ManyHashes 8.7 ± 0.0 8.7 8.8 1.00
pr_levm_ManyHashes 13.4 ± 0.1 13.3 13.8 1.54 ± 0.02

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 311.8 ± 132.1 266.7 687.6 1.16 ± 0.49
main_levm_MstoreBench 939.0 ± 3.6 933.8 947.3 3.50 ± 0.02
pr_revm_MstoreBench 268.1 ± 1.2 266.9 270.2 1.00
pr_levm_MstoreBench 943.2 ± 6.5 933.3 955.0 3.52 ± 0.03

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 294.4 ± 0.9 292.6 296.0 1.00 ± 0.00
main_levm_Push 1072.8 ± 77.8 1044.1 1293.8 3.65 ± 0.26
pr_revm_Push 293.9 ± 0.8 292.3 294.8 1.00
pr_levm_Push 1053.8 ± 4.3 1048.2 1062.2 3.59 ± 0.02

@iovoid iovoid moved this to In Review in ethrex_l1 Jul 10, 2025
@iovoid iovoid moved this from Todo to In review in ethrex_performance Jul 10, 2025

Benchmark for 0544811

Test Base PR %
block payload building bench 0.2±0.00ns 0.2±0.00ns 0.00%


github-actions bot commented Jul 10, 2025

Benchmark Block Execution Results Comparison Against Main

Command Mean [s] Min [s] Max [s] Relative
base 212.212 ± 0.969 210.866 214.028 1.00 ± 0.01
head 212.043 ± 0.993 210.946 213.913 1.00


github-actions bot commented Jul 16, 2025

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

Command Mean [s] Min [s] Max [s] Relative
main_revm_BubbleSort 3.257 ± 0.016 3.230 3.285 1.01 ± 0.01
main_levm_BubbleSort 4.413 ± 0.024 4.387 4.471 1.37 ± 0.01
pr_revm_BubbleSort 3.230 ± 0.012 3.220 3.254 1.00
pr_levm_BubbleSort 4.418 ± 0.022 4.393 4.475 1.37 ± 0.01

Benchmark Results: ERC20Approval

Command Mean [s] Min [s] Max [s] Relative
main_revm_ERC20Approval 1.046 ± 0.006 1.036 1.056 1.01 ± 0.01
main_levm_ERC20Approval 1.514 ± 0.018 1.499 1.552 1.47 ± 0.02
pr_revm_ERC20Approval 1.033 ± 0.003 1.028 1.038 1.00
pr_levm_ERC20Approval 1.520 ± 0.010 1.506 1.538 1.47 ± 0.01

Benchmark Results: ERC20Mint

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Mint 139.6 ± 1.4 138.3 143.2 1.01 ± 0.01
main_levm_ERC20Mint 244.2 ± 2.8 240.5 247.4 1.77 ± 0.02
pr_revm_ERC20Mint 138.2 ± 0.8 137.0 139.2 1.00
pr_levm_ERC20Mint 248.7 ± 3.1 245.4 256.2 1.80 ± 0.02

Benchmark Results: ERC20Transfer

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ERC20Transfer 245.4 ± 0.8 244.1 246.8 1.00
main_levm_ERC20Transfer 389.1 ± 2.7 386.1 393.4 1.59 ± 0.01
pr_revm_ERC20Transfer 245.8 ± 3.5 242.5 254.5 1.00 ± 0.01
pr_levm_ERC20Transfer 392.2 ± 2.6 389.1 395.4 1.60 ± 0.01

Benchmark Results: Factorial

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Factorial 238.6 ± 1.1 237.2 241.1 1.00
main_levm_Factorial 458.8 ± 1.1 457.0 460.5 1.92 ± 0.01
pr_revm_Factorial 240.9 ± 2.4 238.7 247.0 1.01 ± 0.01
pr_levm_Factorial 461.8 ± 2.9 459.5 469.4 1.94 ± 0.01

Benchmark Results: FactorialRecursive

Command Mean [s] Min [s] Max [s] Relative
main_revm_FactorialRecursive 1.572 ± 0.093 1.316 1.642 1.00
main_levm_FactorialRecursive 2.677 ± 0.021 2.646 2.708 1.70 ± 0.10
pr_revm_FactorialRecursive 1.604 ± 0.026 1.552 1.644 1.02 ± 0.06
pr_levm_FactorialRecursive 2.764 ± 0.020 2.738 2.805 1.76 ± 0.10

Benchmark Results: Fibonacci

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Fibonacci 212.2 ± 0.5 211.5 213.0 1.00
main_levm_Fibonacci 446.6 ± 1.4 444.7 449.4 2.10 ± 0.01
pr_revm_Fibonacci 212.7 ± 0.7 211.5 214.0 1.00 ± 0.00
pr_levm_Fibonacci 450.7 ± 9.3 446.1 477.0 2.12 ± 0.04

Benchmark Results: FibonacciRecursive

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_FibonacciRecursive 853.7 ± 12.4 837.2 875.0 1.00
main_levm_FibonacciRecursive 1404.7 ± 10.5 1390.2 1422.3 1.65 ± 0.03
pr_revm_FibonacciRecursive 857.6 ± 22.0 832.4 911.9 1.00 ± 0.03
pr_levm_FibonacciRecursive 1461.6 ± 8.1 1451.6 1477.6 1.71 ± 0.03

Benchmark Results: ManyHashes

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_ManyHashes 8.8 ± 0.1 8.7 9.0 1.01 ± 0.01
main_levm_ManyHashes 13.1 ± 0.1 13.0 13.4 1.49 ± 0.01
pr_revm_ManyHashes 8.7 ± 0.0 8.7 8.8 1.00
pr_levm_ManyHashes 14.0 ± 0.8 13.6 16.2 1.60 ± 0.09

Benchmark Results: MstoreBench

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_MstoreBench 276.5 ± 2.1 274.5 280.2 1.00
main_levm_MstoreBench 936.2 ± 2.3 932.3 939.9 3.39 ± 0.03
pr_revm_MstoreBench 276.7 ± 1.1 275.6 278.9 1.00 ± 0.01
pr_levm_MstoreBench 943.1 ± 6.4 936.3 958.9 3.41 ± 0.03

Benchmark Results: Push

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_Push 293.8 ± 1.4 292.2 296.7 1.00
main_levm_Push 1052.0 ± 3.8 1046.8 1057.9 3.58 ± 0.02
pr_revm_Push 294.8 ± 2.3 292.9 300.7 1.00 ± 0.01
pr_levm_Push 1060.6 ± 5.9 1056.3 1076.1 3.61 ± 0.03

Benchmark Results: SstoreBench_no_opt

Command Mean [ms] Min [ms] Max [ms] Relative
main_revm_SstoreBench_no_opt 158.8 ± 0.7 157.6 159.7 1.00
main_levm_SstoreBench_no_opt 170.5 ± 2.1 168.8 174.0 1.07 ± 0.01
pr_revm_SstoreBench_no_opt 161.2 ± 5.2 157.5 170.8 1.02 ± 0.03
pr_levm_SstoreBench_no_opt 171.1 ± 1.9 168.5 173.9 1.08 ± 0.01

Collaborator

@Arkenan Arkenan left a comment

LGTM

Contributor

edg-l commented Jul 17, 2025

Looks good from my side. Additionally, I added some more tests and ran this code under Miri to check whether it found anything, and it ran OK.

RUSTFLAGS="-Zrandomize-layout -C target-cpu=x86-64-v4 -C target-feature=+avx2" cargo miri test

@iovoid iovoid added this pull request to the merge queue Jul 17, 2025
Merged via the queue into main with commit 165b94c Jul 17, 2025
49 checks passed
@iovoid iovoid deleted the perf/blake2-avx256 branch July 17, 2025 14:49
@github-project-automation github-project-automation bot moved this from In review to Done in ethrex_performance Jul 17, 2025
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l1 Jul 17, 2025
d-roak pushed a commit to 1sixtech/ethrex that referenced this pull request Jul 17, 2025
**Motivation**

To improve BLAKE2 performance.

**Description**

Why AVX256 instead of AVX512? Mainly that
[AVX512](rust-lang/rust#111137) intrinsics are
still experimental.

Creates a common/crypto module to house blake2. We should consider
moving here other cryptographic operations currently inside
precompiles.rs.

If avx2 is available, a permute-with-gather implementation is used.

Usage of unsafe is required for SIMD loads and stores. It should be
reviewed that alignment requirements are satisfied and that no
out-of-bounds operations are possible.

Note that aside from the obvious ones with "load" or "store" in the
name, gather also represents a series of memory loads.

Unsafe is also required to call the first avx2-enabled function, since
we must first ensure avx2 is actually available on the target CPU.

**Benchmarks**

### PR

|Title|Max (MGas/s)|p50 (MGas/s)|p95 (MGas/s)|p99 (MGas/s)|Min (MGas/s)|
|----|--------------|--------------|-------------|--------------|--------------|
|Blake1MRounds|120.19|93.97|93.38|99.85|91.54|
|Blake1Round|226.42|175.09|170.08|166.83|166.82|
|Blake1KRounds|122.36|97.28|96.09|100.90|95.87|
|Blake10MRounds|174.36|110.78|104.15|124.33|103.89|

### Main

|Title|Max (MGas/s)|p50 (MGas/s)|p95 (MGas/s)|p99 (MGas/s)|Min (MGas/s)|
|----|--------------|--------------|-------------|--------------|--------------|
|Blake1MRounds|80.79|63.04|62.57|67.80|62.50|
|Blake1Round|223.59|174.93|168.21|159.38|159.33|
|Blake1KRounds|83.75|66.59|65.88|68.37|64.76|
|Blake10MRounds|117.79|77.21|69.63|83.19|69.05|
pedrobergamini pushed a commit to pedrobergamini/ethrex that referenced this pull request Aug 24, 2025
Labels
levm Lambda EVM implementation performance