perf(levm): add AVX256 implementation of BLAKE2 #3590
No significant difference was registered for any benchmark run.
Benchmarks run: BubbleSort, ERC20Approval, ERC20Mint, ERC20Transfer, Factorial, FactorialRecursive, Fibonacci, FibonacciRecursive, ManyHashes, MstoreBench, Push.
Benchmark for ea7c34b
Benchmark for 0544811
Benchmark Results Comparison
No significant difference was registered for any benchmark run.
Benchmarks run: BubbleSort, ERC20Approval, ERC20Mint, ERC20Transfer, Factorial, FactorialRecursive, Fibonacci, FibonacciRecursive, ManyHashes, MstoreBench, Push, SstoreBench_no_opt.
LGTM
Looks good from my side. Additionally, I ran this code under Miri, with some more tests added, to check whether it would find anything, and it ran OK.
**Motivation**

To improve BLAKE2 performance.

**Description**

Why AVX256 instead of AVX512? Mainly because [AVX512](rust-lang/rust#111137) intrinsics are still experimental.

Creates a `common/crypto` module to house blake2. We should consider moving other cryptographic operations that currently live in `precompiles.rs` here.

If avx2 is available, a permute-with-gather implementation is used.

Usage of `unsafe` is required for SIMD loads and stores. Reviewers should check that alignment requirements are satisfied and that no out-of-bounds operations are possible.
Note that aside from the obvious intrinsics with "load" or "store" in the name, gather also represents a series of memory loads.

`unsafe` is also required to call the first avx2-enabled function, since we must first ensure avx2 is actually available on the target CPU.

**Benchmarks**

### PR

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|----|--------------|--------------|-------------|--------------|--------------|
| Blake1MRounds | 120.19 | 93.97 | 93.38 | 99.85 | 91.54 |
| Blake1Round | 226.42 | 175.09 | 170.08 | 166.83 | 166.82 |
| Blake1KRounds | 122.36 | 97.28 | 96.09 | 100.90 | 95.87 |
| Blake10MRounds | 174.36 | 110.78 | 104.15 | 124.33 | 103.89 |

### Main

| Title | Max (MGas/s) | p50 (MGas/s) | p95 (MGas/s) | p99 (MGas/s) | Min (MGas/s) |
|----|--------------|--------------|-------------|--------------|--------------|
| Blake1MRounds | 80.79 | 63.04 | 62.57 | 67.80 | 62.50 |
| Blake1Round | 223.59 | 174.93 | 168.21 | 159.38 | 159.33 |
| Blake1KRounds | 83.75 | 66.59 | 65.88 | 68.37 | 64.76 |
| Blake10MRounds | 117.79 | 77.21 | 69.63 | 83.19 | 69.05 |
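For reviewers less familiar with the pattern, here is a minimal sketch of the two `unsafe` concerns raised in the description: calling an avx2-enabled function only after a runtime check, and treating gather as a bounds-sensitive memory load. It is not the code in this PR. The names `gather_words` and `gather_words_portable`, the `u8` index type, and the index masking are all assumptions made for illustration.

```rust
// Illustrative only: gathers four u64 words from a 16-word block.
// This is NOT the BLAKE2 code from the PR; all names are hypothetical.

#[cfg(target_arch = "x86_64")]
mod avx2 {
    use std::arch::x86_64::*;

    /// Gathers `m[idx[0]], .., m[idx[3]]` with a single AVX2 gather.
    ///
    /// # Safety
    /// The caller must ensure the CPU supports AVX2.
    #[target_feature(enable = "avx2")]
    pub unsafe fn gather_words(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
        // Mask every index into 0..16 so the gather cannot read out of bounds.
        let offsets: [i64; 4] = idx.map(|i| i64::from(i & 15));
        let mut out = [0u64; 4];
        unsafe {
            // loadu/storeu carry no alignment requirement, and the fixed-size
            // arrays guarantee 32 readable (or writable) bytes.
            let voff = _mm256_loadu_si256(offsets.as_ptr() as *const __m256i);
            // SCALE = 8: each offset counts u64 elements, not bytes.
            // A gather is four memory loads, so the bounds argument matters here.
            let v = _mm256_i64gather_epi64::<8>(m.as_ptr() as *const i64, voff);
            _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, v);
        }
        out
    }
}

/// Scalar fallback with the same index masking.
pub fn gather_words_portable(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
    idx.map(|i| m[usize::from(i & 15)])
}

/// Safe entry point: picks the AVX2 path only when the CPU reports support.
pub fn gather_words(m: &[u64; 16], idx: [u8; 4]) -> [u64; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 availability was confirmed on the line above.
            return unsafe { avx2::gather_words(m, idx) };
        }
    }
    gather_words_portable(m, idx)
}
```

Calling `gather_words(&m, [0, 5, 10, 15])` returns the words at those four positions on either path. The safe wrapper is the only place where the feature check and the `unsafe` call meet, which keeps the safety argument local and easy to review.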