By adopting sparse matrices, it improves the Pippenger algorithm [Pip76](https:/

One thing to highlight is that our implementation runs most computations on the GPU, yet it is still slower than CPU-only solutions such as [Arkworks](https://github.com/arkworks-rs). However, because we target client-side devices with limited resources, a hybrid approach that splits the MSM work between the CPU and GPU and combines the partial results at the end can be slightly faster than a pure-CPU implementation. See the write-up below for estimated speedups with this hybrid method.
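The hybrid idea rests on a simple property: an MSM over a split input equals the group sum of the MSMs over the two halves. The sketch below illustrates only that splitting-and-combining structure; plain `u64` arithmetic stands in for elliptic-curve group operations, and the `msm` helper is a hypothetical stand-in, not the crate's API.

```rust
// Stand-in for an MSM backend: a dot product over u64 plays the role
// of sum_i(scalar_i * base_i) over curve points.
fn msm(bases: &[u64], scalars: &[u64]) -> u64 {
    bases.iter().zip(scalars).map(|(b, s)| b * s).sum()
}

fn main() {
    let bases: Vec<u64> = (1..=8).collect();
    let scalars: Vec<u64> = (11..=18).collect();

    // Split the workload roughly according to each backend's throughput.
    let split = bases.len() / 2;
    let cpu_part = msm(&bases[..split], &scalars[..split]); // CPU half
    let gpu_part = msm(&bases[split..], &scalars[split..]); // GPU half (stand-in)

    // Adding the two partial results recovers the full MSM.
    assert_eq!(cpu_part + gpu_part, msm(&bases, &scalars));
    println!("hybrid result = {}", cpu_part + gpu_part);
}
```

In the real crate the partial results would be curve points combined with group addition, and the split ratio would be tuned to the device's relative CPU/GPU throughput.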

## Profiling summary (v1 vs v2)

Environment: M1 Pro, macOS 15.2, curve `ark_bn254`, dataset 2^20 unless stated. Medians of 5 runs.

### v1 → v2

| metric | v1[^1] | v2[^2] | gain |
|---|---|---|---|
| end-to-end latency | 10.3 s | **0.42 s** | **×24** |
| GPU occupancy | 32 % | 76 % | +44 pp |
| CPU share | 19 % | **<3 %** | –16 pp |
| peak VRAM | 1.6 GB | **220 MB** | –7.3× |

Key changes:

* a single sparse-matrix kernel eliminates most kernel launches and memory thrashing
* CSR buckets keep data on-device → near-zero host↔GPU traffic
* an on-GPU radix sort parallelizes preprocessing
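To illustrate the CSR bucket layout mentioned above (a host-side sketch of the data structure, not the kernel code; `csr_buckets` is a hypothetical helper): point indices are grouped by bucket, and a row-pointer array records where each bucket's slice starts, so the whole structure is two flat arrays that can stay resident on the GPU.

```rust
// Build a CSR-style bucket layout from (bucket, point-index) pairs.
// row_ptr[b]..row_ptr[b + 1] indexes the slice of point indices in
// bucket b -- two flat buffers, easy to keep on-device.
fn csr_buckets(pairs: &[(usize, usize)], num_buckets: usize) -> (Vec<usize>, Vec<usize>) {
    let mut row_ptr = vec![0usize; num_buckets + 1];
    for &(b, _) in pairs {
        row_ptr[b + 1] += 1; // count entries per bucket
    }
    for b in 0..num_buckets {
        row_ptr[b + 1] += row_ptr[b]; // prefix sum -> row offsets
    }
    let mut col_idx = vec![0usize; pairs.len()];
    let mut next = row_ptr.clone();
    for &(b, p) in pairs {
        col_idx[next[b]] = p; // scatter point index into its bucket's slice
        next[b] += 1;
    }
    (row_ptr, col_idx)
}

fn main() {
    // Five points scattered across three buckets.
    let pairs = [(2, 0), (0, 1), (2, 2), (1, 3), (0, 4)];
    let (row_ptr, col_idx) = csr_buckets(&pairs, 3);
    assert_eq!(row_ptr, vec![0, 2, 3, 5]);
    // Bucket 0 holds points 1 and 4.
    assert_eq!(&col_idx[row_ptr[0]..row_ptr[1]], &[1, 4]);
    println!("row_ptr = {:?}, col_idx = {:?}", row_ptr, col_idx);
}
```

On the GPU, the counting and prefix-sum steps map naturally onto the radix-sort preprocessing noted above, which is what keeps the whole pipeline on-device.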

## How to use

Metal MSM v2 works with `arkworks v0.4.x`; just include the crate in your `Cargo.toml`.
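For example, the dependency section of a `Cargo.toml` might look like the following; the crate name and versions here are illustrative, so check the repository for the published name and the exact version it pins:

```toml
[dependencies]
# Crate name and versions are illustrative -- see the repo for the
# published crate name and the arkworks version it targets.
metal-msm = "2"
ark-bn254 = "0.4"
ark-ec = "0.4"
```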
## Acknowledgements

This work was initially sponsored by a joint grant from [PSE](https://pse.dev/) and [0xPARC](https://0xparc.org/). It is currently incubated by PSE.

[^1]: https://hackmd.io/@yaroslav-ya/rJkpqc_Nke
[^2]: https://hackmd.io/@yaroslav-ya/HyFA7XAQll