Problem
Current benchmarks only measure single-MSM performance at various input sizes (2^10 through 2^24). In real proving systems, MSM is called many times in parallel — for example, a 2^14-point MSM invoked 2048 times concurrently during a single proving pass.
The GPU may significantly outperform a multi-threaded CPU in these batched scenarios due to:
- Better utilization of parallel compute units when saturated with concurrent work
- Amortization of setup overhead (buffer allocation, shader compilation) across batches
- Different memory access patterns under concurrent load
Without batched benchmarks, we may be underestimating the GPU's real-world advantage (or missing optimization opportunities).
Current State
All existing benchmarks run a single MSM computation per iteration:
- `benches/e2e.rs` — Criterion benchmark, one MSM per sample
- `tests/cuzk/e2e.rs` — end-to-end test, single MSM execution
There is no batched or concurrent MSM benchmarking anywhere in the codebase.
Proposed Work
1. Batched MSM Benchmarks
Add benchmarks that measure throughput when running multiple MSMs concurrently:
- Varying batch sizes: e.g., 1, 4, 16, 64, 256, 1024, 2048 concurrent MSMs
- Varying MSM sizes within batches: e.g., batches of 2^14-point MSMs (a size common in real provers)
- Metrics: total wall-clock time, throughput (MSMs/sec), per-MSM latency under load
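The harness could be sketched roughly as below, independent of the existing Criterion setup. This is only an illustration of the measurement structure: `run_msm` is a hypothetical stand-in for the library's real MSM entry point, and the thread-based fan-out is one assumed way to issue concurrent work (it also assumes `threads` divides `batch` evenly).

```rust
use std::thread;
use std::time::Instant;

// Hypothetical stand-in for a single MSM over `size` points; the real
// benchmark would call the library's MSM entry point here instead.
fn run_msm(size: usize) -> u64 {
    (0..size as u64).fold(0u64, |acc, x| acc.wrapping_add(x.wrapping_mul(x)))
}

/// Run `batch` MSMs of `size` points concurrently across `threads` workers,
/// returning (throughput in MSMs/sec, mean per-MSM latency in seconds).
/// Assumes `threads` divides `batch` evenly.
fn bench_batch(batch: usize, size: usize, threads: usize) -> (f64, f64) {
    let per_thread = batch / threads;
    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for _ in 0..per_thread {
                    // black_box keeps the compiler from optimizing the work away.
                    std::hint::black_box(run_msm(size));
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let secs = start.elapsed().as_secs_f64();
    (batch as f64 / secs, secs / batch as f64)
}

fn main() {
    // Sweep batch sizes at a fixed 2^14 MSM size, capping worker threads at 8.
    for &batch in &[1usize, 4, 16, 64] {
        let (throughput, latency) = bench_batch(batch, 1 << 14, batch.min(8));
        println!("batch={batch:>4} throughput={throughput:.1} MSMs/s latency={latency:.6}s");
    }
}
```

A real version would plug the GPU and CPU MSM implementations into `run_msm` and report all three metrics per (batch size, MSM size) pair.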
2. Batched MSM API (stretch)
Consider whether a dedicated batched MSM API could improve performance by:
- Sharing Metal command buffers across MSMs in a batch
- Pipelining GPU work (overlap data transfer with computation)
- Reusing allocated buffers across MSMs of the same size
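The buffer-reuse idea can be prototyped independently of Metal. The sketch below uses a hypothetical `Buffer` placeholder (a plain byte vector standing in for a GPU buffer handle) and a pool keyed by size, so repeated MSMs of the same size within a batch skip reallocation:

```rust
use std::collections::HashMap;

/// Hypothetical placeholder for a GPU buffer handle; in the real code this
/// would wrap a Metal buffer allocated from the device.
struct Buffer {
    data: Vec<u8>,
}

/// Pool that hands out buffers keyed by size. A release followed by an
/// acquire of the same size returns the cached buffer instead of allocating.
#[derive(Default)]
struct BufferPool {
    free: HashMap<usize, Vec<Buffer>>,
}

impl BufferPool {
    fn acquire(&mut self, size: usize) -> Buffer {
        self.free
            .get_mut(&size)
            .and_then(|bufs| bufs.pop())
            .unwrap_or_else(|| Buffer { data: vec![0u8; size] })
    }

    fn release(&mut self, buf: Buffer) {
        self.free.entry(buf.data.len()).or_default().push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::default();
    // First acquire allocates; release + acquire of the same size reuses.
    let b = pool.acquire(1 << 14);
    pool.release(b);
    let b2 = pool.acquire(1 << 14);
    println!("got buffer of {} bytes from pool", b2.data.len());
}
```

Whether this wins in practice depends on allocator behavior under concurrent load, which is exactly what the batched benchmarks above would surface.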