perf: investigate per-user metrics pre-aggregation to reduce channel pressure at high RPS #675

## Background

At very high RPS the main thread's `receive_metrics()` loop becomes a bottleneck: it processes raw per-request `GooseMetric` messages and has a hard 400ms budget per 500ms main-loop iteration. Channel volume scales linearly with RPS (`O(RPS)`), which limits single-instance throughput.

Goose already solves this problem in **Gaggle (distributed) mode**: worker nodes pre-aggregate metrics locally and send periodic summaries to the manager, rather than forwarding individual request events. This ticket explores bringing the same approach to standalone mode.

## Key observation: logging is unaffected

Per-request log files (`--request-log`, `--transaction-log`, `--scenario-log`, `--error-log`) already travel through an entirely separate `GooseLog` channel to the logger thread. Pre-aggregating the *metrics* channel would have zero impact on logging granularity. Full per-request detail would remain available in log files as today.

## What we'd be trading

With pre-aggregation, the metrics channel would carry periodic summaries (e.g., every N requests or every T ms) instead of individual events. The main thread would merge these summaries into the same aggregates it currently builds from raw events. Statistical accuracy is preserved because counts, status-code tallies, and response-time histograms are all mergeable by summation, and percentiles are computed from the merged histograms. The only difference is that the main thread sees data in batches rather than per request, which means:

- Running metrics (`--running-metrics`) reflect slightly lagged counts (bounded by flush interval)
- Per-request data is no longer visible *through the metrics channel*, but nothing is lost: that detail has always been available in the log files
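
To make the mergeability claim concrete, here is a minimal sketch, assuming response times are already bucketed to whole milliseconds in a `BTreeMap<usize, usize>` as in the proposed struct. The names `Histogram`, `merge_histogram`, and `percentile` are hypothetical and are not Goose APIs.

```rust
use std::collections::BTreeMap;

/// Response-time histogram: response time in ms -> number of requests.
/// (Hypothetical alias for illustration only.)
type Histogram = BTreeMap<usize, usize>;

/// Merge `batch` into `total` by summing the counters bucket by bucket.
fn merge_histogram(total: &mut Histogram, batch: &Histogram) {
    for (&ms, &count) in batch {
        *total.entry(ms).or_insert(0) += count;
    }
}

/// Smallest bucket at or below which at least `pct` (0.0..=1.0) of the
/// recorded samples fall. Returns `None` for an empty histogram.
fn percentile(hist: &Histogram, pct: f64) -> Option<usize> {
    let total: usize = hist.values().sum();
    if total == 0 {
        return None;
    }
    // Rank of the sample we are looking for (at least 1).
    let target = ((total as f64) * pct).ceil().max(1.0) as usize;
    let mut seen = 0;
    for (&ms, &count) in hist {
        seen += count;
        if seen >= target {
            return Some(ms);
        }
    }
    // Only reachable if `pct` > 1.0; fall back to the largest bucket.
    hist.keys().next_back().copied()
}
```

Because merging only sums counters, a histogram built from merged batches equals the one built from the same requests recorded one at a time, so any percentile read from it is unchanged by batching.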

## Proposed approach

1. Add a `GooseMetricBatch` struct mirroring `GooseRequestMetricAggregate` but sized for a single user's interval:
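   ```rust
   use std::collections::{BTreeMap, HashMap};

   /// Metrics accumulated by one user for one request key over one flush interval.
   struct GooseMetricBatch {
       request_key: String,                      // "<METHOD> <name>"
       success_count: u64,
       fail_count: u64,
       response_times: BTreeMap<usize, usize>,   // response time -> request count
       status_codes: HashMap<u16, u64>,          // HTTP status code -> request count
       // ... plus transaction/scenario counts
       errors: Vec<GooseRequestMetric>,          // errors still sent individually
   }
   ```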

2. Each `GooseUser` accumulates a `GooseMetricBatch` locally and flushes it to the channel every N requests (e.g., 100) or every T ms (e.g., 250ms), whichever comes first.

3. The main thread receives `GooseMetric::Batch(GooseMetricBatch)` and merges it into the existing `GooseRequestMetricAggregate` structures, the same way the Gaggle manager merges worker reports.

4. Error events (non-2xx responses) continue to be sent individually so the error summary retains full fidelity. This mirrors Gaggle behavior.
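
Steps 2 to 4 could be prototyped roughly as follows. This is a sketch under several assumptions, not the Goose implementation: `Batch`, `Metric`, `Batcher`, and `merge` are hypothetical stand-ins for `GooseMetricBatch`, `GooseMetric`, the per-user state, and the main-thread merge; `std::sync::mpsc` stands in for whatever channel Goose uses; each batch covers a single request key; and step 4 is read as "errors are sent as separate messages".

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::mpsc::Sender;
use std::time::{Duration, Instant};

/// Simplified stand-in for the proposed batch (one request key per batch).
#[derive(Debug, Default, Clone, PartialEq)]
struct Batch {
    request_key: String,
    success_count: u64,
    fail_count: u64,
    response_times: BTreeMap<usize, usize>,
}

/// Hypothetical messages on the metrics channel.
#[derive(Debug)]
enum Metric {
    Batch(Batch),
    /// Failed requests still travel one by one: (request key, status code).
    Error(String, u16),
}

/// User-side accumulator: flushes after `max_requests` or `max_age`.
struct Batcher {
    tx: Sender<Metric>,
    pending: Batch,
    started: Instant,
    max_requests: u64,
    max_age: Duration,
}

impl Batcher {
    fn new(tx: Sender<Metric>, key: &str, max_requests: u64, max_age: Duration) -> Self {
        let pending = Batch { request_key: key.to_string(), ..Batch::default() };
        Batcher { tx, pending, started: Instant::now(), max_requests, max_age }
    }

    /// Record one finished request, then flush if either threshold is reached.
    fn record(&mut self, response_ms: usize, status: u16) {
        if (200..300).contains(&status) {
            self.pending.success_count += 1;
        } else {
            self.pending.fail_count += 1;
            // Send errors are ignored in this sketch (receiver already gone).
            let _ = self.tx.send(Metric::Error(self.pending.request_key.clone(), status));
        }
        *self.pending.response_times.entry(response_ms).or_insert(0) += 1;
        let recorded = self.pending.success_count + self.pending.fail_count;
        if recorded >= self.max_requests || self.started.elapsed() >= self.max_age {
            self.flush();
        }
    }

    /// Send the pending batch (if non-empty) and start a fresh one.
    fn flush(&mut self) {
        if self.pending.success_count + self.pending.fail_count > 0 {
            let fresh = Batch { request_key: self.pending.request_key.clone(), ..Batch::default() };
            let full = std::mem::replace(&mut self.pending, fresh);
            let _ = self.tx.send(Metric::Batch(full));
        }
        self.started = Instant::now();
    }
}

/// Main-thread side: fold one batch into the running per-request aggregate.
fn merge(aggregates: &mut HashMap<String, Batch>, batch: Batch) {
    let agg = aggregates.entry(batch.request_key.clone()).or_insert_with(|| Batch {
        request_key: batch.request_key.clone(),
        ..Batch::default()
    });
    agg.success_count += batch.success_count;
    agg.fail_count += batch.fail_count;
    for (ms, count) in batch.response_times {
        *agg.response_times.entry(ms).or_insert(0) += count;
    }
}
```

Note that the age threshold here is only checked when a request completes, so an idle user holds its partial batch until its next request or an explicit `flush()`. A real implementation would likely also flush on a timer and when the user exits, so the final report is complete.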

## Questions to resolve

- What flush interval / batch size gives the right tradeoff between channel pressure and metrics freshness for `--running-metrics`?
- Does the Gaggle aggregation path (`src/metrics.rs` merge logic) already provide a clean abstraction to reuse, or does it need to be factored out first?
- Should batching be opt-in (e.g., `--batch-metrics`) or the new default?
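
For the first question, a back-of-envelope model may help frame the prototype measurements. The sketch below assumes a steady per-user request rate, that empty batches are never sent, and that individually sent error events are not counted. The function names are hypothetical.

```rust
/// Rough per-user channel messages per second under "flush every `n` requests
/// or every `t_ms` milliseconds, whichever comes first".
fn batches_per_sec(user_rps: f64, n: f64, t_ms: f64) -> f64 {
    let by_count = user_rps / n; // flushes triggered by the size limit
    let by_time = 1000.0 / t_ms; // flushes triggered by the time limit
    // Whichever trigger fires more often dominates, but a user can never
    // send more batches than it makes requests.
    by_count.max(by_time).min(user_rps)
}

/// Total channel messages per second across `users` identical users.
fn total_batches_per_sec(users: f64, user_rps: f64, n: f64, t_ms: f64) -> f64 {
    users * batches_per_sec(user_rps, n, t_ms)
}
```

Under this model, 1,000 users at 100 requests per second each produce 100,000 raw messages per second today. With N = 100 and T = 250 ms, each user would flush about 4 times per second, for roughly 4,000 messages per second in total. With those example values the time limit dominates until a single user exceeds 400 requests per second, which suggests the interval T matters more than N for most workloads.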

## Related

- Gaggle worker→manager aggregation in `src/metrics.rs` is the closest existing reference implementation.
- This is a larger change than the other metrics performance tickets and would benefit from a prototype branch to measure actual throughput improvement before committing to the design.
