perf: investigate per-user metrics pre-aggregation to reduce channel pressure at high RPS #675

Description

@jeremyandrews

Background

At very high RPS the main thread's receive_metrics() loop becomes a bottleneck: it processes raw per-request GooseMetric messages and has a hard 400ms budget per 500ms main-loop iteration. Channel volume scales linearly with RPS (O(RPS)), which limits single-instance throughput.

Goose already solves this problem in Gaggle (distributed) mode: worker nodes pre-aggregate metrics locally and send periodic summaries to the manager, rather than forwarding individual request events. This ticket explores bringing the same approach to standalone mode.

Key observation: logging is unaffected

Per-request log files (--request-log, --transaction-log, --scenario-log, --error-log) already travel through an entirely separate GooseLog channel to the logger thread. Pre-aggregating the metrics channel would have zero impact on logging granularity. Full per-request detail would remain available in log files as today.

What we'd be trading

With pre-aggregation, the metrics channel would carry periodic summaries (e.g., every N requests or every T ms) instead of individual events. The main thread merges these summaries the same way it currently merges raw events. Statistical accuracy is preserved — histograms, percentiles, and counts are all mergeable. The only difference is that the main thread sees data in batches, not per-request, which means:

  • Running metrics (--running-metrics) reflect slightly lagged counts (bounded by flush interval)
  • No per-request data is visible through the metrics channel, but per-request detail has always been served by the log files, which are unchanged
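
The claim that histograms, counts, and percentiles are mergeable can be checked with a small standalone sketch. It assumes a response-time histogram shaped like the `BTreeMap<usize, usize>` in the proposal below; the `Histogram` alias and the `record`, `merge`, and `percentile` helpers are hypothetical names for illustration, not Goose's API, and the nearest-rank percentile is one possible definition rather than the one Goose necessarily uses.

```rust
use std::collections::BTreeMap;

/// Response-time histogram: response time in ms -> number of requests.
type Histogram = BTreeMap<usize, usize>;

/// Record a single response time.
fn record(hist: &mut Histogram, ms: usize) {
    *hist.entry(ms).or_insert(0) += 1;
}

/// Merge a batch into a running total by adding counts bucket by bucket.
fn merge(total: &mut Histogram, batch: &Histogram) {
    for (&ms, &count) in batch {
        *total.entry(ms).or_insert(0) += count;
    }
}

/// Nearest-rank percentile: the smallest response time whose cumulative
/// count reaches p percent of all recorded requests.
fn percentile(hist: &Histogram, p: f64) -> Option<usize> {
    let total: usize = hist.values().sum();
    if total == 0 {
        return None;
    }
    let rank = ((p / 100.0) * total as f64).ceil().max(1.0) as usize;
    let mut seen = 0;
    for (&ms, &count) in hist {
        seen += count;
        if seen >= rank {
            return Some(ms);
        }
    }
    None
}

fn main() {
    let samples = [12, 15, 15, 20, 31, 31, 31, 48, 90, 250];

    // Path A: every sample reaches the aggregate individually.
    let mut per_request = Histogram::new();
    for &ms in &samples {
        record(&mut per_request, ms);
    }

    // Path B: samples are grouped into batches of three, then merged.
    let mut merged = Histogram::new();
    for chunk in samples.chunks(3) {
        let mut batch = Histogram::new();
        for &ms in chunk {
            record(&mut batch, ms);
        }
        merge(&mut merged, &batch);
    }

    // Both paths end in the same histogram, hence the same percentiles.
    assert_eq!(per_request, merged);
    assert_eq!(percentile(&per_request, 95.0), percentile(&merged, 95.0));
    println!("p50 = {:?}, p95 = {:?}", percentile(&merged, 50.0), percentile(&merged, 95.0));
}
```

Because merging only adds bucket counts, the result does not depend on how requests were grouped into batches or on the order in which batches arrive.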

Proposed approach

  1. Add a GooseMetricBatch struct mirroring GooseRequestMetricAggregate but sized for a single user's interval:

    use std::collections::{BTreeMap, HashMap};

    struct GooseMetricBatch {
        request_key: String,         // "<METHOD> <name>"
        success_count: u64,
        fail_count: u64,
        response_times: BTreeMap<usize, usize>,   // response time -> occurrences
        status_codes: HashMap<u16, u64>,          // status code -> occurrences
        // ... plus transaction/scenario counts
        errors: Vec<GooseRequestMetric>,   // errors still sent individually
    }

  2. Each GooseUser accumulates a GooseMetricBatch locally and flushes it to the channel every N requests (e.g., 100) or every T ms (e.g., 250ms), whichever comes first.

  3. The main thread receives GooseMetric::Batch(GooseMetricBatch) and merges it into the existing GooseRequestMetricAggregate structures, the same way the Gaggle manager merges worker reports.

  4. Error events (non-2xx responses) continue to be sent individually so the error summary retains full fidelity. This mirrors Gaggle behavior.
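
The accumulate-and-flush behavior in steps 2 and 3 could look roughly like the sketch below. It is a simplification under stated assumptions: `Batch` and `Batcher` are hypothetical stand-ins (a single request key, no transaction/scenario counts, no individually sent errors), and `std::sync::mpsc` stands in for whatever channel Goose actually uses.

```rust
use std::collections::{BTreeMap, HashMap};
use std::sync::mpsc::{channel, Sender};
use std::time::{Duration, Instant};

/// Simplified stand-in for the proposed GooseMetricBatch (one request key only).
#[derive(Debug, Default)]
struct Batch {
    success_count: u64,
    fail_count: u64,
    response_times: BTreeMap<usize, usize>,
    status_codes: HashMap<u16, u64>,
}

/// Hypothetical per-user accumulator: flushes after `max_requests` requests
/// or once `max_age` has elapsed since the last flush, whichever comes first.
struct Batcher {
    batch: Batch,
    pending: u64,
    last_flush: Instant,
    max_requests: u64,
    max_age: Duration,
    tx: Sender<Batch>,
}

impl Batcher {
    fn new(tx: Sender<Batch>, max_requests: u64, max_age: Duration) -> Self {
        Batcher {
            batch: Batch::default(),
            pending: 0,
            last_flush: Instant::now(),
            max_requests,
            max_age,
            tx,
        }
    }

    /// Record one request locally, then flush if either threshold is reached.
    fn record(&mut self, status: u16, ms: usize, success: bool) {
        if success {
            self.batch.success_count += 1;
        } else {
            self.batch.fail_count += 1;
        }
        *self.batch.response_times.entry(ms).or_insert(0) += 1;
        *self.batch.status_codes.entry(status).or_insert(0) += 1;
        self.pending += 1;
        if self.pending >= self.max_requests || self.last_flush.elapsed() >= self.max_age {
            self.flush();
        }
    }

    /// Send the accumulated batch (if non-empty) and start a fresh one.
    fn flush(&mut self) {
        if self.pending > 0 {
            let full = std::mem::take(&mut self.batch);
            // A send error means the receiver is gone; ignored in this sketch.
            let _ = self.tx.send(full);
            self.pending = 0;
        }
        self.last_flush = Instant::now();
    }
}

fn main() {
    let (tx, rx) = channel();
    let mut user = Batcher::new(tx, 100, Duration::from_millis(250));
    for i in 0..1_000usize {
        user.record(200, 10 + i % 5, true);
    }
    user.flush(); // final flush so no partial batch is lost at shutdown
    drop(user); // drops the sender so the receive loop below terminates

    let (mut messages, mut requests) = (0u64, 0u64);
    for batch in rx {
        messages += 1;
        requests += batch.success_count + batch.fail_count;
    }
    println!("{requests} requests carried in {messages} channel messages");
}
```

One design point this sketch exposes: the time threshold is only checked when a request is recorded, so an idle user would hold a partial batch until its next request. A real implementation would likely need a timer or a flush on user shutdown to keep the lag bound in place.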

Questions to resolve

  • What flush interval / batch size gives the right tradeoff between channel pressure and metrics freshness for --running-metrics?
  • Does the Gaggle aggregation path (src/metrics.rs merge logic) already provide a clean abstraction to reuse, or does it need to be factored out first?
  • Should batching be opt-in (e.g., --batch-metrics) or the new default?
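
For the flush interval question, a back-of-the-envelope model may help frame the tradeoff before a prototype exists. The sketch below assumes each user issues requests at a steady rate and that empty batches are never sent; the function names and the example numbers (1,000 users at 100 requests per second each) are illustrative assumptions, not measurements.

```rust
/// Estimated channel messages per second for one user.
/// `rps`: that user's requests per second; `n`: batch size; `t_ms`: flush interval in ms.
fn flushes_per_sec(rps: f64, n: f64, t_ms: f64) -> f64 {
    let by_count = rps / n;
    let by_time = 1000.0 / t_ms;
    // Whichever trigger fires first wins, but an empty batch is never sent,
    // so the flush rate cannot exceed the request rate itself.
    by_count.max(by_time).min(rps)
}

/// Worst-case delay in ms before a request becomes visible to the main thread.
fn max_lag_ms(rps: f64, n: f64, t_ms: f64) -> f64 {
    (n / rps * 1000.0).min(t_ms)
}

fn main() {
    let users = 1_000.0;
    let rps_per_user = 100.0;
    for (n, t_ms) in [(100.0, 250.0), (500.0, 500.0), (1000.0, 1000.0)] {
        let per_user = flushes_per_sec(rps_per_user, n, t_ms);
        println!(
            "N={n:>6} T={t_ms:>6}ms: {:>8.0} msgs/s (vs {:.0} unbatched), lag <= {:.0}ms",
            per_user * users,
            rps_per_user * users,
            max_lag_ms(rps_per_user, n, t_ms),
        );
    }
}
```

Under this model the channel volume is bounded by roughly users × 1000/T once the time trigger dominates, so it no longer grows with RPS, while the staleness of --running-metrics is bounded by T.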

Related

  • Gaggle worker→manager aggregation in src/metrics.rs is the closest existing reference implementation.
  • This is a larger change than the other metrics performance tickets and would benefit from a prototype branch to measure actual throughput improvement before committing to the design.
