Benchmark improvements: ideas for deeper performance analysis

## Context

During the `perf-proofs` and `game-benchmark` work (#2), we found that micro-benchmarks can mislead when their V8 inline-cache (IC), object-shape, and allocation patterns differ from those of real workloads. The ideas below would improve benchmark quality and surface new optimization opportunities.

## Benchmark infrastructure improvements

- [ ] **Deterministic replay benchmark** — Seeded RNG and fixed mutation scripts to reduce run-to-run noise and improve A/B confidence
- [ ] **Percentile reporting** — Add p50/p95/p99 frame time and system time, not just mean/stddev
- [ ] **Warmup/steady-state separation** — Measure first-N frames vs steady-state frames to catch JIT/IC effects
- [ ] **GC-sensitive runs** — Track allocs/GC pauses (`--trace-gc` sampling) alongside FPS/op
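
A minimal sketch of the first three items, assuming nothing about the existing harness. The names `mulberry32`, `buildMutationScript`, `percentile`, and `summarize` are hypothetical helpers, and the script format (`{ op, entity }`) is an illustration only. It shows a seeded PRNG driving a fixed mutation script, plus nearest-rank percentiles computed on steady-state frames only.

```js
// Mulberry32-style seeded PRNG: the same seed always yields the same sequence.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function next() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Pre-generate the mutation script so every A/B run replays identical work.
// The { op, entity } record format is an assumption for this sketch.
function buildMutationScript(seed, steps, entityCount) {
  const rng = mulberry32(seed);
  const ops = ['add', 'remove'];
  const script = [];
  for (let i = 0; i < steps; i++) {
    script.push({
      op: ops[Math.floor(rng() * ops.length)],
      entity: Math.floor(rng() * entityCount),
    });
  }
  return script;
}

// Nearest-rank percentile over an ascending-sorted array of samples.
function percentile(sortedSamples, p) {
  const n = sortedSamples.length;
  if (n === 0) return NaN;
  const rank = Math.ceil((p * n) / 100);
  return sortedSamples[Math.min(n - 1, Math.max(0, rank - 1))];
}

// Report warmup frames separately from steady-state frames.
function summarize(frameTimesMs, warmupFrames) {
  const mean = (xs) =>
    xs.length ? xs.reduce((sum, x) => sum + x, 0) / xs.length : NaN;
  const warmup = frameTimesMs.slice(0, warmupFrames);
  const steady = frameTimesMs.slice(warmupFrames).sort((x, y) => x - y);
  return {
    warmupMean: mean(warmup),
    steadyMean: mean(steady),
    p50: percentile(steady, 50),
    p95: percentile(steady, 95),
    p99: percentile(steady, 99),
  };
}
```

Generating the script up front, rather than calling the PRNG inside the timed loop, keeps RNG cost out of the measurement. The warmup frame count is a tuning parameter that would need to be chosen per benchmark.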

## Scaling & stress tests

- [ ] **Archetype-count sweep** — Fixed entities, vary archetypes (5, 20, 50, 100) to expose index/matching scaling
- [ ] **Component count scaling** — Fixed entities/archetypes, vary components per entity (3, 10, 20, 50) to test `matches()` loop cost
- [ ] **Query fan-out stress** — Many overlapping queries touching same entities vs disjoint queries
- [ ] **Sparse vs dense component distribution** — Vary component presence probabilities to test branch predictability in `matches()`
- [ ] **`without()` exclusion scaling** — Benchmark exclusion count growth (0, 1, 3, 5) since `matches()` now has two loops
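
The archetype-count and exclusion-count sweeps could share one harness along these lines. This is a sketch under stated assumptions: the `matches()` below is a hypothetical two-loop stand-in (one loop over required components, one over excluded ones), not the library's implementation, and `buildArchetypes` and `sweep` are illustrative names.

```js
// Hypothetical stand-in for the library's matching logic (assumption):
// one loop for required components, one loop for exclusions.
function matches(entity, archetype) {
  for (const key of archetype.with) {
    if (entity[key] === undefined) return false;
  }
  for (const key of archetype.without) {
    if (entity[key] !== undefined) return false;
  }
  return true;
}

// Deterministically build `count` archetypes over a fixed component pool.
// Archetype i requires two adjacent components and excludes the next
// `exclusions` components (assumes exclusions <= names.length - 2).
function buildArchetypes(count, names, exclusions) {
  const n = names.length;
  const archetypes = [];
  for (let i = 0; i < count; i++) {
    const without = [];
    for (let j = 0; j < exclusions; j++) without.push(names[(i + 2 + j) % n]);
    archetypes.push({ with: [names[i % n], names[(i + 1) % n]], without });
  }
  return archetypes;
}

// Fixed entity set, varying archetype count: time a full match pass per step.
function sweep(entityCount, archetypeCounts, exclusions) {
  const names = Array.from({ length: 8 }, (_, i) => `c${i}`);
  const entities = Array.from({ length: entityCount }, (_, i) => ({
    [names[i % 8]]: 1,
    [names[(i + 1) % 8]]: 1,
    [names[(i + 2) % 8]]: 1,
  }));
  return archetypeCounts.map((count) => {
    const archetypes = buildArchetypes(count, names, exclusions);
    let matched = 0;
    const start = performance.now();
    for (const entity of entities) {
      for (const archetype of archetypes) {
        if (matches(entity, archetype)) matched++;
      }
    }
    return { archetypes: count, matched, ms: performance.now() - start };
  });
}
```

A call such as `sweep(10000, [5, 20, 50, 100], 1)` would give one row per archetype count. Returning `matched` alongside the timing gives a checksum that keeps the loop from being optimized away and lets two variants be compared for equal work.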

## Mutation & lifecycle patterns

- [ ] **Mutation-shape matrix** — Separate add-only, remove-only, flip (add/remove), and multi-component batch updates
- [ ] **Churn vs stable world** — Compare static entity sets vs high create/delete churn per frame
- [ ] **Entity pool recycling vs fresh allocation** — `deleteEntity` + `createEntity` cycling vs toggling components to "deactivate". The `delete` cost (3-30x slower than `= undefined`) suggests pooling could be a big user-facing win
- [ ] **Batch entity creation** — `createEntity` iterates all archetypes per entity. Measure the gap to decide if a `createEntities(batch)` API is worth adding
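
To make the `delete` versus `= undefined` comparison concrete, a micro-benchmark could look roughly like the sketch below. The entity layout and the helper names (`removeWithDelete`, `removeWithUndefined`, `timeFlip`) are assumptions for illustration. It does not reproduce the 3-30x figure, which will depend on engine version and workload.

```js
// Two ways to "remove" a component from a plain-object entity.
function removeWithDelete(entity, key) {
  delete entity[key]; // property is gone; may change the object's shape
}

function removeWithUndefined(entity, key) {
  entity[key] = undefined; // property stays on the object, value is cleared
}

// Flip one component off and on again for every entity, `rounds` times.
// A shared value object is reused so allocation does not dominate the timing.
function timeFlip(remove, entityCount, rounds) {
  const sharedVelocity = { x: 1, y: 1 };
  const entities = Array.from({ length: entityCount }, (_, i) => ({
    id: i,
    position: { x: i, y: i },
    velocity: sharedVelocity,
  }));
  const start = performance.now();
  for (let r = 0; r < rounds; r++) {
    for (const entity of entities) {
      remove(entity, 'velocity');
      entity.velocity = sharedVelocity;
    }
  }
  return { ms: performance.now() - start, entities };
}

// Example usage with a small world; absolute numbers are not meaningful here.
const viaDelete = timeFlip(removeWithDelete, 1000, 10);
const viaUndefined = timeFlip(removeWithUndefined, 1000, 10);
console.log({ deleteMs: viaDelete.ms, undefinedMs: viaUndefined.ms });
```

The two strategies are not interchangeable: after `= undefined` the key is still present, so `'velocity' in entity` and `Object.keys(entity)` still report it. Any presence check would have to compare against `undefined` for the cheaper path to be safe.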

## V8-specific investigations

- [ ] **Entity shape stability** — Benchmark monomorphic entities vs polymorphic component insertion order
- [ ] **Archetype registration order sensitivity** — Test whether registering archetypes simple→complex vs random order affects steady-state `matches()` throughput (IC deopt paths)
- [ ] **Single-thread contention baseline** — Run with synthetic background CPU load to test robustness of gains
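
One way the entity shape stability check might be set up is sketched below. It assumes entities are plain objects whose components are added as properties; `KEYS`, `buildWorld`, `sumHealth`, and `compareShapes` are hypothetical names. Rotating the insertion order is intended to give objects with the same properties but different hidden classes, although exact engine behavior varies across V8 versions.

```js
const KEYS = ['position', 'velocity', 'health', 'sprite'];

// Build `count` entities. In the monomorphic world every entity adds its
// components in the same order; in the polymorphic world the order is
// rotated per entity, giving up to KEYS.length distinct insertion orders.
function buildWorld(count, polymorphic) {
  const entities = [];
  for (let i = 0; i < count; i++) {
    const shift = polymorphic ? i % KEYS.length : 0;
    const order = KEYS.slice(shift).concat(KEYS.slice(0, shift));
    const entity = { id: i };
    for (const key of order) entity[key] = i;
    entities.push(entity);
  }
  return entities;
}

// The hot loop under test: a single property read site over all entities.
function sumHealth(entities) {
  let total = 0;
  for (let i = 0; i < entities.length; i++) total += entities[i].health;
  return total;
}

// Time the same read loop over both worlds; checksums should be equal.
function compareShapes(count, rounds) {
  const results = {};
  for (const [label, polymorphic] of [['monomorphic', false], ['polymorphic', true]]) {
    const world = buildWorld(count, polymorphic);
    let checksum = 0;
    const start = performance.now();
    for (let r = 0; r < rounds; r++) checksum += sumHealth(world);
    results[label] = { ms: performance.now() - start, checksum };
  }
  return results;
}
```

Because both worlds share `sumHealth`, running them in one process lets the first world's type feedback affect the second. Running each variant in a fresh process is likely the fairer comparison.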

## Iteration API

- [ ] **Iteration API usage benchmark** — `for...of`, indexed arrays, `forEach` across realistic system bodies (not empty loops). We know `.call()` adds up to 3.2x overhead at 10k — document the recommendation for users
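
A possible shape for this benchmark is sketched below, using a small movement-integration body instead of an empty loop. Everything here is an assumption for illustration: the entity layout, the `integrate` system body, and the `styles` table. In particular, the `call` variant only approximates `.call()` overhead and may not match how the library invokes callbacks.

```js
// A small but non-empty system body: integrate position by velocity.
function integrate(entity, dt) {
  entity.position.x += entity.velocity.x * dt;
  entity.position.y += entity.velocity.y * dt;
}

// The same work expressed through each iteration style under comparison.
const styles = {
  forOf(entities, dt) {
    for (const entity of entities) integrate(entity, dt);
  },
  indexed(entities, dt) {
    for (let i = 0; i < entities.length; i++) integrate(entities[i], dt);
  },
  forEach(entities, dt) {
    entities.forEach((entity) => integrate(entity, dt));
  },
  call(entities, dt) {
    for (let i = 0; i < entities.length; i++) {
      integrate.call(undefined, entities[i], dt);
    }
  },
};

function makeWorld(count) {
  return Array.from({ length: count }, (_, i) => ({
    position: { x: 0, y: 0 },
    velocity: { x: i, y: 1 },
  }));
}

// Run one style for `frames` frames on a fresh world; the checksum confirms
// that every style performed identical work.
function benchStyle(name, count, frames, dt) {
  const world = makeWorld(count);
  const run = styles[name];
  const start = performance.now();
  for (let f = 0; f < frames; f++) run(world, dt);
  const ms = performance.now() - start;
  const checksum = world.reduce((sum, e) => sum + e.position.x, 0);
  return { name, ms, checksum };
}

// Example usage with a small world; use realistic sizes for real measurements.
for (const name of Object.keys(styles)) {
  console.log(benchStyle(name, 1000, 10, 0.5));
}
```

Each style gets its own fresh world so that one style's mutations do not feed into the next, and comparing checksums guards against a variant silently skipping work.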

## Priority

Suggested high-value first picks:
1. **Deterministic replay** + **percentile reporting** — improves quality of all future measurements
2. **Archetype-count sweep** — directly informs whether `#componentIndex` and `#getAffectedArchetypes` optimizations are worth pursuing
3. **Mutation-shape matrix** — better understanding of the `add_remove` regression we saw with Set→Array
