Benchmark improvements: ideas for deeper performance analysis #3

@jakeklassen

Context

During the perf-proofs and game-benchmark work (#2), we discovered that micro-benchmarks can mislead when their V8 inline-cache (IC), object-shape, and allocation patterns differ from those of real workloads. These ideas would improve benchmark quality and surface new optimization opportunities.

Benchmark infrastructure improvements

  • Deterministic replay benchmark — Seeded RNG and fixed mutation scripts to reduce run-to-run noise and improve A/B confidence
  • Percentile reporting — Add p50/p95/p99 frame time and system time, not just mean/stddev
  • Warmup/steady-state separation — Measure first-N frames vs steady-state frames to catch JIT/IC effects
  • GC-sensitive runs — Track allocs/GC pauses (--trace-gc sampling) alongside FPS/op
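
A minimal sketch of how the first three items could fit together, assuming frame times are collected as an array of milliseconds. The names `mulberry32`, `percentile`, and `summarize` are hypothetical helpers, not part of the library. The PRNG is a variant of the well-known mulberry32 generator, and percentiles use the nearest-rank method.

```typescript
// Hypothetical helpers for illustration; none of these names exist in the library.

// Seeded PRNG (mulberry32 variant): the same seed always yields the same
// sequence, so mutation scripts driven by it replay identically across runs.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Nearest-rank percentile over an ascending-sorted sample array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) throw new Error("no samples");
  const rank = Math.ceil((p / 100) * sorted.length);
  const index = Math.min(sorted.length - 1, Math.max(0, rank - 1));
  return sorted[index];
}

interface FrameStats {
  warmupMean: number; // mean of the first N frames (JIT/IC warmup)
  p50: number; // percentiles are computed over steady-state frames only
  p95: number;
  p99: number;
}

// Split frame times into warmup and steady-state, then summarize each part.
function summarize(frameTimesMs: number[], warmupFrames: number): FrameStats {
  const warmup = frameTimesMs.slice(0, warmupFrames);
  const steady = frameTimesMs.slice(warmupFrames).sort((a, b) => a - b);
  const mean = (xs: number[]) =>
    xs.reduce((sum, x) => sum + x, 0) / Math.max(1, xs.length);
  return {
    warmupMean: mean(warmup),
    p50: percentile(steady, 50),
    p95: percentile(steady, 95),
    p99: percentile(steady, 99),
  };
}
```

Reporting the warmup mean separately from the steady-state percentiles keeps a slow first few frames from inflating p99.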

Scaling & stress tests

  • Archetype-count sweep — Fixed entities, vary archetypes (5, 20, 50, 100) to expose index/matching scaling
  • Component count scaling — Fixed entities/archetypes, vary components per entity (3, 10, 20, 50) to test matches() loop cost
  • Query fan-out stress — Many overlapping queries touching same entities vs disjoint queries
  • Sparse vs dense component distribution — Vary component presence probabilities to test branch predictability in matches()
  • without() exclusion scaling — Benchmark exclusion count growth (0, 1, 3, 5) since matches() now has two loops
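
As a rough illustration of the exclusion sweep, here is a sketch built around a stand-in `matches()` with two loops, one for required components and one for excluded ones. This `matches()` is an assumption about the general shape of the check, not the library's implementation, and `makeEntities` and `sweepExclusions` are hypothetical names. The same harness could be adapted to sweep archetype or component counts.

```typescript
type Entity = Record<string, unknown>;

// Stand-in for the library's matches(): an assumed two-loop shape,
// not the actual implementation.
function matches(
  entity: Entity,
  required: string[],
  excluded: string[],
): boolean {
  for (const key of required) {
    if (entity[key] === undefined) return false;
  }
  for (const key of excluded) {
    if (entity[key] !== undefined) return false;
  }
  return true;
}

// Every entity has position and velocity plus exactly one of tag0..tag5,
// so each exclusion added to a query removes a predictable slice.
function makeEntities(count: number): Entity[] {
  const entities: Entity[] = [];
  for (let i = 0; i < count; i++) {
    entities.push({ position: 1, velocity: 1, [`tag${i % 6}`]: true });
  }
  return entities;
}

interface SweepResult {
  excludedCount: number;
  matched: number;
  elapsedMs: number;
}

// Time one full pass over the entities for each exclusion count.
function sweepExclusions(entities: Entity[], counts: number[]): SweepResult[] {
  const required = ["position", "velocity"];
  return counts.map((excludedCount) => {
    const excluded = Array.from({ length: excludedCount }, (_, j) => `tag${j}`);
    const start = performance.now();
    let matched = 0;
    for (const entity of entities) {
      if (matches(entity, required, excluded)) matched++;
    }
    return { excludedCount, matched, elapsedMs: performance.now() - start };
  });
}
```

Checking the `matched` count alongside the timing guards against a sweep that gets faster only because it stopped doing the same work.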

Mutation & lifecycle patterns

  • Mutation-shape matrix — Separate add-only, remove-only, flip (add/remove), and multi-component batch updates
  • Churn vs stable world — Compare static entity sets vs high create/delete churn per frame
  • Entity pool recycling vs fresh allocation — deleteEntity + createEntity cycling vs toggling components to "deactivate". The delete cost (3-30x slower than assigning = undefined) suggests pooling could be a big user-facing win
  • Batch entity creation — createEntity iterates all archetypes per entity. Measure the gap to decide if a createEntities(batch) API is worth adding
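
To make the pooling idea concrete, here is a minimal sketch of a pool that "deactivates" entities by assigning `undefined` instead of deleting keys. `EntityPool` and the `PooledEntity` shape are hypothetical and stand in for whatever the real world/entity types look like. Assigning `undefined` keeps the key on the object, while `delete` removes it, which is the behavioral difference the benchmark would be measuring.

```typescript
// Hypothetical entity shape for illustration only.
interface PooledEntity {
  position: { x: number; y: number } | undefined;
  active: true | undefined;
}

// Minimal pool sketch: released entities keep all their keys (set to
// undefined) so a reused object has the same set of properties as a fresh one.
class EntityPool {
  private free: PooledEntity[] = [];

  acquire(x: number, y: number): PooledEntity {
    const entity = this.free.pop() ?? { position: undefined, active: undefined };
    entity.position = { x, y };
    entity.active = true;
    return entity;
  }

  release(entity: PooledEntity): void {
    // "Deactivate" by assignment rather than the delete operator.
    entity.position = undefined;
    entity.active = undefined;
    this.free.push(entity);
  }

  get freeCount(): number {
    return this.free.length;
  }
}

// The semantic difference between the two approaches:
const viaAssign: Record<string, unknown> = { health: 10 };
viaAssign.health = undefined; // key stays on the object
const viaDelete: Record<string, unknown> = { health: 10 };
delete viaDelete.health; // key is removed from the object
```

A benchmark comparing this against deleteEntity + createEntity cycling would need to run the same per-frame churn through both paths.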

V8-specific investigations

  • Entity shape stability — Benchmark monomorphic entities vs polymorphic component insertion order
  • Archetype registration order sensitivity — Test whether registering archetypes simple→complex vs random order affects steady-state matches() throughput (IC deopt paths)
  • Single-thread contention baseline — Run with synthetic background CPU load to test robustness of gains
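
For the shape-stability item, one way to build the two populations is sketched below. It assumes entities are plain objects whose components are added as properties, and all names (`COMPONENTS`, `makeMonomorphic`, `makePolymorphic`, `distinctKeyOrders`, `sumHealth`) are hypothetical. The polymorphic set rotates insertion order deterministically, so the number of distinct orders is known in advance.

```typescript
// Hypothetical component names for illustration.
const COMPONENTS = ["position", "velocity", "health"] as const;

type PlainEntity = Record<string, number>;

// Add components in the given order; insertion order is what differs
// between the two populations.
function makeEntity(order: readonly string[]): PlainEntity {
  const entity: PlainEntity = {};
  for (const key of order) entity[key] = 1;
  return entity;
}

// Every entity receives its components in the same order.
function makeMonomorphic(count: number): PlainEntity[] {
  return Array.from({ length: count }, () => makeEntity(COMPONENTS));
}

// Same components, but the insertion order is rotated per entity.
function makePolymorphic(count: number): PlainEntity[] {
  return Array.from({ length: count }, (_, i) => {
    const r = i % COMPONENTS.length;
    return makeEntity([...COMPONENTS.slice(r), ...COMPONENTS.slice(0, r)]);
  });
}

// Sanity check for the benchmark setup: how many insertion orders exist?
function distinctKeyOrders(entities: PlainEntity[]): number {
  return new Set(entities.map((e) => Object.keys(e).join(","))).size;
}

// A small hot loop to time against each population.
function sumHealth(entities: PlainEntity[]): number {
  let total = 0;
  for (const entity of entities) total += entity.health;
  return total;
}
```

Both populations hold identical data, so any throughput difference in `sumHealth` would come from how the engine handles the differing property orders rather than from the workload itself.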

Iteration API

  • Iteration API usage benchmark — for...of, indexed arrays, and forEach across realistic system bodies (not empty loops). We know .call() adds up to 3.2x overhead at 10k, so the recommendation should be documented for users
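
The variants to compare might look like the sketch below, which uses a simple movement system as the "realistic" body. The `Mover` type and all function names are hypothetical. `stepForEachCall` is an assumption about where `.call()` overhead could arise (a forEach-style helper that invokes its callback through `.call()`), not a description of the library's internals.

```typescript
interface Mover {
  position: { x: number; y: number };
  velocity: { x: number; y: number };
}

function makeMovers(count: number): Mover[] {
  return Array.from({ length: count }, (_, i) => ({
    position: { x: 0, y: 0 },
    velocity: { x: i, y: 2 * i },
  }));
}

// The shared system body: integrate velocity into position.
function move(entity: Mover, dt: number): void {
  entity.position.x += entity.velocity.x * dt;
  entity.position.y += entity.velocity.y * dt;
}

function stepForOf(entities: Mover[], dt: number): void {
  for (const entity of entities) move(entity, dt);
}

function stepIndexed(entities: Mover[], dt: number): void {
  for (let i = 0; i < entities.length; i++) move(entities[i], dt);
}

function stepForEach(entities: Mover[], dt: number): void {
  entities.forEach((entity) => move(entity, dt));
}

// Assumed shape of a forEach helper that routes through .call().
function stepForEachCall(entities: Mover[], dt: number): void {
  const callback = function (this: unknown, entity: Mover): void {
    move(entity, dt);
  };
  for (let i = 0; i < entities.length; i++) {
    callback.call(undefined, entities[i]);
  }
}
```

Because every variant runs the same `move` body, a benchmark over these functions isolates the iteration mechanism while still doing real per-entity work.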

Priority

Suggested high-value first picks:

  1. Deterministic replay + percentile reporting — improves quality of all future measurements
  2. Archetype-count sweep — directly informs whether #componentIndex and #getAffectedArchetypes optimizations are worth pursuing
  3. Mutation-shape matrix — better understanding of the add_remove regression we saw with Set→Array
