@jioffe502 jioffe502 commented Nov 19, 2025

Description

  • Plumbed enable_traces/trace_output_dir through the test config and the e2e case so that trace payloads are captured automatically during scripted runs.
  • Added trace_summary generation in scripts/tests/cases/e2e.py, which writes per-stage aggregates and per-document totals; run.py now records the trace flags in results.json.
  • Documented how to enable tracing, run baseline vs. RC comparisons, and consume the new artifacts (README updates and profiling workflow notes).
  • Introduced scripts/tests/tools/plot_stage_totals.py, a helper that reads any results.json and emits a PNG and a text summary of cumulative resident seconds per stage, with options to sort, collapse nested entries, and filter network noise, among others.
  • Added document-level wall time: _summarize_traces now records each document's elapsed span (first stage entry → last stage exit), so the trace artifacts report wall-clock time in addition to resident totals.
  • Added a dual visualization flow: plot_stage_totals.py now optionally produces a wall-time chart (results.wall_time.png) that contrasts per-document wall and resident seconds, highlights effective parallelism (the resident/wall ratio), and prints summaries alongside the existing stage-resident PNG.
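
To make the wall vs. resident distinction concrete, here is a minimal sketch of how one document's totals could be derived. It is not the code in _summarize_traces: the function name `summarize_doc` and the `(stage, entry_ns, exit_ns)` tuple layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

NS_PER_S = 1_000_000_000


def summarize_doc(spans):
    """Summarize one document's trace spans.

    `spans` is a list of (stage, entry_ns, exit_ns) tuples. This layout is
    an assumption for illustration, not the actual trace payload format.
    """
    # Resident time: seconds spent inside each stage, summed per stage.
    resident = defaultdict(float)
    for stage, entry_ns, exit_ns in spans:
        resident[stage] += (exit_ns - entry_ns) / NS_PER_S

    # Wall time: first stage entry to last stage exit.
    first_entry = min(entry_ns for _, entry_ns, _ in spans)
    last_exit = max(exit_ns for _, _, exit_ns in spans)
    wall_s = (last_exit - first_entry) / NS_PER_S

    total_resident_s = sum(resident.values())
    return {
        "stage_resident_s": dict(resident),
        "total_resident_s": total_resident_s,
        "wall_s": wall_s,
        # Effective parallelism: resident/wall ratio (> 1 means stages overlapped).
        "parallelism": total_resident_s / wall_s if wall_s > 0 else 0.0,
    }


# Two overlapping stages: 4 resident seconds packed into 3 wall seconds.
example = summarize_doc([
    ("extract", 0, 2 * NS_PER_S),
    ("embed", 1 * NS_PER_S, 3 * NS_PER_S),
])
print(example["wall_s"], example["total_resident_s"], round(example["parallelism"], 2))
# prints: 3.0 4.0 1.33
```

A ratio above 1 indicates that stages overlapped for that document, which is why resident totals alone can overstate how long a document actually took end to end.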

Testing:

  • Generated bo20 and bo767 runs with ENABLE_TRACES=true and verified that results.json contains the new trace_summary, that trace files land under artifacts/.../traces/, and that the plotting tool produces the expected charts (*.stage_time.png) in both collapsed and nested views.
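
The results.json check above can be scripted. The sketch below assumes trace_summary is a top-level key in results.json, which this description does not confirm; the helper name `load_trace_summary` is hypothetical.

```python
import json
from pathlib import Path


def load_trace_summary(results_path):
    """Return the trace_summary block from a results.json file.

    Assumes trace_summary is a top-level key; adjust the lookup if the
    real file nests it elsewhere.
    """
    results = json.loads(Path(results_path).read_text())
    if "trace_summary" not in results:
        raise KeyError(f"trace_summary missing from {results_path}")
    return results["trace_summary"]
```

Calling it on a run produced without ENABLE_TRACES=true would raise KeyError under this assumption, which makes it usable as a quick sanity check after a scripted run.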

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • If adjusting docker-compose.yaml environment variables, have you ensured they are mirrored in the Helm values.yaml file?

@jioffe502 jioffe502 requested a review from a team as a code owner November 19, 2025 22:35
@jioffe502 jioffe502 requested a review from jperez999 November 19, 2025 22:35
- Track submission_ts_ns throughout V2 ingest pipeline
- Extract ray_wait_s, in_ray_queue_s, ray_start_ts_s, ray_end_ts_s metrics
- Enhance wall-time visualization with wait and queue time bars
- Add wait/queue time summaries and percentile statistics
- Update documentation with new profiling metrics
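
As an illustration of the wait/queue percentile summaries mentioned in this commit, the sketch below summarizes a list of per-document values such as ray_wait_s. The helper name `summarize_seconds` and the choice of p50/p90/p99 are assumptions; the commit message does not say which percentiles are reported or how they are computed.

```python
import statistics


def summarize_seconds(values):
    """Summarize per-document timings (for example ray_wait_s) in seconds.

    The reported percentiles (p50/p90/p99) are an illustrative choice.
    Requires at least two values, as statistics.quantiles does.
    """
    # n=100 yields 99 cut points; index i-1 holds the i-th percentile.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
        "max": max(values),
    }


summary = summarize_seconds([1, 2, 3, 4, 5])
print(summary["p50"], summary["max"])
# prints: 3.0 5
```

The "inclusive" method treats the sample as the full population, so the reported percentiles never fall outside the observed minimum and maximum.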
@jioffe502 jioffe502 marked this pull request as draft November 25, 2025 16:08