ci: use trimmed latency and pinned feed envs #186

Merged
ronnie-devtech merged 7 commits into MooreThreads:main from welo516:CI
Apr 23, 2026

@welo516 welo516 commented Apr 23, 2026

Summary

  • add MUSA_PINNED_FEED=1 and MUSA_PINNED_H2D_ON_COMPUTE_STREAM=1 to the BD model perf runs in CI
  • compare performance using trimmed_avg_ms, with fallback to average_time_ms for compatibility with older runner output
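
The two changes above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual code: it assumes the runner output is available as a dict with the keys named in this PR, and the helper names build_perf_env and pick_latency_ms are hypothetical.

```python
import os

# Environment variables named in the PR summary. How the runner interprets
# them is not shown here.
PINNED_FEED_ENV = {
    "MUSA_PINNED_FEED": "1",
    "MUSA_PINNED_H2D_ON_COMPUTE_STREAM": "1",
}


def build_perf_env(base_env=None):
    """Return a copy of the environment with the pinned-feed variables set.

    Hypothetical helper: the real workflow may set these in YAML instead.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(PINNED_FEED_ENV)
    return env


def pick_latency_ms(result):
    """Prefer trimmed_avg_ms; fall back to average_time_ms for older output.

    Assumes `result` is a dict parsed from the runner's output.
    """
    for key in ("trimmed_avg_ms", "average_time_ms"):
        value = result.get(key)
        if value is not None:
            return float(value)
    raise KeyError("no latency metric found in runner output")
```

Checking for None rather than truthiness keeps a legitimate 0.0 reading from triggering the fallback. The resulting environment could be passed as env= to subprocess.run when launching the perf run.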

Why

  • the updated musa_run_pb_graph.py supports pinned-feed execution and reports a trimmed latency average
  • CI was still using the old invocation and old metric, so the perf comparison no longer matched the current runner behavior
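
For readers unfamiliar with the metric, a trimmed average is typically computed by discarding the slowest and fastest samples before averaging, which makes the comparison less sensitive to outlier iterations. The sketch below assumes symmetric trimming with a 10% default; the rule musa_run_pb_graph.py actually uses is not stated in this PR, and the function name is hypothetical.

```python
def trimmed_mean(samples_ms, trim_fraction=0.1):
    """Average latency after dropping the lowest and highest samples.

    Assumption: symmetric trimming of `trim_fraction` from each end. The
    runner's real trimming rule may differ.
    """
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")

    ordered = sorted(samples_ms)
    # Number of samples to drop from each end of the sorted list.
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)
```

With trim_fraction below 0.5, at least one sample always survives the trim, so the division is safe even for a single-sample run.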

Validation

  • git diff --check -- .github/workflows/pr-validation.yml

@weloMThreads weloMThreads changed the title from "[codex] ci: use trimmed latency and pinned feed envs" to "ci: use trimmed latency and pinned feed envs" Apr 23, 2026
@welo516 welo516 marked this pull request as ready for review April 23, 2026 06:35
@ronnie-devtech ronnie-devtech merged commit 4ffe74c into MooreThreads:main Apr 23, 2026
21 of 25 checks passed
tngchien pushed a commit to tngchien/tensorflow_musa_extension that referenced this pull request Apr 24, 2026
* ci: use trimmed latency and pinned feed envs

* ci: pin pruned graph batch size

* ci: cancel superseded PR validation runs

* ci: stop passing duplicate pruned graph batch flag

* ci: tune tf test model runner args

* ci: update workflow naming and notifications

* ci: test PR merge commit
2 participants