Log per-server inference metrics #2650

Open
samsja wants to merge 1 commit into main from codex/per-server-inference-metrics

samsja (Member) commented May 27, 2026

Summary

  • add per-server inference metric scopes under inference/server/<server>/...
  • add direct server liveness metrics via .../up
  • add KV cache hit aliases and remaining KV cache metrics
  • keep aggregate and P/D role-aware metrics intact

Testing

  • /shared/research-prod/prime-rl/.venv/bin/ruff check src/prime_rl/orchestrator/inference_metrics.py tests/unit/orchestrator/test_inference_metrics.py
  • PYTHONPATH=src:packages/prime-rl-configs/src /shared/research-prod/prime-rl/.venv/bin/python -m pytest tests/unit/orchestrator/test_inference_metrics.py

Note: uv run was not usable in the standalone /tmp worktree because workspace submodules were not initialized there, so I used the existing repo venv against the /tmp worktree.


Note

Low Risk
Observability-only changes to metrics polling and W&B logging; no changes to inference serving or training control paths.

Overview
Extends the orchestrator inference metrics collector so W&B gets per-server scopes alongside existing aggregate and prefill/decode role scopes.

Each admin client gets a stable server_XX_<host>_<port> name; metrics are logged under inference/server/<name>/... from that endpoint alone. inference/server/<name>/up is emitted for every configured server (1 when the latest poll returned metrics, 0 otherwise), including polls with no successful samples. Smoothed per-server keys are removed when a server stops responding so dashboards do not show stale values.
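The naming and liveness scheme described above could be sketched roughly as follows. This is an illustration only, not the code in this PR: `server_name` and `up_metrics` are hypothetical helpers, and the zero-padded two-digit index and the unsanitized host string are assumptions about what `server_XX_<host>_<port>` expands to.

```python
from urllib.parse import urlparse


def server_name(index: int, base_url: str) -> str:
    """Hypothetical helper: build a stable server_XX_<host>_<port> name."""
    parsed = urlparse(base_url)
    host = parsed.hostname or "unknown"  # assumption: host is used as-is
    port = parsed.port or 0              # assumption: 0 when the URL has no port
    return f"server_{index:02d}_{host}_{port}"


def up_metrics(server_names: list[str], responding: set[str]) -> dict[str, float]:
    """Emit inference/server/<name>/up for every configured server.

    A server gets 1.0 if its latest poll returned metrics and 0.0 otherwise,
    so the key is present even when no server responded at all.
    """
    return {
        f"inference/server/{name}/up": 1.0 if name in responding else 0.0
        for name in server_names
    }


# Usage: two configured servers, only the first one answered the latest poll.
names = [
    server_name(i, url)
    for i, url in enumerate(["http://10.0.0.1:8000", "http://10.0.0.2:8000"])
]
liveness = up_metrics(names, responding={names[0]})
```

Deriving the name from the configured index and address, rather than from poll results, is what keeps the key stable across polls in this sketch.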

KV cache naming is duplicated for convenience: prefix hit rates alias to kv_cache_hit_rate / cpu_kv_cache_hit_rate, and usage mean/max map to remaining capacity metrics (kv_cache_left_perc_*, cpu_kv_cache_left_perc_*) via 1 - usage.
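As a rough sketch of the aliasing and the `1 - usage` mapping: the function name `add_kv_cache_aliases` and the source key names (`prefix_cache_hit_rate`, `kv_cache_usage_mean`, and so on) are assumptions made for illustration, as are the `_mean` / `_max` suffixes on the `left_perc` keys, which the summary only writes as `*`.

```python
def add_kv_cache_aliases(metrics: dict[str, float]) -> dict[str, float]:
    """Return a copy of metrics with KV cache aliases added (hypothetical helper)."""
    out = dict(metrics)

    # Prefix hit rates are duplicated under KV-cache-flavoured names.
    # The source key names on the left are assumptions.
    hit_rate_aliases = {
        "prefix_cache_hit_rate": "kv_cache_hit_rate",
        "cpu_prefix_cache_hit_rate": "cpu_kv_cache_hit_rate",
    }
    for source_key, alias_key in hit_rate_aliases.items():
        if source_key in metrics:
            out[alias_key] = metrics[source_key]

    # Usage is a fraction in [0, 1]; remaining capacity is 1 - usage.
    for prefix in ("kv_cache", "cpu_kv_cache"):
        for stat in ("mean", "max"):
            usage_key = f"{prefix}_usage_{stat}"
            if usage_key in metrics:
                out[f"{prefix}_left_perc_{stat}"] = 1.0 - metrics[usage_key]

    return out


# Usage: only the keys present in the input produce aliases.
scoped = add_kv_cache_aliases(
    {"prefix_cache_hit_rate": 0.5, "kv_cache_usage_mean": 0.25}
)
```

Note that in this sketch the value derived from the maximum usage is the smallest remaining capacity observed, even though the key keeps the `max` suffix of the usage metric it comes from.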

Reviewed by Cursor Bugbot for commit 6e1758e. Bugbot is set up for automated code reviews on this repo.

cursor (Bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 78dd289.

    """Do not keep logging stale per-server metrics when a server fails to respond."""
    for key in list(smoothed_metrics):
        if key.startswith("inference/server/") and key not in current_metrics:
            del smoothed_metrics[key]

Stale drop omits active server metrics

Medium Severity

drop_stale_server_metrics removes any smoothed key under inference/server/ that is absent from the current poll’s metrics dict. build_scope_metrics only adds many per-server fields conditionally (throughput, histogram averages, KV cache stats, cache aliases), so a still-responding server can omit keys on a given cycle while remaining in active_server_names. Those smoothed series are then dropped from the W&B payload even though the server is up, causing gaps and misleading dashboards.
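One possible way to address this, sketched under assumptions rather than taken from the PR, is to decide staleness per server instead of per key: a smoothed key is dropped only when its server is not in `active_server_names`. The three-argument signature below is hypothetical (the real signature is not shown in the diff excerpt), and keys present in the current poll, such as `.../up`, are always kept.

```python
SERVER_PREFIX = "inference/server/"


def drop_stale_server_metrics(
    smoothed_metrics: dict[str, float],
    current_metrics: dict[str, float],
    active_server_names: set[str],
) -> None:
    """Drop smoothed per-server keys only for servers that stopped responding.

    Hypothetical variant: keys of a still-active server survive even if the
    current poll happened to omit them.
    """
    for key in list(smoothed_metrics):
        if not key.startswith(SERVER_PREFIX):
            continue  # aggregate and role-scoped metrics are never touched
        # The server name is the first path segment after the prefix.
        name = key[len(SERVER_PREFIX):].split("/", 1)[0]
        if name not in active_server_names and key not in current_metrics:
            del smoothed_metrics[key]


# Usage: server "a" is up but omitted throughput this cycle; server "b" is down.
smoothed = {
    "inference/server/a/throughput": 1.0,
    "inference/server/b/throughput": 2.0,
    "inference/server/b/up": 0.0,
    "inference/throughput": 3.0,
}
current = {"inference/server/a/up": 1.0, "inference/server/b/up": 0.0}
drop_stale_server_metrics(smoothed, current, active_server_names={"a"})
```

After the call, the smoothed throughput for `a` is retained, while the stale throughput for `b` is gone and its `up` key remains.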

samsja force-pushed the codex/per-server-inference-metrics branch from 78dd289 to 6e1758e on May 27, 2026 05:09