
Conversation

@ajcasagrande (Contributor) commented Nov 21, 2025

See tutorial: https://github.com/ai-dynamo/aiperf/blob/24f17513495eacede9975f15f5afc5cf9293f675/docs/tutorials/server-metrics.md

Summary by CodeRabbit

New Features

  • Added a server metrics collection feature that automatically gathers Prometheus-compatible metrics from LLM inference server endpoints, with support for custom endpoint configuration, automatic discovery, and deduplication of collected data.

Documentation

  • Added comprehensive server metrics tutorial documenting automatic discovery, configuration via CLI options and environment variables, output file formats, metrics schemas, and usage examples.


@ajcasagrande ajcasagrande requested a review from ilana-n November 21, 2025 20:38

github-actions bot commented Nov 21, 2025

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@e89cb2c8d04c03a2de44f67ac1ffd369b6058723

Recommended: install in a virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@e89cb2c8d04c03a2de44f67ac1ffd369b6058723

Last updated for commit: e89cb2c

@github-actions github-actions bot added the feat label Nov 21, 2025

coderabbitai bot commented Nov 21, 2025

Walkthrough

This PR introduces comprehensive server metrics collection to AIPerf. It adds Prometheus-compatible endpoint discovery, automatic metrics scraping with configurable intervals, async HTTP collection infrastructure, deduplication and buffering mechanisms, new message types for inter-service communication, a dedicated ServerMetricsManager service, export processors, and extensive documentation and tests.

Changes

Cohort: Configuration & Environment
Files: src/aiperf/common/config/config_defaults.py, src/aiperf/common/config/groups.py, src/aiperf/common/config/output_config.py, src/aiperf/common/config/user_config.py, src/aiperf/common/environment.py
Summary: Adds server metrics configuration: new OutputDefaults file paths, a Groups.SERVER_METRICS group, a user_config field server_metrics with URL normalization and parsing, output_config properties for export files, and environment variables for collection intervals, batch sizes, ports, and timeouts.

Cohort: Core Models & Data Structures
Files: src/aiperf/common/models/server_metrics_models.py, src/aiperf/common/models/__init__.py
Summary: Introduces comprehensive Pydantic models for metrics: HistogramData, SummaryData, SlimMetricSample, MetricSample, MetricFamily, MetricSchema, ServerMetricsRecord with to_slim() and extract_metadata() methods, and related metadata structures.

Cohort: Message Types & Enums
Files: src/aiperf/common/messages/server_metrics_messages.py, src/aiperf/common/messages/__init__.py, src/aiperf/common/enums/message_enums.py, src/aiperf/common/enums/post_processor_enums.py, src/aiperf/common/enums/prometheus_enums.py, src/aiperf/common/enums/service_enums.py
Summary: Adds ServerMetricsRecordsMessage and ServerMetricsStatusMessage for inter-service communication, MessageType enums for server metrics, ResultsProcessorType.SERVER_METRICS_JSONL_WRITER, a PrometheusMetricType enum with COUNTER/GAUGE/HISTOGRAM/SUMMARY/UNKNOWN, and ServiceType.SERVER_METRICS_MANAGER.

Cohort: Metrics Collection Infrastructure
Files: src/aiperf/common/mixins/base_metrics_collector_mixin.py, src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py, src/aiperf/common/mixins/__init__.py, src/aiperf/common/duplicate_tracker.py, src/aiperf/common/metric_utils.py
Summary: Adds BaseMetricsCollectorMixin for async HTTP metrics scraping with reachability checks and collection intervals; BufferedJSONLWriterMixinWithDeduplication for batching and deduplication; AsyncKeyedDuplicateTracker for per-key duplicate suppression; and utility functions for endpoint URL normalization (sketched after this table) and hostname-aware Prometheus endpoint discovery.

Cohort: Server Metrics Services
Files: src/aiperf/server_metrics/server_metrics_data_collector.py, src/aiperf/server_metrics/server_metrics_manager.py, src/aiperf/server_metrics/__init__.py
Summary: Implements ServerMetricsDataCollector for parsing Prometheus metrics and ServerMetricsManager for coordinating collectors, endpoint discovery, status messaging, and lifecycle management.

Cohort: Results Processing & Integration
Files: src/aiperf/post_processors/server_metrics_export_results_processor.py, src/aiperf/post_processors/__init__.py, src/aiperf/records/records_manager.py, src/aiperf/records/__init__.py, src/aiperf/controller/system_controller.py, src/aiperf/common/protocols.py
Summary: Adds ServerMetricsExportResultsProcessor for JSONL export with metadata extraction, ServerMetricsResultsProcessorProtocol, records-manager integration for routing metrics messages, system-controller handlers for status updates, and tracking state types.

Cohort: Refactored Components
Files: src/aiperf/gpu_telemetry/telemetry_data_collector.py, src/aiperf/post_processors/telemetry_export_results_processor.py
Summary: Refactors TelemetryDataCollector to use BaseMetricsCollectorMixin instead of direct lifecycle management; updates TelemetryExportResultsProcessor to use BufferedJSONLWriterMixinWithDeduplication.

Cohort: Documentation
Files: README.md, docs/cli_options.md, docs/tutorials/server-metrics.md
Summary: Adds a feature entry and tutorials documenting automatic Prometheus endpoint discovery, collection behavior, configuration via environment variables, output formats (JSONL and metadata JSON), and metric deduplication.

Cohort: Tests
Files: tests/unit/common/test_duplicate_tracker.py, tests/unit/common/test_metric_utils.py, tests/unit/server_metrics/*, tests/unit/post_processors/test_server_metrics_export_results_processor.py, tests/unit/gpu_telemetry/test_telemetry_data_collector.py, tests/unit/post_processors/test_telemetry_export_results_processor.py
Summary: Adds comprehensive test coverage for deduplication, URL utilities, server metrics collection, parsing, manager orchestration, and export processor behavior, plus updated telemetry tests reflecting the refactor.
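
The URL normalization mentioned in the Metrics Collection Infrastructure cohort can be pictured with a small sketch. This is a hedged approximation based on the documented CLI behavior (bare host:port inputs become full /metrics URLs); the real normalize_metrics_endpoint_url in src/aiperf/common/metric_utils.py may handle more edge cases.

```python
from urllib.parse import urlparse

# Hedged sketch of the documented normalization: bare "host:port" inputs
# become full Prometheus scrape URLs; explicit paths are preserved.
def normalize_metrics_endpoint_url(raw: str) -> str:
    url = raw if "://" in raw else f"http://{raw}"
    parsed = urlparse(url)
    path = parsed.path if parsed.path and parsed.path != "/" else "/metrics"
    return f"{parsed.scheme}://{parsed.netloc}{path}"


assert normalize_metrics_endpoint_url("node1:8081") == "http://node1:8081/metrics"
assert normalize_metrics_endpoint_url("http://node2:9090/metrics") == "http://node2:9090/metrics"
```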

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Areas requiring extra attention:

  • BaseMetricsCollectorMixin (src/aiperf/common/mixins/base_metrics_collector_mixin.py): Core HTTP collection infrastructure with async lifecycle, reachability checks, and background tasks; verify thread-safety and proper cleanup.
  • ServerMetricsDataCollector parsing logic (src/aiperf/server_metrics/server_metrics_data_collector.py): Complex Prometheus metric parsing across counter/gauge/histogram/summary types with deduplication and aggregation; validate correctness of process*_family methods.
  • ServerMetricsManager endpoint discovery and lifecycle (src/aiperf/server_metrics/server_metrics_manager.py): Service coordination, message publishing, and error handling across multiple collectors; check message flow and fallback behavior.
  • AsyncKeyedDuplicateTracker (src/aiperf/common/duplicate_tracker.py): Per-key locking and deduplication semantics; ensure thread-safety with concurrent records (a sketch of the per-key locking pattern follows this list).
  • TelemetryDataCollector refactor (src/aiperf/gpu_telemetry/telemetry_data_collector.py): Verify that the migration to BaseMetricsCollectorMixin preserves existing behavior and callbacks work correctly.
  • ServerMetricsExportResultsProcessor metadata handling (src/aiperf/post_processors/server_metrics_export_results_processor.py): Metadata update logic, atomicity of file writes, and lock safety during concurrent record processing.
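
For reference, the per-key locking pattern flagged for AsyncKeyedDuplicateTracker typically looks like this minimal sketch; the class and method names here are illustrative, not the tracker's actual API.

```python
import asyncio
from collections import defaultdict
from collections.abc import Awaitable, Callable


class KeyedLocks:
    """Illustrative per-key locking: work on the same key is serialized,
    while work on different keys proceeds concurrently."""

    def __init__(self) -> None:
        self._locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)

    async def run(self, key: str, fn: Callable[[], Awaitable[None]]) -> None:
        async with self._locks[key]:
            await fn()
```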

Poem

🐰 Prometheus whispers from endpoints far and wide,
Our metrics collector hops with boundless pride,
Deduplicating echoes, buffering with care,
Server stats dance through the network air,
From LLMs to systems, telemetry takes flight!

Pre-merge checks

✅ Passed checks (3 passed)

  • Description Check — ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The title 'feat: Collect server-side prometheus metrics' accurately summarizes the main feature being added; it clearly communicates that this PR introduces server-side Prometheus metrics collection.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 92.38%, above the required 80.00% threshold.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/aiperf/gpu_telemetry/telemetry_data_collector.py (1)

23-41: Docstring mostly updated, but “logged and re-raised” no longer matches actual behavior

The class docstring correctly mentions BaseMetricsCollectorMixin, but in the _collect_and_process_metrics docstring (Lines 72-73 in this file) you still say “Any exception from fetch or parse is logged and re-raised.” With the mixin, exceptions are now caught in _collect_metrics_task and either:

  • delivered to error_callback, or
  • logged via self.error, but not re-raised (except CancelledError).

Consider updating that “Raises:” section to describe the new behavior (e.g., “Exceptions propagate to the mixin’s background task, which logs or forwards them via error_callback; callers of this method should not rely on it raising.”).
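
For context, the catch-and-forward behavior described above boils down to something like this stdlib-only sketch; the names are illustrative, and the actual mixin lives in src/aiperf/common/mixins/base_metrics_collector_mixin.py.

```python
import asyncio
import logging
from collections.abc import Awaitable, Callable

logger = logging.getLogger(__name__)


async def collect_loop(
    collect: Callable[[], Awaitable[None]],
    interval: float,
    error_callback: Callable[[Exception], Awaitable[None]] | None = None,
) -> None:
    # One failed scrape must not kill the loop: catch, forward or log,
    # and keep collecting. Only CancelledError propagates to stop the task.
    while True:
        try:
            await collect()
        except asyncio.CancelledError:
            raise
        except Exception as exc:
            if error_callback is not None:
                await error_callback(exc)
            else:
                logger.error("Metrics collection failed: %s", exc, exc_info=True)
        await asyncio.sleep(interval)
```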

🧹 Nitpick comments (20)
docs/cli_options.md (1)

417-417: Format URLs as proper markdown links.

The bare URLs should be formatted as markdown links for better readability and to follow markdown best practices.

Consider applying this diff:

-Server metrics collection (ENABLED BY DEFAULT). Automatically collects from inference endpoint base_url + `/metrics`. Optionally specify additional custom Prometheus-compatible endpoint URLs (e.g., http://node1:8081/metrics, http://node2:9090/metrics). Use AIPERF_SERVER_METRICS_ENABLED=false to disable. Example: `--server-metrics node1:8081 node2:9090/metrics` for additional endpoints.
+Server metrics collection (ENABLED BY DEFAULT). Automatically collects from inference endpoint base_url + `/metrics`. Optionally specify additional custom Prometheus-compatible endpoint URLs (e.g., `http://node1:8081/metrics`, `http://node2:9090/metrics`). Use `AIPERF_SERVER_METRICS_ENABLED=false` to disable. Example: `--server-metrics node1:8081 node2:9090/metrics` for additional endpoints.
tests/unit/common/test_duplicate_tracker.py (1)

260-279: Make test_concurrent_writes_to_same_key actually concurrent

Despite the name/docstring, this test currently performs all writes sequentially via write_sequence, so it doesn’t exercise per-key locking under real concurrency.

Consider rewriting to launch multiple writers with asyncio.gather:

     @pytest.mark.asyncio
     @pytest.mark.parametrize("num_tasks,writes_per_task", [(3, 10), (5, 5), (10, 3)])  # fmt: skip
     async def test_concurrent_writes_to_same_key(
         self,
         tracker: AsyncKeyedDuplicateTracker[SampleRecord],
         num_tasks: int,
         writes_per_task: int,
     ):
         """Test that concurrent writes to the same key are handled safely."""
-        results = await write_sequence(
-            tracker, "key1", [1] * (num_tasks * writes_per_task)
-        )
-
-        # At least first record should be written
-        total_written = sum(len(r) for r in results)
+        async def write_to_key(count: int) -> int:
+            total = 0
+            for _ in range(count):
+                record = SampleRecord(key="key1", value=1)
+                result = await tracker.deduplicate_record(record)
+                total += len(result)
+            return total
+
+        # Run multiple writers concurrently to the same key
+        results = await asyncio.gather(
+            *[write_to_key(writes_per_task) for _ in range(num_tasks)]
+        )
+
+        # At least first record should be written
+        total_written = sum(results)

The remaining assertions (total_written >= 1 and < num_tasks * writes_per_task) can stay as-is.

tests/unit/server_metrics/test_server_metrics_models.py (1)

17-191: Solid coverage for server metrics model serialization

The tests here exercise the critical paths for MetricSample.to_slim, ServerMetricsRecord.to_slim, MetricSchema, and the histogram/summary data shapes, including endpoint URL preservation and optional sum/count handling. This gives good confidence in the JSONL-facing representations; any further assertions (e.g., extra is None checks for unused fields) would be purely optional.

tests/unit/server_metrics/test_server_metrics_data_collector.py (2)

20-43: Prefer public properties over private attributes in tests.

The tests directly access private attributes (_endpoint_url, _collection_interval, _reachability_timeout, _session). Based on the mixin interface (src/aiperf/common/mixins/base_metrics_collector_mixin.py lines 74-81), public properties are available. Using public properties makes tests more resilient to internal implementation changes.

Apply this diff to use public properties:

-        assert collector._endpoint_url == "http://localhost:8081/metrics"
-        assert collector._collection_interval == 0.5
-        assert collector._reachability_timeout == 10.0
+        assert collector.endpoint_url == "http://localhost:8081/metrics"
+        assert collector.collection_interval == 0.5
+        # Note: reachability_timeout doesn't have a public property in the mixin
+        assert collector._reachability_timeout == 10.0
         assert collector.id == "test_collector"
         assert collector._session is None
         assert not collector.was_initialized
-        assert collector._endpoint_url == "http://localhost:8081/metrics"
-        assert collector._collection_interval == 0.33
+        assert collector.endpoint_url == "http://localhost:8081/metrics"
+        assert collector.collection_interval == 0.33
         assert collector.id == "server_metrics_collector"

245-264: Consider testing deduplication for other metric types.

The deduplication test only covers counters with last-wins behavior. While this validates the core logic, consider adding tests for gauge and histogram/summary deduplication to ensure consistent behavior across all metric types, especially since histograms and summaries have more complex structures (buckets, quantiles).

src/aiperf/controller/system_controller.py (1)

42-48: Server-metrics status integration is consistent with telemetry flow

Starting ServerMetricsManager as an optional service and handling SERVER_METRICS_STATUS with mirrored endpoint tracking/logging keeps behavior parallel to telemetry and looks correct. If you plan to surface server-metrics endpoint reachability in exports/summary later, you already have the necessary state here to wire in.

Also applies to: 132-138, 186-191, 387-405

docs/tutorials/server-metrics.md (1)

146-155: Minor doc polish and markdownlint fix

Content and examples line up well with the implementation. A couple of small nits you might want to address:

  • Around Line 153: “then new record written (marks start of new period)” – you could shorten to “then the new record (marks start of new period)” to satisfy the style checker.
  • Around Line 273: “at one point in time” can be simplified to “at one time” or just “at one point”.
  • Lines 220–226: there’s a blank > line inside the blockquote, which triggers markdownlint MD028; removing the empty quoted line will keep the callout but satisfy the linter.

Also applies to: 219-226, 271-275

src/aiperf/post_processors/server_metrics_export_results_processor.py (1)

44-67: Consider avoiding unnecessary metadata extraction and clarifying update semantics

The per-endpoint metadata tracking and atomic file writes look solid, and the dedupe key/value choice (endpoint URL + metrics dict) matches the documented behavior.

Two minor suggestions:

  • record.extract_metadata() is computed for every record, even when _should_update_metadata returns False. Since _should_update_metadata only inspects record.metrics, you can defer extract_metadata() until should_write is True to avoid extra Pydantic object construction:
-        url = record.endpoint_url
-        metadata = record.extract_metadata()
+        url = record.endpoint_url
@@
-        if should_write:
+        if should_write:
+            metadata = record.extract_metadata()
             async with self._metadata_file_lock:
  • The comment above _metadata_file_model notes that “metadata fields can occasionally change,” but _should_update_metadata only reacts to new metric names, not changes to existing schemas or info metrics. That’s probably fine in practice; consider tightening the comment or, if you do care about changing descriptions/types, extending _should_update_metadata to compare existing vs new MetricSchema/InfoMetricData entries.

Also applies to: 71-104, 109-134

src/aiperf/server_metrics/server_metrics_manager.py (2)

102-161: Configure phase and reachability handling are robust, with minor lint nits

The PROFILE_CONFIGURE flow (building collectors, checking reachability, emitting a status, and bailing early when none are reachable) is clean and matches the documented behavior. Two small nits:

  • message (Line 104) is unused but required by @on_command. If Ruff’s ARG002 is noisy, consider renaming to _message or adding a # noqa: ARG002 on the line.
  • The broad except Exception as e around reachability (Lines 148-150) is defensively reasonable here, but if you want to satisfy BLE001 you could either narrow to aiohttp.ClientError/OSError or document the intentional broad catch with a # noqa: BLE001 comment.

Functionally this segment is fine as-is.


170-205: Record + status publishing behavior is correct; broad exception catches are intentional but noisy

The callbacks and status sender behave as expected:

  • _on_server_metrics_records only emits messages when records is non-empty and on failure sends a second message carrying ErrorDetails, which aligns with ServerMetricsRecordsMessage.valid/has_data.
  • _on_server_metrics_error correctly wraps collector errors into a dedicated records message with records=[].
  • _send_server_metrics_status centralizes status reporting for both disabled and enabled cases.

The repeated except Exception as e blocks (Lines 191-193, 252-253, 280-295, 318-320, 351-352) are reasonable for shutdown/telemetry paths, but they’re exactly what Ruff’s BLE001 is flagging. If you want to quiet that without changing behavior, you can:

  • Add # noqa: BLE001 on those except lines, or
  • Narrow a couple of them where you know the likely failure class (e.g., messaging client errors).

No functional issues here; this is purely about linter noise vs. defensive coding.

Also applies to: 236-255, 270-295, 296-320, 321-352

src/aiperf/common/models/server_metrics_models.py (1)

75-123: Slim conversion and metadata extraction match the intended server-metrics format

MetricSample.to_slim, ServerMetricsRecord.to_slim, and extract_metadata correctly:

  • Represent histograms/summaries via nested dicts plus sum/count.
  • Exclude _info metrics from slim records while capturing them in info_metrics with full label data.
  • Build metric_schemas with PrometheusMetricType and descriptions as used by the export processor tests.

The implementation is straightforward and lines up with the JSONL + metadata expectations in the post-processor tests. One tiny nit: the docstring in ServerMetricsRecord.to_slim mentions “array-based format,” but the actual structure is “list of slim samples with dict-based histogram/summary fields,” which you may want to clarify later.

Also applies to: 160-257, 259-291
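
To make the slim format concrete, here is a hedged illustration of one slim JSONL record with dict-based histogram fields; the field names are a sketch inferred from this review, not the authoritative ServerMetricsSlimRecord schema.

```python
# Illustrative shape only -- verify against ServerMetricsSlimRecord.
slim_record = {
    "timestamp": 1732220000.0,
    "endpoint_url": "http://localhost:8081/metrics",
    "metrics": {
        "requests_total": 1042.0,  # counter/gauge: plain value
        "request_duration_seconds": {  # histogram: dict-based fields
            "buckets": {"0.1": 4, "0.5": 9, "+Inf": 10},
            "sum": 2.7,
            "count": 10,
        },
    },
}
```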

tests/unit/post_processors/test_server_metrics_export_results_processor.py (1)

380-405: Remove unused monkeypatch fixture parameter

In test_unique_label_values_respects_cardinality_limit the monkeypatch argument is never used, which triggers Ruff’s ARG002. Since the test doesn’t rely on it, you can safely drop the parameter.

-    async def test_unique_label_values_respects_cardinality_limit(
-        self,
-        user_config_server_metrics_export: UserConfig,
-        service_config: ServiceConfig,
-        monkeypatch,
-    ):
+    async def test_unique_label_values_respects_cardinality_limit(
+        self,
+        user_config_server_metrics_export: UserConfig,
+        service_config: ServiceConfig,
+    ):
src/aiperf/server_metrics/server_metrics_data_collector.py (1)

26-66: Collector initialization correctly hooks into the base mixin; consider is None checks for intervals

The constructor wires endpoint_url, collection_interval, reachability_timeout, and callbacks into BaseMetricsCollectorMixin as expected, using environment defaults when explicit values aren’t provided. Functionally this is fine.

If you ever need to support a zero-second interval or timeout explicitly, you might prefer if collection_interval is None / if reachability_timeout is None instead of or, to avoid treating 0.0 as “use default.” For current usage (only None vs. positive defaults) this is just an ergonomics tweak.

-        super().__init__(
-            endpoint_url=endpoint_url,
-            collection_interval=collection_interval
-            or Environment.SERVER_METRICS.COLLECTION_INTERVAL,
-            reachability_timeout=reachability_timeout
-            or Environment.SERVER_METRICS.REACHABILITY_TIMEOUT,
+        super().__init__(
+            endpoint_url=endpoint_url,
+            collection_interval=(
+                collection_interval
+                if collection_interval is not None
+                else Environment.SERVER_METRICS.COLLECTION_INTERVAL
+            ),
+            reachability_timeout=(
+                reachability_timeout
+                if reachability_timeout is not None
+                else Environment.SERVER_METRICS.REACHABILITY_TIMEOUT
+            ),
src/aiperf/records/records_manager.py (2)

69-103: Shared ErrorTrackingState is a reasonable abstraction; tiny typing nit only

Introducing ErrorTrackingState and reusing it via TelemetryTrackingState, ServerMetricsTrackingState, and MetricTrackingState nicely unifies error counting across subsystems.

Minor nit: the field is annotated as dict[ErrorDetails, int] but initialized with defaultdict(int). That’s perfectly fine at runtime but might show up in strict type-checking; if it does, you can either:

  • Change the annotation to collections.abc.MutableMapping[ErrorDetails, int], or
  • Wrap the defaultdict in dict(...) when returning summaries (as you already iterate via .items()).

Functionally, this segment is good.
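
A minimal sketch of the suggested annotation change, using a hashable stand-in for the project's ErrorDetails model:

```python
from collections import defaultdict
from collections.abc import MutableMapping
from dataclasses import dataclass


@dataclass(frozen=True)  # hashable stand-in for aiperf's ErrorDetails
class ErrorDetails:
    message: str


# MutableMapping accepts a defaultdict under strict type checking,
# whereas dict[ErrorDetails, int] may be flagged.
error_counts: MutableMapping[ErrorDetails, int] = defaultdict(int)
error_counts[ErrorDetails("timeout")] += 1
```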


271-290: Server-metrics record handling and processor error tracking look correct; consider exposing a summary later

The new flow for server metrics:

  • _on_server_metrics_records consumes ServerMetricsRecordsMessage, forwarding each ServerMetricsRecord to _send_server_metrics_to_results_processors when message.valid and counting collection errors otherwise.
  • _send_server_metrics_to_results_processors fans out to all registered server-metrics processors with asyncio.gather(..., return_exceptions=True), logs any processor exceptions, and increments _server_metrics_state.error_counts using ErrorDetails.from_exception.

This mirrors the telemetry path and gives good resilience against individual processor failures.

Right now, the accumulated server-metrics error counts are never surfaced (unlike telemetry via get_telemetry_error_summary). If you want symmetry and better observability for server-metrics failures, a future follow-up could add a get_server_metrics_error_summary() method or fold these into an appropriate results/metadata export. Not a blocker for this PR.

Also applies to: 398-420

src/aiperf/common/mixins/base_metrics_collector_mixin.py (5)

21-28: Consider using concrete type aliases instead of TypeVar-bound callbacks

TRecordCallback / TErrorCallback are TypeVars that aren’t part of the class’ Generic[...] parameters, which most type checkers treat as effectively non-generic. A simple alias like RecordCallback = Callable[[list[TRecord], str], Awaitable[None]] and ErrorCallback = Callable[[ErrorDetails, str], Awaitable[None]] would be clearer and avoid confusing template parameters.
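
A sketch of the suggested aliases; ErrorDetails is a stand-in here, whereas in the PR it would come from aiperf's models.

```python
from collections.abc import Awaitable, Callable
from typing import TypeVar

TRecord = TypeVar("TRecord")


class ErrorDetails:  # stand-in for the project's model
    ...


# Generic aliases read clearly at call sites and type-check as intended:
# RecordCallback[MyRecord] is a callable taking (records, collector_id).
RecordCallback = Callable[[list[TRecord], str], Awaitable[None]]
ErrorCallback = Callable[[ErrorDetails, str], Awaitable[None]]
```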


84-96: Session timeout choice: connect-only timeout may allow indefinitely hanging scrapes

Using ClientTimeout(total=None, connect=self._reachability_timeout) intentionally leaves no overall timeout on GET calls. If a metrics endpoint stalls after connection (e.g., slow or wedged server), _fetch_metrics_text can hang until cancellation. If that’s not desired, consider also setting a sock_read or total timeout (possibly configurable) to bound worst-case collection latency.
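
For illustration, bounding read stalls as well as connection setup might look like the following; the values are illustrative, and sock_read is an assumed addition rather than current PR behavior.

```python
import aiohttp

# total=None keeps long-running sessions usable across many scrapes,
# while sock_read caps how long any single read may stall.
timeout = aiohttp.ClientTimeout(
    total=None,
    connect=10.0,     # reachability_timeout in the collector config
    sock_read=30.0,   # hypothetical stall bound suggested above
)


async def make_session() -> aiohttp.ClientSession:
    return aiohttp.ClientSession(timeout=timeout)
```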


108-151: Reachability semantics: only HTTP 200 treated as “reachable”

is_url_reachable currently returns True only for status 200 from either HEAD or GET. That will treat common cases like 3xx redirects, 401/403 auth failures, or 204 responses as unreachable, even though the endpoint exists. If the goal is “endpoint is up” rather than “scrape is currently succeeding”, consider broadening to 2xx or even < 500 depending on how you intend to gate startup/health decisions.
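
A hedged sketch of the broader check suggested here; this reflects the review's suggestion, not the PR's current status == 200 behavior.

```python
import asyncio

import aiohttp


async def is_endpoint_up(url: str, timeout_s: float = 10.0) -> bool:
    # Anything below 500 (redirects, auth errors, 204) still proves the
    # endpoint exists; only connection failures and 5xx count as down.
    timeout = aiohttp.ClientTimeout(total=timeout_s)
    try:
        async with aiohttp.ClientSession(timeout=timeout) as session:
            async with session.head(url, allow_redirects=True) as resp:
                return resp.status < 500
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return False
```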


153-178: Background task error handling: broad Exception catch is intentional but could log more detail

Catching Exception around _collect_and_process_metrics() and the error_callback ensures that one bad scrape or callback won’t kill the background task, which is appropriate here. To improve debuggability, you might:

  • Log with exc_info=True in the else branch (self.error(f"...", exc_info=True)), and
  • Optionally include exc_info=True when logging callback_error as well.

That keeps robustness while giving full tracebacks when things go wrong.
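
Concretely, with the stdlib logging module the suggestion amounts to passing exc_info; whether aiperf's self.error accepts exc_info is an assumption to verify against its logger protocol.

```python
import logging

logger = logging.getLogger(__name__)


def log_collection_failure(endpoint_url: str, exc: Exception) -> None:
    # exc_info may be True or the exception itself; either attaches
    # the full traceback to the log record.
    logger.error("Metrics collection failed for %s", endpoint_url, exc_info=exc)
```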


225-235: Callback failure handling is robust; consider whether to surface structured error info

Silently catching all Exception from record_callback and just logging is appropriate to keep the collector running. If downstreams ever need to react to callback failures (e.g., to emit an ErrorDetails via error_callback or metrics), this would be the place to add that wiring. As-is, the implementation is safe and adequate.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 89b6a45 and 24f1751.

📒 Files selected for processing (44)
  • README.md (1 hunks)
  • docs/cli_options.md (1 hunks)
  • docs/tutorials/server-metrics.md (1 hunks)
  • src/aiperf/common/config/config_defaults.py (1 hunks)
  • src/aiperf/common/config/groups.py (1 hunks)
  • src/aiperf/common/config/output_config.py (4 hunks)
  • src/aiperf/common/config/user_config.py (2 hunks)
  • src/aiperf/common/duplicate_tracker.py (1 hunks)
  • src/aiperf/common/enums/__init__.py (2 hunks)
  • src/aiperf/common/enums/message_enums.py (1 hunks)
  • src/aiperf/common/enums/post_processor_enums.py (1 hunks)
  • src/aiperf/common/enums/prometheus_enums.py (1 hunks)
  • src/aiperf/common/enums/service_enums.py (1 hunks)
  • src/aiperf/common/environment.py (4 hunks)
  • src/aiperf/common/messages/__init__.py (2 hunks)
  • src/aiperf/common/messages/server_metrics_messages.py (1 hunks)
  • src/aiperf/common/metric_utils.py (1 hunks)
  • src/aiperf/common/mixins/__init__.py (3 hunks)
  • src/aiperf/common/mixins/base_metrics_collector_mixin.py (1 hunks)
  • src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py (4 hunks)
  • src/aiperf/common/models/__init__.py (4 hunks)
  • src/aiperf/common/models/server_metrics_models.py (1 hunks)
  • src/aiperf/common/protocols.py (2 hunks)
  • src/aiperf/controller/system_controller.py (4 hunks)
  • src/aiperf/gpu_telemetry/telemetry_data_collector.py (4 hunks)
  • src/aiperf/post_processors/__init__.py (2 hunks)
  • src/aiperf/post_processors/server_metrics_export_results_processor.py (1 hunks)
  • src/aiperf/post_processors/telemetry_export_results_processor.py (3 hunks)
  • src/aiperf/records/__init__.py (1 hunks)
  • src/aiperf/records/records_manager.py (6 hunks)
  • src/aiperf/server_metrics/__init__.py (1 hunks)
  • src/aiperf/server_metrics/server_metrics_data_collector.py (1 hunks)
  • src/aiperf/server_metrics/server_metrics_manager.py (1 hunks)
  • tests/unit/common/test_duplicate_tracker.py (1 hunks)
  • tests/unit/common/test_metric_utils.py (1 hunks)
  • tests/unit/gpu_telemetry/test_telemetry_data_collector.py (8 hunks)
  • tests/unit/post_processors/test_server_metrics_export_results_processor.py (1 hunks)
  • tests/unit/post_processors/test_telemetry_export_results_processor.py (6 hunks)
  • tests/unit/server_metrics/__init__.py (1 hunks)
  • tests/unit/server_metrics/conftest.py (1 hunks)
  • tests/unit/server_metrics/test_malformed_metrics_validation.py (1 hunks)
  • tests/unit/server_metrics/test_server_metrics_data_collector.py (1 hunks)
  • tests/unit/server_metrics/test_server_metrics_manager.py (1 hunks)
  • tests/unit/server_metrics/test_server_metrics_models.py (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-15T03:24:10.758Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 359
File: aiperf/metrics/types/time_to_first_output_metric.py:0-0
Timestamp: 2025-10-15T03:24:10.758Z
Learning: In TimeToFirstOutputMetric and similar metrics, invalid timestamp scenarios (where response timestamps precede request start) are automatically caught by the base class validation through the record.valid property, which checks that start_perf_ns < end_perf_ns. This validation happens in _require_valid_record before _parse_record is called, so explicit timestamp validation in _parse_record may be redundant.

Applied to files:

  • tests/unit/server_metrics/test_malformed_metrics_validation.py
📚 Learning: 2025-10-17T16:52:54.523Z
Learnt from: ilana-n
Repo: ai-dynamo/aiperf PR: 372
File: aiperf/gpu_telemetry/telemetry_manager.py:204-206
Timestamp: 2025-10-17T16:52:54.523Z
Learning: In `aiperf/gpu_telemetry/telemetry_data_collector.py`, the `TelemetryDataCollector.initialize()` and `start()` methods perform only synchronous, non-blocking operations. `initialize()` creates an `aiohttp.ClientSession` object (instant construction) and `start()` schedules background tasks without awaiting them (instant scheduling via asyncio). There is no risk of indefinite hangs, so timeouts are not needed when calling these methods.

Applied to files:

  • tests/unit/gpu_telemetry/test_telemetry_data_collector.py
  • src/aiperf/gpu_telemetry/telemetry_data_collector.py
  • src/aiperf/common/mixins/base_metrics_collector_mixin.py
🧬 Code graph analysis (33)
tests/unit/common/test_duplicate_tracker.py (1)
src/aiperf/common/duplicate_tracker.py (3)
  • AsyncKeyedDuplicateTracker (15-136)
  • deduplicate_record (67-111)
  • flush_remaining_duplicates (113-136)
src/aiperf/common/messages/server_metrics_messages.py (4)
src/aiperf/common/enums/message_enums.py (1)
  • MessageType (7-53)
src/aiperf/common/messages/service_messages.py (1)
  • BaseServiceMessage (13-20)
src/aiperf/common/models/error_models.py (1)
  • ErrorDetails (12-76)
src/aiperf/common/models/server_metrics_models.py (1)
  • ServerMetricsRecord (214-291)
tests/unit/post_processors/test_telemetry_export_results_processor.py (1)
src/aiperf/common/models/telemetry_models.py (1)
  • TelemetryMetrics (13-45)
tests/unit/common/test_metric_utils.py (1)
src/aiperf/common/metric_utils.py (2)
  • build_hostname_aware_prometheus_endpoints (41-78)
  • normalize_metrics_endpoint_url (6-38)
src/aiperf/post_processors/server_metrics_export_results_processor.py (7)
src/aiperf/common/config/user_config.py (1)
  • UserConfig (34-448)
src/aiperf/common/enums/post_processor_enums.py (1)
  • ResultsProcessorType (22-50)
src/aiperf/common/factories.py (1)
  • ResultsProcessorFactory (572-592)
src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py (3)
  • BufferedJSONLWriterMixinWithDeduplication (193-228)
  • buffered_write (75-110)
  • buffered_write (216-220)
src/aiperf/common/models/server_metrics_models.py (7)
  • ServerMetricsMetadata (182-198)
  • ServerMetricsMetadataFile (201-211)
  • ServerMetricsRecord (214-291)
  • ServerMetricsSlimRecord (160-179)
  • extract_metadata (259-291)
  • to_slim (92-122)
  • to_slim (234-257)
src/aiperf/common/protocols.py (9)
  • ServerMetricsResultsProcessorProtocol (547-574)
  • info (70-70)
  • process_server_metrics_record (558-566)
  • metadata (394-394)
  • metadata (620-620)
  • error (74-74)
  • summarize (543-543)
  • summarize (568-574)
  • summarize (594-601)
src/aiperf/post_processors/telemetry_export_results_processor.py (1)
  • summarize (68-70)
tests/unit/server_metrics/test_server_metrics_data_collector.py (6)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/common/models/error_models.py (1)
  • ErrorDetails (12-76)
src/aiperf/common/models/server_metrics_models.py (1)
  • ServerMetricsRecord (214-291)
src/aiperf/server_metrics/server_metrics_data_collector.py (2)
  • ServerMetricsDataCollector (26-270)
  • _parse_metrics_to_records (86-156)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (5)
  • endpoint_url (75-77)
  • collection_interval (80-82)
  • is_url_reachable (108-127)
  • _send_records_via_callback (225-235)
  • _collect_metrics_task (154-177)
tests/unit/server_metrics/conftest.py (1)
  • sample_prometheus_metrics (17-35)
src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py (3)
src/aiperf/common/duplicate_tracker.py (3)
  • AsyncKeyedDuplicateTracker (15-136)
  • deduplicate_record (67-111)
  • flush_remaining_duplicates (113-136)
src/aiperf/common/protocols.py (2)
  • exception (75-75)
  • debug (69-69)
src/aiperf/common/hooks.py (1)
  • on_stop (271-288)
src/aiperf/post_processors/telemetry_export_results_processor.py (2)
src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py (1)
  • BufferedJSONLWriterMixinWithDeduplication (193-228)
src/aiperf/common/models/telemetry_models.py (1)
  • TelemetryRecord (48-86)
tests/unit/server_metrics/test_malformed_metrics_validation.py (1)
src/aiperf/server_metrics/server_metrics_data_collector.py (2)
  • ServerMetricsDataCollector (26-270)
  • _parse_metrics_to_records (86-156)
tests/unit/server_metrics/test_server_metrics_models.py (3)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/common/models/server_metrics_models.py (9)
  • HistogramData (10-19)
  • MetricFamily (125-132)
  • MetricSample (75-122)
  • MetricSchema (135-143)
  • ServerMetricsRecord (214-291)
  • SlimMetricSample (34-72)
  • SummaryData (22-31)
  • to_slim (92-122)
  • to_slim (234-257)
tests/unit/server_metrics/conftest.py (3)
  • sample_counter_metric (39-50)
  • sample_histogram_metric (69-84)
  • sample_server_metrics_record (107-122)
src/aiperf/server_metrics/server_metrics_manager.py (8)
src/aiperf/common/base_component_service.py (1)
  • BaseComponentService (32-133)
src/aiperf/common/config/user_config.py (2)
  • UserConfig (34-448)
  • server_metrics_urls (333-335)
src/aiperf/common/messages/server_metrics_messages.py (2)
  • ServerMetricsRecordsMessage (11-40)
  • ServerMetricsStatusMessage (43-62)
src/aiperf/common/metric_utils.py (2)
  • build_hostname_aware_prometheus_endpoints (41-78)
  • normalize_metrics_endpoint_url (6-38)
src/aiperf/common/models/error_models.py (2)
  • ErrorDetails (12-76)
  • from_exception (66-76)
src/aiperf/common/protocols.py (10)
  • PushClientProtocol (156-157)
  • ServiceProtocol (512-524)
  • debug (69-69)
  • error (74-74)
  • stop (122-122)
  • initialize (119-119)
  • start (120-120)
  • warning (72-72)
  • push (157-157)
  • publish (143-143)
src/aiperf/server_metrics/server_metrics_data_collector.py (1)
  • ServerMetricsDataCollector (26-270)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (3)
  • endpoint_url (75-77)
  • collection_interval (80-82)
  • is_url_reachable (108-127)
src/aiperf/common/messages/__init__.py (1)
src/aiperf/common/messages/server_metrics_messages.py (2)
  • ServerMetricsRecordsMessage (11-40)
  • ServerMetricsStatusMessage (43-62)
src/aiperf/controller/system_controller.py (5)
src/aiperf/common/messages/server_metrics_messages.py (1)
  • ServerMetricsStatusMessage (43-62)
src/aiperf/common/protocols.py (1)
  • run_service (484-486)
src/aiperf/common/enums/service_enums.py (1)
  • ServiceType (32-48)
src/aiperf/common/hooks.py (1)
  • on_message (313-334)
src/aiperf/common/enums/message_enums.py (1)
  • MessageType (7-53)
tests/unit/gpu_telemetry/test_telemetry_data_collector.py (1)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (4)
  • endpoint_url (75-77)
  • collection_interval (80-82)
  • _fetch_metrics_text (190-223)
  • _collect_metrics_task (154-177)
tests/unit/server_metrics/test_server_metrics_manager.py (7)
src/aiperf/common/config/user_config.py (1)
  • UserConfig (34-448)
src/aiperf/common/messages/server_metrics_messages.py (1)
  • ServerMetricsRecordsMessage (11-40)
src/aiperf/common/models/error_models.py (2)
  • ErrorDetails (12-76)
  • from_exception (66-76)
src/aiperf/common/models/server_metrics_models.py (1)
  • ServerMetricsRecord (214-291)
src/aiperf/server_metrics/server_metrics_manager.py (5)
  • _profile_configure_command (103-168)
  • _on_start_profiling (171-203)
  • _on_server_metrics_records (255-294)
  • _on_server_metrics_error (296-319)
  • _stop_all_collectors (236-253)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (2)
  • is_url_reachable (108-127)
  • endpoint_url (75-77)
src/aiperf/common/protocols.py (4)
  • initialize (119-119)
  • start (120-120)
  • push (157-157)
  • stop (122-122)
tests/unit/server_metrics/conftest.py (2)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/common/models/server_metrics_models.py (5)
  • HistogramData (10-19)
  • MetricFamily (125-132)
  • MetricSample (75-122)
  • ServerMetricsRecord (214-291)
  • SummaryData (22-31)
src/aiperf/post_processors/__init__.py (1)
src/aiperf/post_processors/server_metrics_export_results_processor.py (1)
  • ServerMetricsExportResultsProcessor (27-166)
src/aiperf/common/models/__init__.py (1)
src/aiperf/common/models/server_metrics_models.py (11)
  • HistogramData (10-19)
  • InfoMetricData (146-157)
  • MetricFamily (125-132)
  • MetricSample (75-122)
  • MetricSchema (135-143)
  • ServerMetricsMetadata (182-198)
  • ServerMetricsMetadataFile (201-211)
  • ServerMetricsRecord (214-291)
  • ServerMetricsSlimRecord (160-179)
  • SlimMetricSample (34-72)
  • SummaryData (22-31)
src/aiperf/common/config/user_config.py (4)
src/aiperf/common/config/config_validators.py (1)
  • parse_str_or_list (18-44)
src/aiperf/common/config/cli_parameter.py (1)
  • CLIParameter (7-16)
src/aiperf/common/config/groups.py (1)
  • Groups (6-31)
src/aiperf/common/metric_utils.py (1)
  • normalize_metrics_endpoint_url (6-38)
src/aiperf/server_metrics/__init__.py (2)
src/aiperf/server_metrics/server_metrics_data_collector.py (1)
  • ServerMetricsDataCollector (26-270)
src/aiperf/server_metrics/server_metrics_manager.py (1)
  • ServerMetricsManager (45-352)
src/aiperf/common/enums/__init__.py (1)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/server_metrics/server_metrics_data_collector.py (6)
tests/unit/utils/time_traveler.py (3)
  • time (48-49)
  • perf_counter_ns (54-55)
  • time_ns (45-46)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (6)
  • BaseMetricsCollectorMixin (31-235)
  • endpoint_url (75-77)
  • collection_interval (80-82)
  • _collect_and_process_metrics (180-188)
  • _fetch_metrics_text (190-223)
  • _send_records_via_callback (225-235)
src/aiperf/common/models/error_models.py (1)
  • ErrorDetails (12-76)
src/aiperf/common/models/server_metrics_models.py (5)
  • HistogramData (10-19)
  • MetricFamily (125-132)
  • MetricSample (75-122)
  • ServerMetricsRecord (214-291)
  • SummaryData (22-31)
src/aiperf/common/protocols.py (2)
  • warning (72-72)
  • debug (69-69)
tests/unit/post_processors/test_server_metrics_export_results_processor.py (1)
src/aiperf/common/config/output_config.py (3)
  • OutputConfig (17-174)
  • server_metrics_export_jsonl_file (169-170)
  • server_metrics_metadata_json_file (173-174)
src/aiperf/common/models/server_metrics_models.py (2)
src/aiperf/common/enums/prometheus_enums.py (1)
  • PrometheusMetricType (10-47)
src/aiperf/common/models/base_models.py (1)
  • AIPerfBaseModel (8-19)
src/aiperf/common/environment.py (1)
src/aiperf/common/config/config_validators.py (1)
  • parse_str_or_csv_list (47-72)
src/aiperf/gpu_telemetry/telemetry_data_collector.py (3)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (6)
  • BaseMetricsCollectorMixin (31-235)
  • collection_interval (80-82)
  • endpoint_url (75-77)
  • _collect_and_process_metrics (180-188)
  • _fetch_metrics_text (190-223)
  • _send_records_via_callback (225-235)
src/aiperf/common/models/telemetry_models.py (2)
  • TelemetryMetrics (13-45)
  • TelemetryRecord (48-86)
src/aiperf/server_metrics/server_metrics_data_collector.py (2)
  • _collect_and_process_metrics (67-84)
  • _parse_metrics_to_records (86-156)
src/aiperf/common/mixins/__init__.py (2)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (1)
  • BaseMetricsCollectorMixin (31-235)
src/aiperf/common/mixins/buffered_jsonl_writer_mixin.py (2)
  • BufferedJSONLWriterMixin (21-190)
  • BufferedJSONLWriterMixinWithDeduplication (193-228)
src/aiperf/common/protocols.py (2)
src/aiperf/common/models/server_metrics_models.py (1)
  • ServerMetricsRecord (214-291)
src/aiperf/post_processors/server_metrics_export_results_processor.py (2)
  • process_server_metrics_record (71-107)
  • summarize (160-166)
src/aiperf/common/mixins/base_metrics_collector_mixin.py (7)
src/aiperf/common/hooks.py (3)
  • background_task (165-203)
  • on_init (231-248)
  • on_stop (271-288)
src/aiperf/common/mixins/aiperf_lifecycle_mixin.py (1)
  • AIPerfLifecycleMixin (37-344)
src/aiperf/common/models/error_models.py (2)
  • ErrorDetails (12-76)
  • from_exception (66-76)
tests/unit/gpu_telemetry/test_telemetry_integration.py (2)
  • record_callback (96-98)
  • error_callback (100-102)
src/aiperf/gpu_telemetry/telemetry_data_collector.py (1)
  • _collect_and_process_metrics (62-77)
src/aiperf/server_metrics/server_metrics_data_collector.py (1)
  • _collect_and_process_metrics (67-84)
src/aiperf/common/protocols.py (1)
  • error (74-74)
src/aiperf/common/duplicate_tracker.py (1)
src/aiperf/common/protocols.py (1)
  • trace (68-68)
src/aiperf/records/records_manager.py (8)
src/aiperf/common/messages/telemetry_messages.py (2)
  • TelemetryRecordsMessage (17-42)
  • valid (39-42)
src/aiperf/common/messages/server_metrics_messages.py (2)
  • ServerMetricsRecordsMessage (11-40)
  • valid (33-35)
src/aiperf/common/models/error_models.py (3)
  • ErrorDetails (12-76)
  • ErrorDetailsCount (103-110)
  • from_exception (66-76)
src/aiperf/common/models/server_metrics_models.py (1)
  • ServerMetricsRecord (214-291)
src/aiperf/common/protocols.py (5)
  • ResultsProcessorProtocol (537-543)
  • ServerMetricsResultsProcessorProtocol (547-574)
  • error (74-74)
  • process_server_metrics_record (558-566)
  • exception (75-75)
src/aiperf/common/hooks.py (1)
  • on_pull_message (367-387)
src/aiperf/common/enums/message_enums.py (1)
  • MessageType (7-53)
src/aiperf/post_processors/server_metrics_export_results_processor.py (1)
  • process_server_metrics_record (71-107)
src/aiperf/records/__init__.py (1)
src/aiperf/records/records_manager.py (3)
  • ErrorTrackingState (70-80)
  • RecordsManager (107-800)
  • TelemetryTrackingState (84-97)
src/aiperf/common/config/output_config.py (1)
src/aiperf/common/config/config_defaults.py (1)
  • OutputDefaults (135-155)
🪛 LanguageTool
docs/tutorials/server-metrics.md

[style] ~151-~151: ‘new record’ might be wordy. Consider a shorter alternative.
Context: ...ate written (marks end of period), then new record written (marks start of new period) **...

(EN_WORDINESS_PREMIUM_NEW_RECORD)


[style] ~273-~273: This phrase is redundant. Consider writing “point” or “time”.
Context: ...ng ALL metrics from one endpoint at one point in time. Example from Dynamo frontend: ``...

(MOMENT_IN_TIME)

🪛 markdownlint-cli2 (0.18.1)
docs/cli_options.md

415-415: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4

(MD001, heading-increment)


417-417: Bare URL used

(MD034, no-bare-urls)


417-417: Bare URL used

(MD034, no-bare-urls)

docs/tutorials/server-metrics.md

219-219: Blank line inside blockquote

(MD028, no-blanks-blockquote)

🪛 Ruff (0.14.5)
src/aiperf/common/metric_utils.py

33-33: Avoid specifying long messages outside the exception class

(TRY003)

tests/unit/common/test_duplicate_tracker.py

122-122: This suppression comment is invalid because it cannot be on its own line

Remove this comment

(RUF028)

tests/unit/common/test_metric_utils.py

16-16: This suppression comment is invalid because it cannot be in an expression, pattern, argument list, or other non-statement

Remove this comment

(RUF028)

src/aiperf/server_metrics/server_metrics_manager.py

104-104: Unused method argument: message

(ARG002)


148-148: Do not catch blind exception: Exception

(BLE001)


171-171: Unused method argument: message

(ARG002)


191-191: Do not catch blind exception: Exception

(BLE001)


207-207: Unused method argument: message

(ARG002)


252-252: Do not catch blind exception: Exception

(BLE001)


280-280: Do not catch blind exception: Exception

(BLE001)


291-291: Do not catch blind exception: Exception

(BLE001)


318-318: Do not catch blind exception: Exception

(BLE001)


351-351: Do not catch blind exception: Exception

(BLE001)

src/aiperf/common/config/user_config.py

308-308: Mutable class attributes should be annotated with typing.ClassVar

(RUF012)

tests/unit/post_processors/test_server_metrics_export_results_processor.py

384-384: Unused method argument: monkeypatch

(ARG002)

src/aiperf/common/mixins/base_metrics_collector_mixin.py

167-167: Do not catch blind exception: Exception

(BLE001)


174-174: Do not catch blind exception: Exception

(BLE001)


210-210: Avoid specifying long messages outside the exception class

(TRY003)


234-234: Do not catch blind exception: Exception

(BLE001)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (16)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: integration-tests (macos-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: integration-tests (macos-latest, 3.13)
  • GitHub Check: integration-tests (macos-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (macos-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)

@ilana-n ilana-n (Contributor) left a comment

awesome!!! I tried it out on my end and it works great :D
