|[**Output Sequence Length (OSL)**](docs/metrics_reference.md#output-sequence-length-osl)|`output_sequence_length`|`(output_token_count or 0) + (reasoning_token_count or 0)`|`tokens`|
-|[**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens)|`total_output_tokens`|`sum(output_token_count for record in records)`|`tokens`|
-|[**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length)|`total_osl`|`sum(output_sequence_length for record in records)`|`tokens`|
-|[**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length)|`total_isl`|`sum(input_sequence_length for record in records)`|`tokens`|
+|[**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens)|`total_output_tokens`|`sum(r.output_token_count for r in records if r.valid)`|`tokens`|
+|[**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length)|`total_osl`|`sum(r.output_sequence_length for r in records if r.valid)`|`tokens`|
+|[**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length)|`total_isl`|`sum(r.input_sequence_length for r in records if r.valid)`|`tokens`|
-|[**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens)|`total_reasoning_tokens`|`sum(reasoning_token_count for record in records)`|`tokens`|
+|[**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens)|`total_reasoning_tokens`|`sum(r.reasoning_token_count for r in records if r.valid)`|`tokens`|
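
To make the effect of the added `if r.valid` filter concrete, here is a minimal sketch of the updated total formulas. The `Record` dataclass is a hypothetical stand-in and not the project's actual record type; only the field names and formulas are taken from the table rows in this diff.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical stand-in for a benchmark record (assumption, not the real class)."""
    valid: bool
    input_sequence_length: int = 0
    output_token_count: Optional[int] = None
    reasoning_token_count: Optional[int] = None

    @property
    def output_sequence_length(self) -> int:
        # OSL as given in the table: a missing count is treated as 0.
        return (self.output_token_count or 0) + (self.reasoning_token_count or 0)


def total_osl(records: list[Record]) -> int:
    # Updated formula: only valid records contribute to the total.
    return sum(r.output_sequence_length for r in records if r.valid)


def total_isl(records: list[Record]) -> int:
    # Same filter applied to input sequence length.
    return sum(r.input_sequence_length for r in records if r.valid)


records = [
    Record(valid=True, input_sequence_length=10, output_token_count=5, reasoning_token_count=3),
    Record(valid=True, input_sequence_length=20, output_token_count=7),
    Record(valid=False, input_sequence_length=99),  # failed request, excluded
]

print(total_osl(records))  # 15
print(total_isl(records))  # 30
```

The invalid record's 99 input tokens do not appear in `total_isl`, which is the behavior the new formulas spell out explicitly.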
### Usage Field Metrics
@@ -222,9 +222,9 @@ Metrics tracking API-reported token counts from the `usage` field in responses.
-|[**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens)|`total_usage_prompt_tokens`|`sum(usage_prompt_tokens for record in records)`|`tokens`|
-|[**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens)|`total_usage_completion_tokens`|`sum(usage_completion_tokens for record in records)`|`tokens`|
-|[**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens)|`total_usage_total_tokens`|`sum(usage_total_tokens for record in records)`|`tokens`|
+|[**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens)|`total_usage_prompt_tokens`|`sum(r.usage_prompt_tokens for r in records if r.valid)`|`tokens`|
+|[**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens)|`total_usage_completion_tokens`|`sum(r.usage_completion_tokens for r in records if r.valid)`|`tokens`|
+|[**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens)|`total_usage_total_tokens`|`sum(r.usage_total_tokens for r in records if r.valid)`|`tokens`|
### Usage Discrepancy Metrics
@@ -235,15 +235,15 @@ Metrics measuring differences between API-reported and client-computed token cou
-|[**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count)|`usage_discrepancy_count`|`sum(1 for record if any_diff > threshold)`|`requests`|
+|[**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count)|`usage_discrepancy_count`|`sum(1 for r in records if r.any_diff > threshold)`|`requests`|
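
The old formula, `sum(1 for record if any_diff > threshold)`, is not a valid Python generator expression because it has no `in records` clause; the new one is. A minimal sketch of the corrected form follows. How `any_diff` is derived is not shown in this diff, so the sketch assumes it is a precomputed per-record number; the `Record` class and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record; `any_diff` is assumed to be precomputed elsewhere."""
    any_diff: float


def usage_discrepancy_count(records: list[Record], threshold: float) -> int:
    # Counts records whose difference is strictly greater than the threshold.
    return sum(1 for r in records if r.any_diff > threshold)


records = [Record(0.0), Record(2.5), Record(10.0), Record(12.0)]

print(usage_discrepancy_count(records, threshold=10.0))  # 1
```

Because the comparison is strict, a record sitting exactly at the threshold (10.0 here) is not counted.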
### Goodput Metrics
Metrics measuring throughput of requests meeting user-defined Service Level Objectives (SLOs).
| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
-|[**Good Request Count**](docs/metrics_reference.md#good-request-count)|`good_request_count`|`sum(1 for record if all_slos_met)`|`requests`|
+|[**Good Request Count**](docs/metrics_reference.md#good-request-count)|`good_request_count`|`sum(1 for r in records if r.all_slos_met)`|`requests`|
-|[**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length)|`total_error_isl`|`sum(input_sequence_length for record in error_records)`|`tokens`|
-|[**Error Request Count**](docs/metrics_reference.md#error-request-count)|`error_request_count`|`sum(1 for record if not record.valid)`|`requests`|
+|[**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length)|`total_error_isl`|`sum(r.input_sequence_length for r in records if not r.valid)`|`tokens`|
+|[**Error Request Count**](docs/metrics_reference.md#error-request-count)|`error_request_count`|`sum(1 for r in records if not r.valid)`|`requests`|
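
The rewritten error formulas drop the separate `error_records` name and filter the same `records` list with `not r.valid`, which makes them the exact complement of the valid-only counts. A minimal sketch of that relationship, assuming `valid` is a plain boolean on a hypothetical `Record` class:

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical record type used only for illustration."""
    valid: bool
    input_sequence_length: int = 0


def request_count(records: list[Record]) -> int:
    return sum(1 for r in records if r.valid)


def error_request_count(records: list[Record]) -> int:
    return sum(1 for r in records if not r.valid)


def total_error_isl(records: list[Record]) -> int:
    # Input tokens spent on requests that ended in an error.
    return sum(r.input_sequence_length for r in records if not r.valid)


records = [
    Record(valid=True, input_sequence_length=10),
    Record(valid=False, input_sequence_length=40),
    Record(valid=False, input_sequence_length=2),
]

print(request_count(records))        # 1
print(error_request_count(records))  # 2
print(total_error_isl(records))      # 42
```

With a boolean `valid`, every record lands in exactly one of the two counts, so `request_count + error_request_count == len(records)`.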
### General Metrics
Metrics available for all benchmark runs with no special requirements.
-|[**Request Count**](docs/metrics_reference.md#request-count)|`request_count`|`sum(1 for record if record.valid)`|`requests`|
-|[**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp)|`min_request_timestamp`|`min(timestamp_ns for record in records)`|`datetime`|
-|[**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp)|`max_response_timestamp`|`max(timestamp_ns + request_latency for record in records)`|`datetime`|
+|[**Request Count**](docs/metrics_reference.md#request-count)|`request_count`|`sum(1 for r in records if r.valid)`|`requests`|
+|[**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp)|`min_request_timestamp`|`min(r.timestamp_ns for r in records)`|`datetime`|
+|[**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp)|`max_response_timestamp`|`max(r.timestamp_ns + r.request_latency for r in records)`|`datetime`|
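
Unlike the totals, the two timestamp formulas keep iterating over all records without an `if r.valid` filter. A minimal sketch of how they might be evaluated and rendered in the table's `datetime` unit is below. It assumes `timestamp_ns` is nanoseconds since the Unix epoch and that `request_latency` is also in nanoseconds; neither unit is stated in this diff. `Record` and `ns_to_datetime` are hypothetical helpers.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Record:
    """Hypothetical record; units are assumptions (see lead-in)."""
    timestamp_ns: int     # request start, ns since Unix epoch (assumed)
    request_latency: int  # request duration in ns (assumed)
    valid: bool = True


def ns_to_datetime(ns: int) -> datetime:
    # Convert nanoseconds since the epoch to a timezone-aware UTC datetime.
    return datetime.fromtimestamp(ns / 1_000_000_000, tz=timezone.utc)


records = [
    Record(timestamp_ns=1_700_000_000_000_000_000, request_latency=250_000_000),
    Record(timestamp_ns=1_700_000_001_000_000_000, request_latency=100_000_000, valid=False),
]

# Both formulas range over every record, valid or not.
min_request_timestamp = min(r.timestamp_ns for r in records)
max_response_timestamp = max(r.timestamp_ns + r.request_latency for r in records)

print(ns_to_datetime(min_request_timestamp))
print(ns_to_datetime(max_response_timestamp))
```

Note that Python's `min()` and `max()` raise `ValueError` on an empty iterable, so a caller with no records would need a guard or a `default=` argument.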