Commit aa971a6

updated latest metrics addressed feedback
1 parent 13e457d commit aa971a6

2 files changed: +165 -66 lines changed

README.md

Lines changed: 19 additions & 19 deletions
@@ -12,7 +12,7 @@ SPDX-License-Identifier: Apache-2.0
1212
[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/aiperf)
1313

1414

15-
**[Architecture](docs/architecture.md)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Migrating from Genai-Perf](docs/migrating.md)** | **[CLI Options](docs/cli_options.md)** | **[Metrics Reference](docs/metrics_reference.md)** |
15+
**[Architecture](docs/architecture.md)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Migrating from Genai-Perf](docs/migrating.md)** | **[CLI Options](docs/cli_options.md)** | **[Metrics Reference](docs/metrics_reference.md)**
1616

1717

1818
AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.
@@ -195,12 +195,12 @@ Metrics for token-producing endpoints that track token counts and throughput. Re
195195

196196
| Metric | Tag | Formula | Unit |
197197
|--------|-----|---------|------|
198-
| [**Output Token Count**](docs/metrics_reference.md#output-token-count) | `output_token_count` | `len(tokenizer.encode(content))` | `tokens` |
198+
| [**Output Token Count**](docs/metrics_reference.md#output-token-count) | `output_token_count` | `len(tokenizer.encode(content, add_special_tokens=False))` | `tokens` |
199199
| [**Output Sequence Length (OSL)**](docs/metrics_reference.md#output-sequence-length-osl) | `output_sequence_length` | `(output_token_count or 0) + (reasoning_token_count or 0)` | `tokens` |
200-
| [**Input Sequence Length (ISL)**](docs/metrics_reference.md#input-sequence-length-isl) | `input_sequence_length` | `len(tokenizer.encode(prompt))` | `tokens` |
201-
| [**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens) | `total_output_tokens` | `sum(output_token_count for record in records)` | `tokens` |
202-
| [**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length) | `total_osl` | `sum(output_sequence_length for record in records)` | `tokens` |
203-
| [**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length) | `total_isl` | `sum(input_sequence_length for record in records)` | `tokens` |
200+
| [**Input Sequence Length (ISL)**](docs/metrics_reference.md#input-sequence-length-isl) | `input_sequence_length` | `len(tokenizer.encode(prompt, add_special_tokens=False))` | `tokens` |
201+
| [**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens) | `total_output_tokens` | `sum(r.output_token_count for r in records if r.valid)` | `tokens` |
202+
| [**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length) | `total_osl` | `sum(r.output_sequence_length for r in records if r.valid)` | `tokens` |
203+
| [**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length) | `total_isl` | `sum(r.input_sequence_length for r in records if r.valid)` | `tokens` |
204204
| [**Output Token Throughput**](docs/metrics_reference.md#output-token-throughput) | `output_token_throughput` | `total_osl / benchmark_duration_seconds` | `tokens/sec` |
205205

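The revised totals in this hunk sum only over valid records. A minimal sketch of that filtering follows, assuming a hypothetical `Record` dataclass whose fields mirror the metric tags; it is not AIPerf's actual record type.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Record:
    # Hypothetical record shape; field names mirror the metric tags in the table.
    valid: bool
    input_sequence_length: int
    output_token_count: Optional[int] = None
    reasoning_token_count: Optional[int] = None

    @property
    def output_sequence_length(self) -> int:
        # OSL = output tokens + reasoning tokens, treating missing counts as 0.
        return (self.output_token_count or 0) + (self.reasoning_token_count or 0)


def total_isl(records: List[Record]) -> int:
    # Only valid records contribute, matching `if r.valid` in the formulas.
    return sum(r.input_sequence_length for r in records if r.valid)


def total_osl(records: List[Record]) -> int:
    return sum(r.output_sequence_length for r in records if r.valid)


def output_token_throughput(records: List[Record], benchmark_duration_seconds: float) -> float:
    # tokens/sec over the whole benchmark window
    return total_osl(records) / benchmark_duration_seconds


records = [
    Record(valid=True, input_sequence_length=10, output_token_count=20, reasoning_token_count=5),
    Record(valid=True, input_sequence_length=8, output_token_count=12),
    Record(valid=False, input_sequence_length=7),  # excluded from the totals
]
```

With these sample records, the failed request's 7 input tokens are left out of `total_isl`; under the Error Metrics formulas they would be counted in `total_error_isl` instead.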
206206
### Reasoning Metrics
@@ -209,8 +209,8 @@ Metrics specific to models that support reasoning/thinking tokens. Requires mode
209209

210210
| Metric | Tag | Formula | Unit |
211211
|--------|-----|---------|------|
212-
| [**Reasoning Token Count**](docs/metrics_reference.md#reasoning-token-count) | `reasoning_token_count` | `len(tokenizer.encode(reasoning_content))` | `tokens` |
213-
| [**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens) | `total_reasoning_tokens` | `sum(reasoning_token_count for record in records)` | `tokens` |
212+
| [**Reasoning Token Count**](docs/metrics_reference.md#reasoning-token-count) | `reasoning_token_count` | `len(tokenizer.encode(reasoning_content, add_special_tokens=False))` | `tokens` |
213+
| [**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens) | `total_reasoning_tokens` | `sum(r.reasoning_token_count for r in records if r.valid)` | `tokens` |
214214

215215
### Usage Field Metrics
216216

@@ -222,9 +222,9 @@ Metrics tracking API-reported token counts from the `usage` field in responses.
222222
| [**Usage Completion Tokens**](docs/metrics_reference.md#usage-completion-tokens) | `usage_completion_tokens` | `response.usage.completion_tokens` | `tokens` |
223223
| [**Usage Total Tokens**](docs/metrics_reference.md#usage-total-tokens) | `usage_total_tokens` | `response.usage.total_tokens` | `tokens` |
224224
| [**Usage Reasoning Tokens**](docs/metrics_reference.md#usage-reasoning-tokens) | `usage_reasoning_tokens` | `response.usage.completion_tokens_details.reasoning_tokens` | `tokens` |
225-
| [**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens) | `total_usage_prompt_tokens` | `sum(usage_prompt_tokens for record in records)` | `tokens` |
226-
| [**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens) | `total_usage_completion_tokens` | `sum(usage_completion_tokens for record in records)` | `tokens` |
227-
| [**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens) | `total_usage_total_tokens` | `sum(usage_total_tokens for record in records)` | `tokens` |
225+
| [**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens) | `total_usage_prompt_tokens` | `sum(r.usage_prompt_tokens for r in records if r.valid)` | `tokens` |
226+
| [**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens) | `total_usage_completion_tokens` | `sum(r.usage_completion_tokens for r in records if r.valid)` | `tokens` |
227+
| [**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens) | `total_usage_total_tokens` | `sum(r.usage_total_tokens for r in records if r.valid)` | `tokens` |
228228

229229
### Usage Discrepancy Metrics
230230

@@ -235,15 +235,15 @@ Metrics measuring differences between API-reported and client-computed token cou
235235
| [**Usage Prompt Tokens Diff %**](docs/metrics_reference.md#usage-prompt-tokens-diff-) | `usage_prompt_tokens_diff_pct` | `abs((usage_prompt_tokens - input_sequence_length) / input_sequence_length) * 100` | `%` |
236236
| [**Usage Completion Tokens Diff %**](docs/metrics_reference.md#usage-completion-tokens-diff-) | `usage_completion_tokens_diff_pct` | `abs((usage_completion_tokens - output_sequence_length) / output_sequence_length) * 100` | `%` |
237237
| [**Usage Reasoning Tokens Diff %**](docs/metrics_reference.md#usage-reasoning-tokens-diff-) | `usage_reasoning_tokens_diff_pct` | `abs((usage_reasoning_tokens - reasoning_token_count) / reasoning_token_count) * 100` | `%` |
238-
| [**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count) | `usage_discrepancy_count` | `sum(1 for record if any_diff > threshold)` | `requests` |
238+
| [**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count) | `usage_discrepancy_count` | `sum(1 for r in records if r.any_diff > threshold)` | `requests` |
239239

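The diff-percentage formulas above divide by the client-computed count, so a zero count needs a decision. The sketch below skips such requests, which is an assumption and not confirmed by the source. It also simplifies `r.any_diff` to a single (reported, computed) pair per request, and the helper names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def diff_pct(reported: int, computed: int) -> Optional[float]:
    """abs((reported - computed) / computed) * 100, or None when computed is 0."""
    if computed == 0:
        return None  # assumption: skip rather than divide by zero
    return abs((reported - computed) / computed) * 100


def usage_discrepancy_count(pairs: Iterable[Tuple[int, int]], threshold_pct: float) -> int:
    """Count requests whose diff percentage exceeds the threshold."""
    count = 0
    for reported, computed in pairs:
        pct = diff_pct(reported, computed)
        if pct is not None and pct > threshold_pct:
            count += 1
    return count


# (API-reported tokens, client-computed tokens) per request
pairs = [(150, 100), (100, 100), (80, 100), (5, 0)]
```

With a threshold of 10 percent, the first and third pairs count as discrepancies; the last pair is skipped because its computed count is zero.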
240240
### Goodput Metrics
241241

242242
Metrics measuring throughput of requests meeting user-defined Service Level Objectives (SLOs).
243243

244244
| Metric | Tag | Formula | Unit |
245245
|--------|-----|---------|------|
246-
| [**Good Request Count**](docs/metrics_reference.md#good-request-count) | `good_request_count` | `sum(1 for record if all_slos_met)` | `requests` |
246+
| [**Good Request Count**](docs/metrics_reference.md#good-request-count) | `good_request_count` | `sum(1 for r in records if r.all_slos_met)` | `requests` |
247247
| [**Goodput**](docs/metrics_reference.md#goodput) | `goodput` | `good_request_count / benchmark_duration_seconds` | `requests/sec` |
248248

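As a rough illustration of the goodput formulas, the sketch below treats each SLO as an upper bound on a per-request metric value. The `all_slos_met` helper, the dictionary layout, and the `ttft_ms` key are hypothetical; the source only states that a good request is one where all SLOs are met.

```python
from typing import Dict, List


def all_slos_met(metrics: Dict[str, float], slos: Dict[str, float]) -> bool:
    # Assumption: every SLO is an upper bound on the metric of the same name.
    return all(metrics[name] <= limit for name, limit in slos.items())


def good_request_count(per_request: List[Dict[str, float]], slos: Dict[str, float]) -> int:
    return sum(1 for m in per_request if all_slos_met(m, slos))


def goodput(per_request: List[Dict[str, float]], slos: Dict[str, float],
            benchmark_duration_seconds: float) -> float:
    # requests/sec that met every SLO
    return good_request_count(per_request, slos) / benchmark_duration_seconds


per_request = [
    {"request_latency": 120.0, "ttft_ms": 30.0},  # meets both SLOs
    {"request_latency": 480.0, "ttft_ms": 25.0},  # latency too high
    {"request_latency": 200.0, "ttft_ms": 90.0},  # ttft too high
]
slos = {"request_latency": 250.0, "ttft_ms": 50.0}
```

A request that misses any single SLO is excluded, so goodput can never exceed request throughput for the same run.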
249249
### Error Metrics
@@ -253,20 +253,20 @@ Metrics computed for failed/error requests.
253253
| Metric | Tag | Formula | Unit |
254254
|--------|-----|---------|------|
255255
| [**Error Input Sequence Length**](docs/metrics_reference.md#error-input-sequence-length) | `error_isl` | `input_sequence_length` (for error requests) | `tokens` |
256-
| [**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length) | `total_error_isl` | `sum(input_sequence_length for record in error_records)` | `tokens` |
257-
| [**Error Request Count**](docs/metrics_reference.md#error-request-count) | `error_request_count` | `sum(1 for record if not record.valid)` | `requests` |
256+
| [**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length) | `total_error_isl` | `sum(r.input_sequence_length for r in records if not r.valid)` | `tokens` |
257+
| [**Error Request Count**](docs/metrics_reference.md#error-request-count) | `error_request_count` | `sum(1 for r in records if not r.valid)` | `requests` |
258258

259259
### General Metrics
260260

261261
Metrics available for all benchmark runs with no special requirements.
262262

263263
| Metric | Tag | Formula | Unit |
264264
|--------|-----|---------|------|
265-
| [**Request Latency**](docs/metrics_reference.md#request-latency) | `request_latency` | `responses[-1].perf_ns - start_perf_ns` | `ms` |
265+
| [**Request Latency**](docs/metrics_reference.md#request-latency) | `request_latency` | `responses[-1].perf_ns - request.start_perf_ns` | `ms` |
266266
| [**Request Throughput**](docs/metrics_reference.md#request-throughput) | `request_throughput` | `request_count / benchmark_duration_seconds` | `requests/sec` |
267-
| [**Request Count**](docs/metrics_reference.md#request-count) | `request_count` | `sum(1 for record if record.valid)` | `requests` |
268-
| [**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp) | `min_request_timestamp` | `min(timestamp_ns for record in records)` | `datetime` |
269-
| [**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp) | `max_response_timestamp` | `max(timestamp_ns + request_latency for record in records)` | `datetime` |
267+
| [**Request Count**](docs/metrics_reference.md#request-count) | `request_count` | `sum(1 for r in records if r.valid)` | `requests` |
268+
| [**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp) | `min_request_timestamp` | `min(r.timestamp_ns for r in records)` | `datetime` |
269+
| [**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp) | `max_response_timestamp` | `max(r.timestamp_ns + r.request_latency for r in records)` | `datetime` |
270270
| [**Benchmark Duration**](docs/metrics_reference.md#benchmark-duration) | `benchmark_duration` | `max_response_timestamp - min_request_timestamp` | `sec` |
271271

272272
</br>

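The General Metrics formulas in this diff can be sketched end to end as follows. The record type is hypothetical, and the nanosecond-to-second conversion is an assumption inferred from the `_ns` suffixes and the `sec` unit. As in the formulas, the timestamp bounds use all records while `request_count` counts only valid ones.

```python
from dataclasses import dataclass
from typing import List

NANOS_PER_SECOND = 1_000_000_000


@dataclass
class Record:
    # Hypothetical record shape for illustration only.
    timestamp_ns: int        # wall-clock time the request was sent
    request_latency_ns: int  # responses[-1].perf_ns - request.start_perf_ns
    valid: bool


def benchmark_duration_seconds(records: List[Record]) -> float:
    min_request_timestamp = min(r.timestamp_ns for r in records)
    max_response_timestamp = max(r.timestamp_ns + r.request_latency_ns for r in records)
    return (max_response_timestamp - min_request_timestamp) / NANOS_PER_SECOND


def request_count(records: List[Record]) -> int:
    return sum(1 for r in records if r.valid)


def request_throughput(records: List[Record]) -> float:
    # requests/sec over the whole benchmark window
    return request_count(records) / benchmark_duration_seconds(records)


records = [
    Record(timestamp_ns=1_000_000_000, request_latency_ns=500_000_000, valid=True),
    Record(timestamp_ns=2_000_000_000, request_latency_ns=1_000_000_000, valid=True),
    Record(timestamp_ns=2_500_000_000, request_latency_ns=2_500_000_000, valid=False),
]
```

In this sample the failed request still extends the benchmark window to 4 seconds, which lowers the throughput computed from the two valid requests.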