docs: add comprehensive metrics documentation #321
[![DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/ai-dynamo/aiperf)
**[Architecture](docs/architecture.md)** | **[Design Proposals](https://github.com/ai-dynamo/enhancements)** | **[Migrating from Genai-Perf](docs/migrating.md)** | **[CLI Options](docs/cli_options.md)** | **[Metrics Reference](docs/metrics_reference.md)**

AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solution.

<br/>

</div>

<!--
======================
METRICS REFERENCE
======================
-->

## Metrics Reference

AIPerf provides comprehensive metrics organized into multiple functional categories. For detailed descriptions, requirements, and nuances of each metric, see the **[Complete Metrics Reference](docs/metrics_reference.md)**.

### Streaming Metrics

Metrics specific to streaming requests that measure real-time token generation characteristics. Requires the `--streaming` flag. In the formulas below, `responses` are the streamed chunks with non-empty content.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Time to First Token (TTFT)**](docs/metrics_reference.md#time-to-first-token-ttft) | `ttft` | `responses[0].perf_ns - request.start_perf_ns` | `ms` |
| [**Time to Second Token (TTST)**](docs/metrics_reference.md#time-to-second-token-ttst) | `ttst` | `responses[1].perf_ns - responses[0].perf_ns` | `ms` |
| [**Inter Token Latency (ITL)**](docs/metrics_reference.md#inter-token-latency-itl) | `inter_token_latency` | `(request_latency - ttft) / (output_sequence_length - 1)` | `ms` |
| [**Inter Chunk Latency (ICL)**](docs/metrics_reference.md#inter-chunk-latency-icl) | `inter_chunk_latency` | `[responses[i].perf_ns - responses[i-1].perf_ns for i in range(1, len(responses))]` | `ms` |
| [**Output Token Throughput Per User**](docs/metrics_reference.md#output-token-throughput-per-user) | `output_token_throughput_per_user` | `1.0 / inter_token_latency_seconds` | `tokens/sec/user` |
| [**Prefill Throughput**](docs/metrics_reference.md#prefill-throughput) | `prefill_throughput` | `input_sequence_length / ttft_seconds` | `tokens/sec` |
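To see how these formulas relate, here is a rough sketch (not AIPerf's actual implementation; the timestamp-list inputs are assumed) that derives the per-request streaming metrics from chunk arrival times in nanoseconds:

```python
# Illustrative only: compute streaming metrics from per-chunk arrival times.
NS_PER_MS = 1_000_000

def streaming_metrics(start_perf_ns, response_perf_ns, output_sequence_length):
    """Return (ttft_ms, ttst_ms, itl_ms, icl_ms, tokens_per_sec_per_user)."""
    ttft_ms = (response_perf_ns[0] - start_perf_ns) / NS_PER_MS
    ttst_ms = (response_perf_ns[1] - response_perf_ns[0]) / NS_PER_MS
    request_latency_ms = (response_perf_ns[-1] - start_perf_ns) / NS_PER_MS
    # ITL spreads the post-TTFT time over the remaining output tokens.
    itl_ms = (request_latency_ms - ttft_ms) / (output_sequence_length - 1)
    # ICL is the raw gap between consecutive chunks.
    icl_ms = [(response_perf_ns[i] - response_perf_ns[i - 1]) / NS_PER_MS
              for i in range(1, len(response_perf_ns))]
    tokens_per_sec_per_user = 1000.0 / itl_ms  # 1 / ITL in seconds
    return ttft_ms, ttst_ms, itl_ms, icl_ms, tokens_per_sec_per_user

# Example: chunks arrive at 50, 70, 90, 110 ms after the request starts.
ttft, ttst, itl, icl, per_user = streaming_metrics(
    0, [50_000_000, 70_000_000, 90_000_000, 110_000_000], 4)
```

With evenly spaced chunks, ITL and every ICL gap agree (20 ms here); with bursty chunking they diverge, which is why both metrics exist.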

### Token-Based Metrics

Metrics for token-producing endpoints that track token counts and throughput. Requires text-generating endpoints (chat, completion, etc.).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Output Token Count**](docs/metrics_reference.md#output-token-count) | `output_token_count` | `len(tokenizer.encode(content, add_special_tokens=False))` | `tokens` |
| [**Output Sequence Length (OSL)**](docs/metrics_reference.md#output-sequence-length-osl) | `output_sequence_length` | `(output_token_count or 0) + (reasoning_token_count or 0)` | `tokens` |
| [**Input Sequence Length (ISL)**](docs/metrics_reference.md#input-sequence-length-isl) | `input_sequence_length` | `len(tokenizer.encode(prompt, add_special_tokens=False))` | `tokens` |
| [**Total Output Tokens**](docs/metrics_reference.md#total-output-tokens) | `total_output_tokens` | `sum(r.output_token_count for r in records if r.valid)` | `tokens` |
| [**Total Output Sequence Length**](docs/metrics_reference.md#total-output-sequence-length) | `total_osl` | `sum(r.output_sequence_length for r in records if r.valid)` | `tokens` |
| [**Total Input Sequence Length**](docs/metrics_reference.md#total-input-sequence-length) | `total_isl` | `sum(r.input_sequence_length for r in records if r.valid)` | `tokens` |
| [**Output Token Throughput**](docs/metrics_reference.md#output-token-throughput) | `output_token_throughput` | `total_osl / benchmark_duration_seconds` | `tokens/sec` |
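The aggregations above can be sketched as follows. This is illustrative only: it uses a whitespace split as a stand-in for the real model tokenizer, and the `Record` class is an assumed shape, not AIPerf's internal type:

```python
# Illustrative only: whitespace split stands in for
# tokenizer.encode(text, add_special_tokens=False).
def count_tokens(text):
    return len(text.split())

class Record:
    def __init__(self, prompt, content, reasoning="", valid=True):
        self.valid = valid
        self.input_sequence_length = count_tokens(prompt)
        self.output_token_count = count_tokens(content)
        self.reasoning_token_count = count_tokens(reasoning)
        # OSL counts both visible output and reasoning tokens.
        self.output_sequence_length = (
            self.output_token_count + self.reasoning_token_count)

records = [
    Record("a b c", "x y"),
    Record("a b", "x y z", reasoning="r s"),
]
total_isl = sum(r.input_sequence_length for r in records if r.valid)
total_osl = sum(r.output_sequence_length for r in records if r.valid)
benchmark_duration_seconds = 2.0
output_token_throughput = total_osl / benchmark_duration_seconds
```

Note that only valid (non-error) records contribute to the totals, so throughput reflects successfully generated tokens.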

### Reasoning Metrics

Metrics specific to models that support reasoning/thinking tokens. Requires models that return a separate `reasoning_content` field.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Reasoning Token Count**](docs/metrics_reference.md#reasoning-token-count) | `reasoning_token_count` | `len(tokenizer.encode(reasoning_content, add_special_tokens=False))` | `tokens` |
| [**Total Reasoning Tokens**](docs/metrics_reference.md#total-reasoning-tokens) | `total_reasoning_tokens` | `sum(r.reasoning_token_count for r in records if r.valid)` | `tokens` |

### Usage Field Metrics

Metrics tracking API-reported token counts from the `usage` field in responses. Useful for comparing client-side vs server-side token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Usage Prompt Tokens**](docs/metrics_reference.md#usage-prompt-tokens) | `usage_prompt_tokens` | `response.usage.prompt_tokens` | `tokens` |
| [**Usage Completion Tokens**](docs/metrics_reference.md#usage-completion-tokens) | `usage_completion_tokens` | `response.usage.completion_tokens` | `tokens` |
| [**Usage Total Tokens**](docs/metrics_reference.md#usage-total-tokens) | `usage_total_tokens` | `response.usage.total_tokens` | `tokens` |
| [**Usage Reasoning Tokens**](docs/metrics_reference.md#usage-reasoning-tokens) | `usage_reasoning_tokens` | `response.usage.completion_tokens_details.reasoning_tokens` | `tokens` |
| [**Total Usage Prompt Tokens**](docs/metrics_reference.md#total-usage-prompt-tokens) | `total_usage_prompt_tokens` | `sum(r.usage_prompt_tokens for r in records if r.valid)` | `tokens` |
| [**Total Usage Completion Tokens**](docs/metrics_reference.md#total-usage-completion-tokens) | `total_usage_completion_tokens` | `sum(r.usage_completion_tokens for r in records if r.valid)` | `tokens` |
| [**Total Usage Total Tokens**](docs/metrics_reference.md#total-usage-total-tokens) | `total_usage_total_tokens` | `sum(r.usage_total_tokens for r in records if r.valid)` | `tokens` |
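The field paths above follow the OpenAI-style response schema. A minimal sketch of extracting them from a parsed response body (the example payload is made up; the nested `completion_tokens_details` object may be absent on servers that don't report reasoning tokens, hence the defensive `.get()`):

```python
# Illustrative OpenAI-style response body (values are made up).
response = {
    "usage": {
        "prompt_tokens": 12,
        "completion_tokens": 40,
        "total_tokens": 52,
        "completion_tokens_details": {"reasoning_tokens": 8},
    }
}

usage = response["usage"]
usage_prompt_tokens = usage["prompt_tokens"]
usage_completion_tokens = usage["completion_tokens"]
usage_total_tokens = usage["total_tokens"]
# reasoning_tokens is optional; fall back to None when the server omits it.
usage_reasoning_tokens = (
    usage.get("completion_tokens_details", {}).get("reasoning_tokens"))
```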

### Usage Discrepancy Metrics

Metrics measuring differences between API-reported and client-computed token counts.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Usage Prompt Tokens Diff %**](docs/metrics_reference.md#usage-prompt-tokens-diff-) | `usage_prompt_tokens_diff_pct` | `abs((usage_prompt_tokens - input_sequence_length) / input_sequence_length) * 100` | `%` |
| [**Usage Completion Tokens Diff %**](docs/metrics_reference.md#usage-completion-tokens-diff-) | `usage_completion_tokens_diff_pct` | `abs((usage_completion_tokens - output_sequence_length) / output_sequence_length) * 100` | `%` |
| [**Usage Reasoning Tokens Diff %**](docs/metrics_reference.md#usage-reasoning-tokens-diff-) | `usage_reasoning_tokens_diff_pct` | `abs((usage_reasoning_tokens - reasoning_token_count) / reasoning_token_count) * 100` | `%` |
| [**Usage Discrepancy Count**](docs/metrics_reference.md#usage-discrepancy-count) | `usage_discrepancy_count` | `sum(1 for r in records if r.any_diff > threshold)` | `requests` |
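A sketch of the discrepancy formulas, with made-up counts and an assumed 5% threshold (the real threshold and record shape may differ):

```python
# Illustrative only: percent difference between API-reported and
# client-computed token counts, flagged against a threshold.
def diff_pct(reported, computed):
    return abs((reported - computed) / computed) * 100

# Two hypothetical requests: (prompt diff %, completion diff %).
requests = [
    {"prompt": diff_pct(105, 100), "completion": diff_pct(60, 64)},  # 5.0, 6.25
    {"prompt": diff_pct(100, 100), "completion": diff_pct(64, 64)},  # 0.0, 0.0
]
threshold = 5.0  # assumed value for illustration
# A request counts as discrepant if any of its diffs exceeds the threshold.
usage_discrepancy_count = sum(
    1 for r in requests if max(r.values()) > threshold)
```

A persistent discrepancy usually points at a tokenizer mismatch between the client and the server (different model, chat template, or special-token handling).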

### Goodput Metrics

Metrics measuring throughput of requests meeting user-defined Service Level Objectives (SLOs).

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Good Request Count**](docs/metrics_reference.md#good-request-count) | `good_request_count` | `sum(1 for r in records if r.all_slos_met)` | `requests` |
| [**Goodput**](docs/metrics_reference.md#goodput) | `goodput` | `good_request_count / benchmark_duration_seconds` | `requests/sec` |
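Conceptually, a request is "good" only if it meets every SLO, and goodput divides that count by the benchmark duration. A minimal sketch, with SLO names and limits invented for illustration:

```python
# Illustrative only: SLO names/limits are assumed, not AIPerf defaults.
slos = {"ttft_ms": 200.0, "inter_token_latency_ms": 30.0}

def all_slos_met(record):
    # A request is good only if every configured SLO is satisfied.
    return all(record[name] <= limit for name, limit in slos.items())

records = [
    {"ttft_ms": 150.0, "inter_token_latency_ms": 25.0},  # good
    {"ttft_ms": 250.0, "inter_token_latency_ms": 25.0},  # TTFT SLO missed
    {"ttft_ms": 180.0, "inter_token_latency_ms": 28.0},  # good
]
good_request_count = sum(1 for r in records if all_slos_met(r))
benchmark_duration_seconds = 10.0
goodput = good_request_count / benchmark_duration_seconds  # requests/sec
```

Unlike raw request throughput, goodput drops when the server trades latency for volume, which makes it a better proxy for user-perceived capacity.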

### Error Metrics

Metrics computed for failed/error requests.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Error Input Sequence Length**](docs/metrics_reference.md#error-input-sequence-length) | `error_isl` | `input_sequence_length` (for error requests) | `tokens` |
| [**Total Error Input Sequence Length**](docs/metrics_reference.md#total-error-input-sequence-length) | `total_error_isl` | `sum(r.input_sequence_length for r in records if not r.valid)` | `tokens` |
| [**Error Request Count**](docs/metrics_reference.md#error-request-count) | `error_request_count` | `sum(1 for r in records if not r.valid)` | `requests` |

### General Metrics

Metrics available for all benchmark runs with no special requirements.

| Metric | Tag | Formula | Unit |
|--------|-----|---------|------|
| [**Request Latency**](docs/metrics_reference.md#request-latency) | `request_latency` | `responses[-1].perf_ns - request.start_perf_ns` | `ms` |
| [**Request Throughput**](docs/metrics_reference.md#request-throughput) | `request_throughput` | `request_count / benchmark_duration_seconds` | `requests/sec` |
| [**Request Count**](docs/metrics_reference.md#request-count) | `request_count` | `sum(1 for r in records if r.valid)` | `requests` |
| [**Minimum Request Timestamp**](docs/metrics_reference.md#minimum-request-timestamp) | `min_request_timestamp` | `min(r.timestamp_ns for r in records)` | `datetime` |
| [**Maximum Response Timestamp**](docs/metrics_reference.md#maximum-response-timestamp) | `max_response_timestamp` | `max(r.timestamp_ns + r.request_latency for r in records)` | `datetime` |
| [**Benchmark Duration**](docs/metrics_reference.md#benchmark-duration) | `benchmark_duration` | `max_response_timestamp - min_request_timestamp` | `sec` |
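The benchmark window runs from the earliest request start to the latest response completion. A sketch with assumed record tuples (timestamps in nanoseconds; not AIPerf's internal representation):

```python
# Illustrative only: derive the general metrics from (timestamp_ns,
# request_latency_ns, valid) tuples.
NS_PER_SEC = 1_000_000_000

records = [
    (0,              2 * NS_PER_SEC, True),
    (1 * NS_PER_SEC, 3 * NS_PER_SEC, True),   # finishes last, at t = 4 s
    (2 * NS_PER_SEC, 1 * NS_PER_SEC, False),  # errored request
]

# Only valid requests count toward throughput...
request_count = sum(1 for _, _, valid in records if valid)
# ...but every request helps define the benchmark window.
min_request_timestamp = min(ts for ts, _, _ in records)
max_response_timestamp = max(ts + lat for ts, lat, _ in records)
benchmark_duration_seconds = (
    (max_response_timestamp - min_request_timestamp) / NS_PER_SEC)
request_throughput = request_count / benchmark_duration_seconds
```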

<br/>

## Known Issues

- Output sequence length constraints (`--output-tokens-mean`) cannot be guaranteed unless you pass `ignore_eos` and/or `min_tokens` via `--extra-inputs` to an inference server that supports them.