[Question] Unexpectedly low prefill (TTFT) latency ratio #3021

Hello,

I'm testing the Qwen2.5-1.5B model with openvino.genai and observing PerfMetrics that seem counter-intuitive.

The Time to First Token (TTFT) accounts for a very small fraction of the total Generate call duration, even with a large prompt.

### Observed Metrics
### Case 1 (Short Prompt)

Input: 32 tokens

Output: ~130 tokens

Time to First Token: 110,097 us

Total Generate Duration: 20,228,544 us

Ratio (TTFT/Total): ~0.54%

### Case 2 (Long Prompt)

Input: ~1024 tokens

Output: ~1024 tokens

Time to First Token: 850,237 us

Total Generate Duration: 121,392,448 us

Ratio (TTFT/Total): ~0.70%
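For reference, the arithmetic behind these ratios can be sketched as below, together with the mean decode time per token that the same numbers imply. This is only a rough check under stated assumptions: the helper name `summarize` is hypothetical, the output token counts are the approximate values quoted above, and everything after the first token is attributed to decoding (any tokenization or detokenization time included in the total is ignored).

```python
def summarize(ttft_us, total_us, output_tokens):
    """Return (TTFT share of total in percent, implied mean decode time per token in ms)."""
    ratio_pct = 100.0 * ttft_us / total_us
    # Assumption: the first token comes out of prefill, so the remaining
    # time is spread over the other (output_tokens - 1) decoded tokens.
    decode_us = total_us - ttft_us
    tpot_ms = decode_us / max(output_tokens - 1, 1) / 1000.0
    return ratio_pct, tpot_ms


# (TTFT in us, total generate duration in us, approximate output tokens)
cases = {
    "Case 1": (110_097, 20_228_544, 130),
    "Case 2": (850_237, 121_392_448, 1024),
}

results = {name: summarize(*values) for name, values in cases.items()}
for name, (ratio_pct, tpot_ms) in results.items():
    print(f"{name}: TTFT share {ratio_pct:.2f}%, ~{tpot_ms:.0f} ms per decoded token")
```

Under these assumptions the implied decode cost is on the order of 120 to 160 ms per token in both cases, which is why the TTFT share stays below 1% even for the long prompt.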

### Question
Is this behavior expected? Prefill accounting for less than 1% of the total generation time with a 1K-token prompt seems unusually low. It suggests either that the decode stage is disproportionately slow or that Time to First Token is not capturing the full prefill cost.

Could you please confirm whether these metrics are reasonable, or whether I am misinterpreting the data?

### Environment Details
Hardware: Intel Core Ultra 7 258V

OS: Ubuntu 24.04

OpenVINO Version: 2025.2.0

Model Precision: INT8

Thank you.
