Measuring prefill throughput with Qwen3-4B-2507-int4 on A770

Hello,

I am trying to use `perf_metrics` to obtain granular performance data on prefill throughput with an LLM.

Section 2.2 of [this paper](https://arxiv.org/pdf/2404.14294v3) has a discussion which suggests computing **input_token / ttft** to measure prefill throughput, since TTFT is the latency before the decode phase begins. I am wondering if this is correct and whether it would make sense as a feature.
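To make the proposal concrete, here is a minimal sketch of the calculation I have in mind. The function name `prefill_throughput` and the sample numbers are hypothetical, and I am assuming the input token count and TTFT (in milliseconds) can be read from `perf_metrics`; the sketch takes them as plain numbers so it does not depend on any particular API.

```python
def prefill_throughput(num_input_tokens: int, ttft_ms: float) -> float:
    """Estimate prefill throughput in tokens per second.

    Assumes ttft_ms is the time to first token in milliseconds and
    num_input_tokens is the prompt length after tokenization.
    """
    if num_input_tokens <= 0:
        raise ValueError("num_input_tokens must be positive")
    if ttft_ms <= 0:
        raise ValueError("ttft_ms must be positive")
    # Convert milliseconds to seconds, then divide tokens by time.
    return num_input_tokens / (ttft_ms / 1000.0)


# Hypothetical values for illustration only, not measured on the A770.
print(f"{prefill_throughput(1024, 512.0):.1f} tokens/s")
```

One caveat: TTFT usually also includes producing the first output token and any scheduling overhead, so this ratio would be an approximation of prefill throughput rather than an exact figure.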


<img width="906" height="486" alt="Image" src="https://github.com/user-attachments/assets/cca2af8f-b38e-4e20-b509-8a42328e31c5" />
