Measuring prefill throughput with Qwen3-4B-2507-int4 on A770 #2788

@SearchSavior

Hello,

I am trying to use perf_metrics to obtain granular performance data on prefill throughput with an LLM.

Section 2.2 of this paper suggests computing input_token / ttft, i.e. the number of prompt tokens divided by the time to first token, as a measure of prefill throughput before the decode phase begins. Is this correct, and would it make sense as a feature?
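To make the question concrete, here is a minimal sketch of the calculation I have in mind. The function name `prefill_throughput` and its parameters are hypothetical and not part of any existing API. It assumes TTFT is reported in milliseconds and that the input token count is the tokenized prompt length. Since TTFT also covers producing the first output token, the result is an approximation of pure prefill speed, not an exact figure.

```python
def prefill_throughput(num_input_tokens: int, ttft_ms: float) -> float:
    """Approximate prefill throughput in tokens per second.

    Hypothetical helper: num_input_tokens is the prompt length in tokens,
    ttft_ms is the time to first token in milliseconds (assumed unit).
    """
    if num_input_tokens <= 0:
        raise ValueError("num_input_tokens must be positive")
    if ttft_ms <= 0:
        raise ValueError("ttft_ms must be positive")
    # Scale by 1000 to convert milliseconds to seconds.
    return num_input_tokens * 1000.0 / ttft_ms


if __name__ == "__main__":
    # Illustrative numbers only, not a measurement from the A770.
    print(f"{prefill_throughput(1024, 512.0):.1f} tokens/s")
```

If this definition is right, both inputs would presumably come from the values perf_metrics already collects, so the feature would mostly be a derived metric.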
