I am trying to use perf_metrics to obtain granular performance data on prefill throughput for an LLM.
Section 2.2 of this paper has a discussion suggesting that input_token / ttft can be used to measure the throughput of the prefill phase, i.e. the tokens processed per second before the decode phase begins. Is this correct, and would it make sense to expose it as a feature?
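To make the proposal concrete, here is a minimal sketch of the calculation I have in mind. The function name `prefill_throughput` and its parameters are hypothetical and not part of any existing API, and I am assuming TTFT is reported in milliseconds. It also treats the whole of TTFT as prefill time, which slightly underestimates throughput if TTFT includes tokenization or scheduling overhead.

```python
def prefill_throughput(num_input_tokens: int, ttft_ms: float) -> float:
    """Estimate prefill throughput in tokens per second.

    Hypothetical helper: assumes ttft_ms is the time to first token in
    milliseconds and that it is dominated by the prefill phase.
    """
    if num_input_tokens < 0:
        raise ValueError("num_input_tokens must be non-negative")
    if ttft_ms <= 0:
        raise ValueError("ttft_ms must be positive")
    # Convert milliseconds to seconds, then divide tokens by elapsed time.
    return num_input_tokens / (ttft_ms / 1000.0)


# Example: a 1000-token prompt with a TTFT of 250 ms.
tokens_per_second = prefill_throughput(1000, 250.0)
print(f"{tokens_per_second:.1f} tokens/s")
```

If the metrics object already exposes both the input token count and TTFT, this could presumably be a derived value rather than a new measurement.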