Open
Labels
AutoDeploy (<NV> AutoDeploy Backend), feature request (New feature or request. This includes new model, dtype, functionality support)
Description
🚀 The feature, motivation and pitch
From @2ez4bz: the first request in trtllm-serve or dynamo seems to be much slower than subsequent requests. From logging in dynamo, @2ez4bz appears to have traced the slowdown to the initial calls to flashinfer.
@lucaslie and @2ez4bz discussed whether flashinfer prefill may be what triggers it, since AD (AutoDeploy) doesn't do any warmup for prefill.
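To make the suspected effect concrete, here is a minimal sketch of how a one-time initialization cost shows up in the first call only, and how a warmup call moves that cost out of the first real request. Everything in it is hypothetical: `LazyKernel` is a stand-in that simulates lazy setup with a sleep, and `warmup` and `time_call` are illustrative helpers. None of this is flashinfer's or AutoDeploy's actual API, and it does not confirm that prefill is the cause.

```python
import time


class LazyKernel:
    """Hypothetical stand-in for a kernel that does one-time setup on first use."""

    def __init__(self, init_cost_s=0.05):
        self._init_cost_s = init_cost_s  # simulated one-time setup cost (assumption)
        self._ready = False
        self.init_count = 0

    def __call__(self, x):
        if not self._ready:
            # Simulate lazy compilation/loading that only the first call pays for.
            time.sleep(self._init_cost_s)
            self._ready = True
            self.init_count += 1
        return x * 2


def time_call(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def warmup(fn, dummy_input, iters=1):
    """Run throwaway calls so one-time setup happens before real traffic."""
    for _ in range(iters):
        fn(dummy_input)


# Without warmup: the first "request" absorbs the setup cost.
cold_kernel = LazyKernel()
_, cold_first = time_call(cold_kernel, 1)
_, cold_second = time_call(cold_kernel, 1)

# With warmup: setup happens at startup, so the first real request is fast.
warm_kernel = LazyKernel()
warmup(warm_kernel, 0)
_, warm_first = time_call(warm_kernel, 1)

print(f"cold first:  {cold_first * 1e3:.2f} ms")
print(f"cold second: {cold_second * 1e3:.2f} ms")
print(f"warm first:  {warm_first * 1e3:.2f} ms")
```

Timing the first and second requests separately in the same way against a real deployment would be one way to check whether the extra latency is confined to the first prefill call.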
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.
Metadata
Assignees
Type
Projects
Status
Backlog