Open
Labels
- AutoDeploy: <NV> AutoDeploy Backend
- Speculative Decoding: <NV> MTP/Eagle/Medusa/Lookahead/Prompt-Lookup-Decoding/Draft-Target-Model/ReDrafter
- feature request: New feature or request. This includes new model, dtype, functionality support
Description
🚀 The feature, motivation and pitch
Support Eagle (specifically, Eagle-1) speculative decoding in AutoDeploy.
This continues the work started in #9147 and extends the range of speculative decoding setups supported by AutoDeploy. As in #9147, we work in the two-model regime, with separate "target" and "draft" models. This time the draft model is an Eagle-style module that reads hidden states from the target as part of its input.
Goal: the infrastructure changes made for this should carry over to supporting MTPEagle speculative decoding in AutoDeploy. Confirming this needs some additional verification of the MTPEagle-related code in TRT-LLM.
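For readers less familiar with the two-model regime described above, here is a minimal, framework-free sketch of the data flow: the draft step takes the target's hidden state together with the most recent token, and the target verifies the drafted tokens greedily. Everything in it is an assumption made for illustration. `toy_target`, `toy_eagle_draft`, and `speculative_decode` are hypothetical names, hidden states are plain integers rather than tensors, and none of this is AutoDeploy or TRT-LLM API.

```python
from typing import List, Tuple


def toy_target(tokens: List[int]) -> Tuple[int, int]:
    """Hypothetical target model: (hidden state, greedy next token) for the last position."""
    hidden = (31 * sum(tokens) + len(tokens)) % 97
    return hidden, hidden % 10


def toy_eagle_draft(hidden: int, token: int) -> Tuple[int, int]:
    """Hypothetical Eagle-style draft step.

    Reads a hidden state plus the latest token and predicts the next hidden
    state and the next token. It is deliberately wrong for some inputs so that
    verification has something to reject.
    """
    pred_hidden = (hidden + 31 * token + 1) % 97
    pred_token = pred_hidden % 10
    if pred_hidden % 5 == 0:
        pred_token = (pred_token + 1) % 10  # injected draft error
    return pred_hidden, pred_token


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Reference: target-only greedy decoding."""
    tokens = list(prompt)
    for _ in range(n_new):
        _, nxt = toy_target(tokens)
        tokens.append(nxt)
    return tokens


def speculative_decode(prompt: List[int], n_new: int, k: int = 3) -> Tuple[List[int], int]:
    """Draft up to k tokens per round, then verify them against the target."""
    tokens = list(prompt)
    accepted = 0
    if n_new <= 0:
        return tokens, accepted
    goal = len(prompt) + n_new

    # Prefill: the target emits the first token and the hidden state the draft reads.
    hidden, nxt = toy_target(tokens)
    tokens.append(nxt)

    while len(tokens) < goal:
        # Drafting: chain the draft's own predicted hidden states for k steps.
        drafts = []
        h, t = hidden, tokens[-1]
        for _ in range(k):
            h, t = toy_eagle_draft(h, t)
            drafts.append(t)

        # Verification: a real system scores all drafts in one target forward
        # pass; this toy calls the target once per position for clarity.
        for d in drafts:
            hidden, target_tok = toy_target(tokens)
            tokens.append(target_tok)  # the target's token is always kept
            if target_tok == d:
                accepted += 1
            else:
                break  # first mismatch ends the round
            if len(tokens) >= goal:
                break
    return tokens, accepted


if __name__ == "__main__":
    out, n_accepted = speculative_decode([1, 2, 3], n_new=12, k=3)
    print(out, n_accepted)
```

Because only tokens chosen by the target are ever appended, the output matches target-only greedy decoding regardless of draft quality; the draft only changes how many positions can be confirmed per round. The sketch omits the bonus token that a fully accepted round usually yields, as well as KV-cache handling.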
Status
Ready