[Feature] Add Multi-Token Prediction module #933

@lessw2020

Description

We need a multi-token prediction (MTP) module, like the one DeepSeek v3 uses, that can plug into architectures such as DeepSeek v3 and enable multi-token prediction training with an arbitrary number of blocks (default = 4).
The design should be modular: it clones the model's existing block, appends the copies to the neck of the model, and integrates automatically into training.
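
To make the request concrete, here is a minimal PyTorch sketch of what such a module could look like. It is only an illustration under stated assumptions, not a proposed implementation: the class name `MTPHead`, its constructor arguments, and the stand-in block are all hypothetical, and the block is assumed to take a `(batch, seq, dim)` tensor and handle causal masking itself. The per-depth structure (normalize, concatenate with the embedding of the next token, project back down, run a cloned block, reuse the shared embedding and output projection) follows my reading of the DeepSeek v3 description, with `nn.LayerNorm` swapped in for whatever norm the host model uses.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class MTPHead(nn.Module):
    """Hypothetical MTP wrapper: `depth` cloned blocks appended after the trunk."""

    def __init__(self, block, embed, out_proj, depth=4):
        super().__init__()
        dim = embed.embedding_dim
        self.embed = embed        # shared with the main model
        self.out_proj = out_proj  # shared output projection (dim -> vocab)
        # Each depth gets its own independent copy of the existing block.
        self.blocks = nn.ModuleList([copy.deepcopy(block) for _ in range(depth)])
        self.norm_h = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])
        self.norm_e = nn.ModuleList([nn.LayerNorm(dim) for _ in range(depth)])
        self.merge = nn.ModuleList(
            [nn.Linear(2 * dim, dim, bias=False) for _ in range(depth)]
        )

    def forward(self, h, tokens):
        """h: (B, T, D) trunk hidden states; tokens: (B, T) ids. Returns a scalar loss."""
        seq_len = tokens.size(1)
        if seq_len < len(self.blocks) + 2:
            raise ValueError("sequence too short for the configured MTP depth")
        losses = []
        for k, block in enumerate(self.blocks, start=1):
            n = seq_len - k - 1                     # positions that still have a target
            emb = self.embed(tokens[:, k : k + n])  # embedding of the token at offset +k
            x = torch.cat(
                [self.norm_h[k - 1](h[:, :n]), self.norm_e[k - 1](emb)], dim=-1
            )
            h = block(self.merge[k - 1](x))         # output is chained into the next depth
            logits = self.out_proj(h)
            target = tokens[:, k + 1 :]             # token at offset +k+1
            losses.append(F.cross_entropy(logits.flatten(0, 1), target.flatten()))
        # Average over depths; any extra loss weight is left to the caller.
        return torch.stack(losses).mean()


# Toy usage with a position-wise stand-in block (an assumption, not a real model block).
torch.manual_seed(0)
vocab, dim = 32, 16
embed = nn.Embedding(vocab, dim)
out_proj = nn.Linear(dim, vocab, bias=False)
block = nn.Sequential(nn.Linear(dim, dim), nn.GELU())
mtp = MTPHead(block, embed, out_proj, depth=4)

tokens = torch.randint(0, vocab, (2, 10))
trunk_hidden = block(embed(tokens))  # stands in for the main model's final hidden states
loss = mtp(trunk_hidden, tokens)
loss.backward()
```

In a training loop, this auxiliary loss would be added to the main next-token loss; at inference the extra blocks can simply be skipped, since they sit after the trunk rather than inside it.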

Labels

enhancement: New feature or request
