We need a multi-token prediction (MTP) module, like the one used in DeepSeek-V3, that can plug into architectures such as DeepSeek-V3 and enable multi-token prediction training with an arbitrary number of prediction blocks (default: 4).
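For reference, this is how we read the MTP formulation in the DeepSeek-V3 technical report (our paraphrase, using the report's 1-based notation). At depth $k$ ($1 \le k \le D$), the hidden state from the previous depth ($h^{0}$ is the main model's output) is combined with the embedding of token $t_{i+k}$ through a projection $M_k \in \mathbb{R}^{d \times 2d}$, passed through one transformer block $\mathrm{TRM}_k$, and decoded by the output head. The embedding layer and output head are shared with the main model, and $\lambda$ weights the averaged MTP loss against the main next-token loss:

```math
\begin{aligned}
h_i^{\prime k} &= M_k\left[\mathrm{RMSNorm}(h_i^{k-1});\ \mathrm{RMSNorm}(\mathrm{Emb}(t_{i+k}))\right] \\
h_{1:T-k}^{k} &= \mathrm{TRM}_k\left(h_{1:T-k}^{\prime k}\right) \\
P_{i+k+1}^{k} &= \mathrm{OutHead}(h_i^{k}) \\
\mathcal{L}_{\mathrm{MTP}}^{k} &= -\frac{1}{T}\sum_{i=2+k}^{T+1} \log P_i^{k}[t_i] \\
\mathcal{L}_{\mathrm{MTP}} &= \frac{\lambda}{D}\sum_{k=1}^{D} \mathcal{L}_{\mathrm{MTP}}^{k}
\end{aligned}
```

With the requested default, $D = 4$: each position is trained to predict four additional tokens beyond the usual next token.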
The design should be modular: it clones the model's existing transformer block, attaches the copies to the neck of the model (after the final hidden layer), and integrates automatically into the training loss.
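A minimal PyTorch sketch of this design, under stated assumptions; it is not an existing implementation. It assumes the base model exposes hypothetical `embed`, `layers`, `norm`, and `head` attributes and that each layer maps causal `(B, T, d)` hidden states to `(B, T, d)`. `MTPDepth`, `MultiTokenPrediction`, and `MTPWrapper` are hypothetical names, the per-depth output norm and the `loss_weight=0.3` default are assumptions, and indices in the code are 0-based. Each depth deep-copies the last block, while the embedding and head are shared. The loss is averaged over valid positions rather than divided by $T$.

```python
import copy

import torch
import torch.nn as nn
import torch.nn.functional as F


class RMSNorm(nn.Module):
    """Minimal RMSNorm, defined here so the sketch does not depend on a PyTorch version."""

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)


class MTPDepth(nn.Module):
    """Depth k: normalize both inputs, project [h; emb] from 2d to d (M_k), run a cloned block."""

    def __init__(self, block: nn.Module, d_model: int):
        super().__init__()
        self.norm_h, self.norm_e = RMSNorm(d_model), RMSNorm(d_model)
        self.proj = nn.Linear(2 * d_model, d_model, bias=False)
        self.block = copy.deepcopy(block)  # same architecture as the main block, independent weights
        self.norm_out = RMSNorm(d_model)  # assumption: per-depth norm before the shared head

    def forward(self, h, emb):
        return self.block(self.proj(torch.cat([self.norm_h(h), self.norm_e(emb)], dim=-1)))


class MultiTokenPrediction(nn.Module):
    """D chained depths; embedding and output head are shared with the main model, not copied."""

    def __init__(self, block, embed, head, d_model, depth=4, loss_weight=0.3):
        super().__init__()
        self.embed, self.head = embed, head
        self.depths = nn.ModuleList(MTPDepth(block, d_model) for _ in range(depth))
        self.loss_weight = loss_weight  # lambda in the MTP loss (assumed default)

    def forward(self, h, tokens):
        """h: (B, T, d) main-model hidden states; tokens: (B, T) ids. Returns the weighted MTP loss."""
        T = tokens.size(1)
        losses = []
        for k, depth in enumerate(self.depths, start=1):
            n = T - k - 1  # positions i whose target tokens[i + k + 1] lies inside the sequence
            if n <= 0:
                break
            # combine h^{k-1}_i with Emb(tokens[i + k]) for i = 0 .. T-k-1
            h = depth(h[:, : T - k], self.embed(tokens[:, k:]))
            logits = self.head(depth.norm_out(h[:, :n]))
            losses.append(F.cross_entropy(logits.reshape(-1, logits.size(-1)), tokens[:, k + 1 :].reshape(-1)))
        if not losses:  # sequence too short for any extra-token target
            return h.new_zeros(())
        return self.loss_weight * torch.stack(losses).mean()


class MTPWrapper(nn.Module):
    """Hypothetical glue: clones model.layers[-1] into each depth and adds the MTP loss in training."""

    def __init__(self, model, d_model, depth=4, loss_weight=0.3):
        super().__init__()
        self.model = model
        self.mtp = MultiTokenPrediction(model.layers[-1], model.embed, model.head, d_model, depth, loss_weight)

    def forward(self, tokens):
        h = self.model.embed(tokens)
        for layer in self.model.layers:
            h = layer(h)
        logits = self.model.head(self.model.norm(h))
        # standard next-token loss: position i predicts tokens[i + 1]
        loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
        if self.training:  # MTP branch contributes only to the training loss
            loss = loss + self.mtp(h, tokens)
        return logits, loss


# Usage with a toy base model (position-wise layers, so trivially causal).
torch.manual_seed(0)
V, d = 50, 16
base = nn.Module()
base.embed = nn.Embedding(V, d)
base.layers = nn.ModuleList(nn.Sequential(nn.Linear(d, d), nn.GELU()) for _ in range(2))
base.norm = RMSNorm(d)
base.head = nn.Linear(d, V, bias=False)
wrapped = MTPWrapper(base, d_model=d, depth=4)
tokens = torch.randint(0, V, (2, 12))
logits, loss = wrapped(tokens)
loss.backward()
```

Because the embedding and head are shared rather than copied, the MTP loss also sends gradients into the main model. In `eval()` mode the wrapper skips the MTP branch, so the extra depths add no inference cost unless they are deliberately reused.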