96 changes: 31 additions & 65 deletions examples/speculative_decoding/README.md
@@ -73,14 +73,16 @@ This one-line command runs a minimal example workflow of training and exporting
For small base models that fit in GPU memory, we can collocate them with draft models and train with the following command:

```bash
-./launch_train.sh --model $BASE_MODEL \
-  --output_dir $OUTPUT_DIR \
-  --data input_conversations/train.jsonl \
-  --num_epochs $NUM_EPOCH \
-  --eagle_config eagle_config.json
+./launch_train.sh \
+  --config ../../modelopt_recipes/general/speculative_decoding/eagle3.yaml \
+  model.model_name_or_path=meta-llama/Llama-3.2-1B \
+  data.data_path=input_conversations/train.jsonl \
+  training.output_dir=ckpts/llama-3.2-1b-online
```

-FSDP2 is used by default. To enable context parallelism for long-context training, specify `--cp_size n`.
+All default training settings live in `eagle3.yaml`; override any field via OmegaConf dotlist arguments on the command line.
+
+To enable context parallelism for long-context training, add `training.cp_size=<N>` to the overrides.
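The override mechanics can be sketched in plain Python. This is a simplified stand-in for what OmegaConf's dotlist merging does (the key names mirror the commands above, but the default values are made up, not the recipe's actual schema):

```python
# Simplified stand-in for OmegaConf dotlist merging: each "a.b.c=value"
# argument overrides one nested key in the loaded YAML config.
def apply_dotlist(config: dict, overrides: list[str]) -> dict:
    for item in overrides:
        dotted_key, _, value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value  # OmegaConf additionally infers int/float/bool types
    return config

# Defaults as they might appear in eagle3.yaml (illustrative values only)
cfg = {
    "model": {"model_name_or_path": "meta-llama/Llama-3.1-8B"},
    "training": {"output_dir": "ckpts/default", "cp_size": 1},
}
apply_dotlist(cfg, [
    "model.model_name_or_path=meta-llama/Llama-3.2-1B",
    "training.output_dir=ckpts/llama-3.2-1b-online",
])
print(cfg["model"]["model_name_or_path"])  # meta-llama/Llama-3.2-1B
print(cfg["training"]["cp_size"])          # 1 (untouched keys keep their defaults)
```

Keys not named on the command line keep their YAML defaults, so a run only needs to spell out what differs from the recipe.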
The saved modelopt checkpoint is similar in architecture to HF models. It can be further optimized through **ModelOpt**, e.g., PTQ and QAT.

## Training Draft Model with Offline Base Model
@@ -113,15 +115,14 @@ python collect_hidden_states/compute_hidden_states_hf.py \

### Train Draft Model with Dumped Hidden States

-Once we finish dumping hidden states, launch offline training with an extra `--offline-data` argument:
+Once we finish dumping hidden states, launch offline training pointing to the hidden states directory:

```bash
-./launch_train.sh --model $BASE_MODEL \
-  --output_dir $OUTPUT_DIR \
-  --data $DATA \
-  --num_epochs $NUM_EPOCH \
-  --eagle_config eagle_config.json \
-  --offline-data $HIDDEN_STATES_DIR
+./launch_train.sh \
+  --config ../../modelopt_recipes/general/speculative_decoding/eagle3.yaml \
+  model.model_name_or_path=meta-llama/Llama-3.2-1B \
+  data.offline_data_path=$HIDDEN_STATES_DIR \
+  training.output_dir=ckpts/llama-3.2-1b-offline
```

## Model Validation
@@ -244,13 +245,13 @@ For large scale data generation, please see [SLURM prepare data](SLURM_prepare_d

### Configuring Draft Model

-For EAGLE‑1 and EAGLE‑3 we provide a [default model architecture config](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/config.py#L37) in ModelOpt. You can override default settings by providing an additional JSON dict. E.g. To use 2-layer eagle with 8192 intermediate size for MLP, set `eagle_config.json` to:
+For EAGLE‑1 and EAGLE‑3 we provide a [default model architecture config](https://github.com/NVIDIA/Model-Optimizer/blob/main/modelopt/torch/speculative/config.py#L37) in ModelOpt. You can override default settings via `eagle.eagle_architecture_config` in the YAML. E.g. to use a 2-layer EAGLE head with 8192 intermediate size:

-```json
-{
-"num_hidden_layers": 2,
-"intermediate_size":8192
-}
-```
+```yaml
+eagle:
+  eagle_architecture_config:
+    num_hidden_layers: 2
+    intermediate_size: 8192
+```

### Draft Vocabulary Compression
@@ -263,61 +264,26 @@ python scripts/calibrate_draft_vocab.py --model meta-llama/Llama-3.2-1B-Instruct

This will produce a `d2t.pt` file in `save_dir`, which is the mapping from draft token to target token. During inference, draft tokens can be mapped back to target tokens by `target_token = draft_token + d2t[draft_token]`.
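The mapping above can be illustrated with a toy table. A plain list stands in here for the tensor loaded from `d2t.pt`, and the offset values are made up:

```python
# Toy d2t table: entry i holds (target_id - draft_id) for draft token i.
# In practice this is the tensor produced by calibrate_draft_vocab.py.
d2t = [0, 4, 4, 9]

def draft_to_target(draft_token: int) -> int:
    # Mapping from the text: target_token = draft_token + d2t[draft_token]
    return draft_token + d2t[draft_token]

print(draft_to_target(2))  # 6: draft token 2 corresponds to target token 6
```

Because the draft vocabulary keeps only the most frequent target tokens, the table is small (`draft_vocab_size` entries) while still addressing the full target vocabulary.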

-Then, simply set `{"draft_vocab_size":32000}` in `eagle_config.json` and include `--draft_vocab_cache <path_to_d2t.pt>` when running `./launch_train.sh`. The draft model will use this provided vocab table during training and export.
+Then, set `eagle_architecture_config.draft_vocab_size: 32000` and `data.draft_vocab_cache: <path_to_d2t.pt>` in your YAML. The draft model will use this provided vocab table during training and export.

⚠️ Potential issue | 🟡 Minor

Use the full nested YAML path for draft_vocab_size.

The runtime schema nests this under eagle, so eagle_architecture_config.draft_vocab_size reads like a top-level key. Use eagle.eagle_architecture_config.draft_vocab_size to match the actual config structure.

Suggested fix
-Then, set `eagle_architecture_config.draft_vocab_size: 32000` and `data.draft_vocab_cache: <path_to_d2t.pt>` in your YAML. The draft model will use this provided vocab table during training and export.
+Then, set `eagle.eagle_architecture_config.draft_vocab_size: 32000` and `data.draft_vocab_cache: <path_to_d2t.pt>` in your YAML. The draft model will use this provided vocab table during training and export.


### Interact with `modelopt.torch.speculative`

-`main.py` provides an example for converting a HF base model for speculative decoding and training it. It consists of a few simple steps:
-First, load the base model and tokenizer from Hugging Face:
-
-```python
-model = transformers.AutoModelForCausalLM.from_pretrained(
-    "<path to your pretrained model>"
-)
-```
-
-Then, load default eagle config and make necessary overwrites:
-
-```python
-# Load default config
-config = {
-    "eagle1": EAGLE1_DEFAULT_CFG,
-    "eagle3": EAGLE3_DEFAULT_CFG,
-}[training_args.mode]["config"]
-
-# overwrite config with custom config
-config["eagle_architecture_config"].update({"<overwrite_keys>": "<overwrite_values>"})
-
-# Mandatory: hidden size, vocab size and max position embeddings must match base model
-config["eagle_architecture_config"].update(
-    {
-        "hidden_size": model.config.hidden_size,
-        "vocab_size": model.config.vocab_size,
-        "max_position_embeddings": model.config.max_position_embeddings,
-    }
-)
-```
-
-Then, we convert model to a speculative decoding model:
-
-```python
-mtsp.convert(model, [("eagle", config)])
-```
-
-This will modify the model in-place with eagle training forward, making it compatible with HF trainer:
-
-```python
-# Create a trainer
-trainer = transformers.Trainer(model=model, tokenizer=tokenizer, args=training_args, **data_module)
-trainer._move_model_to_device(model, trainer.args.device)
-
-# Enable HF checkpointing so that the saved model will contain the speculative decoding module
-mto.enable_huggingface_checkpointing()
-
-trainer.train(resume_from_checkpoint=checkpoint)
-trainer.save_state()
-trainer.save_model("<path to the output directory>")
-```
+`main.py` provides a complete example for converting a HF base model for speculative decoding and training it. The core steps are loading the base model, converting it with an eagle config dict, and training with HF Trainer:
+
+```python
+import modelopt.torch.speculative as mtsp
+
+# Convert base model in-place to an EAGLE speculative decoding model
+eagle_cfg = {"eagle_decoder_type": "llama", ...}  # fields from EagleConfig
+mtsp.convert(model, [("eagle", eagle_cfg)])
+
+# Train with HF Trainer as usual
+trainer = transformers.Trainer(model=model, ...)
+trainer.train()
+trainer.save_model("<output_dir>")
+```
+
+See `main.py` for the full example including tokenizer setup, dataset loading, and checkpoint handling.

## Support Matrix

2 changes: 0 additions & 2 deletions examples/speculative_decoding/eagle_config.json

This file was deleted.

1 change: 0 additions & 1 deletion examples/speculative_decoding/fsdp_config.json

This file was deleted.
