Description
Hi,
I'm training an EAGLE3 draft model for speculative decoding for a large base model such as Llama-3.3-70B:

```bash
bash train_eagle3_and_export.sh \
  --base_model /opt/ml/model/base_models/Llama3.3-70B-instruct \
  --data /opt/ml/model/data/train.jsonl \
  --num_gpu 8
```
From what I understand, this requires using the offline training mode. However, when I tried training online (without saving hidden states first), I ran into OOM issues. It looks like ~97% of the Llama-70B parameters remain trainable even after converting to an EAGLE draft model.
Is this expected? Does the final conversion into an EAGLE model happen only after training, via the export_hf_checkpoint.py script? Does this mean the model remains mostly unfrozen during training?
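For context, here is roughly how I'm measuring the trainable fraction, and what I would have expected the setup to do (a minimal generic PyTorch sketch, not your training code; `base_model` and `draft_head` are placeholder attribute names, not the repo's actual module names):

```python
import torch

def trainable_fraction(model: torch.nn.Module) -> float:
    """Fraction of parameters with requires_grad=True."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    return trainable / total

def freeze_base_model(model):
    """What I expected training to do: freeze the 70B base model,
    leave only the draft (EAGLE) head trainable."""
    for p in model.base_model.parameters():   # placeholder name
        p.requires_grad = False
    for p in model.draft_head.parameters():   # placeholder name
        p.requires_grad = True
```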
Additionally, I tried dumping hidden states to disk using the run_hf_compute_hiddens_dp script you provided, but the process is extremely slow: around 4 days for only 120K examples on 8x A100 80GB.
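For comparison, this is roughly what I assume the offline hidden-state dump does per example (a simplified sketch using plain Transformers; the layer selection, dtype, and output layout are my assumptions, not the actual run_hf_compute_hiddens_dp implementation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "/opt/ml/model/base_models/Llama3.3-70B-instruct"

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # shard the 70B base model across available GPUs
)
model.eval()

@torch.no_grad()
def dump_hidden_states(text: str, out_path: str):
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    out = model(**inputs, output_hidden_states=True)
    # Assumption: only the last few layers are needed by the draft model;
    # cast down and move to CPU before saving to limit memory and disk IO.
    hiddens = [h.to(torch.bfloat16).cpu() for h in out.hidden_states[-3:]]
    torch.save(hiddens, out_path)
```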
Is there a more efficient workflow or recommended approach for training a draft model for Llama-70B?
Thanks!