Training Draft for Large models (e.g. 70B) #593

@Ofir408

Hi,
I'm training an EAGLE-3 draft model for speculative decoding with a large base model such as Llama-70B:
```bash
bash train_eagle3_and_export.sh \
    --base_model /opt/ml/model/base_models/Llama3.3-70B-instruct \
    --data /opt/ml/model/data/train.jsonl \
    --num_gpu 8
```
From what I understand, this requires using the offline training mode. However, when I tried training without first saving hidden states, I ran into out-of-memory (OOM) errors. It looks like ~97% of the Llama-70B parameters remain trainable even after converting to an EAGLE-3 draft model.

Is this expected? Does the final conversion into an EAGLE-3 draft model happen only after training, via the export_hf_checkpoint.py script? Does this mean the base model remains mostly unfrozen during training?
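
For context, this is the kind of check I ran on the training-time model to get the ~97% figure, plus what I expected the setup to do instead. It is a minimal sketch against plain PyTorch, not the repo's code, and the helper names are mine:

```python
from torch import nn


def report_trainable(model: nn.Module) -> None:
    """Print how many parameters are trainable vs. total."""
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable / 1e9:.2f}B / {total / 1e9:.2f}B "
          f"({100 * trainable / total:.1f}%)")


def freeze_base(base_model: nn.Module) -> None:
    """What I expected the training setup to do: freeze the base model
    entirely so that only the (much smaller) draft head is optimized."""
    for p in base_model.parameters():
        p.requires_grad_(False)
    base_model.eval()
```

Calling the first helper on the model right before the optimizer is built is where I saw the ~97% figure.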

Additionally, I tried dumping hidden states to disk with the run_hf_compute_hiddens_dp script you provided, but the process is extremely slow: roughly 4 days for only 120K examples on 8×A100 80GB GPUs.
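
For reference, the offline precomputation I have in mind looks roughly like the sketch below. This is a simplified stand-in for the run_hf_compute_hiddens_dp script, not its actual code; the output directory, layer selection, and lack of batching / data-parallel sharding are all placeholders:

```python
import os

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Simplified sketch of offline hidden-state dumping for draft-model training.
# Not the actual run_hf_compute_hiddens_dp script; real usage would batch
# inputs and shard the dataset across GPUs.
model_path = "/opt/ml/model/base_models/Llama3.3-70B-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)
model.eval()

texts = ["example prompt"]  # stand-in for rows loaded from train.jsonl
os.makedirs("hidden_states", exist_ok=True)

with torch.inference_mode():
    for i, text in enumerate(texts):
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        out = model(**inputs, output_hidden_states=True)
        # EAGLE-3-style training consumes a few intermediate hidden states in
        # addition to the last one; which layers exactly is repo-specific.
        hiddens = torch.stack(
            [h.squeeze(0).to(torch.bfloat16).cpu()
             for h in out.hidden_states[-4:]]
        )
        torch.save(hiddens, f"hidden_states/{i:08d}.pt")
```

Even in this simplified form, every example requires a full forward pass through the 70B base model, which matches the slowness I'm seeing.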

Is there a more efficient workflow or recommended approach for training a draft model for Llama-70B?

Thanks!
