
Online Training Causes grad_norm = NaN and Loss Explosion on Generated Data #20

@biaoliu-kiritsugu


Hi, thank you for the great work on this project!

I encountered an issue when performing online training using the following command:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch main.py \
  --train_dataset_path datasets/train_harmhelp.hf \
  --peft_name 'logs_trl/ric_assistant_harmlesshelpful_offline20000/model_iter0' \
  --exp_type 'assistant' \
  --reward_names 'harmless,helpful' \
  --training_steps 0 \
  --online_training_steps 4000 \
  --num_online_iterations 2 \
  --wandb_name 'ric_assistant_harmlesshelpful_offline20000_onlineiter2' \
  --batch_size 2 \
  --load_in_8bit True

During the first few steps of online training, I observed the following logs:

{'loss': 1.0795, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9724, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9587, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 7.965, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.03}
{'loss': 7.8317, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.04}

As the logs show, grad_norm becomes NaN from the very first steps, and the loss jumps from about 1.08 to around 7.9 and stays high.

Do you have any idea what might be causing this?
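In case it helps with debugging, here is a minimal sketch (a hypothetical helper of my own, not something from this repo) that could be called after the backward pass and before optimizer.step() to report which parameters' gradients first go non-finite:

import torch

def report_nonfinite_grads(model):
    # Print every parameter whose gradient contains NaN or Inf.
    # Call this after backward() and before optimizer.step() / grad
    # clipping, so the first offending layer can be identified.
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            n_nan = torch.isnan(param.grad).sum().item()
            n_inf = torch.isinf(param.grad).sum().item()
            print(f"non-finite grad in {name}: nan={n_nan}, inf={n_inf}")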

Any suggestions would be greatly appreciated!

Thanks again!
