Hi, thank you for the great work on this project!
I encountered an issue when performing online training using the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch main.py \
--train_dataset_path datasets/train_harmhelp.hf \
--peft_name 'logs_trl/ric_assistant_harmlesshelpful_offline20000/model_iter0' \
--exp_type 'assistant' \
--reward_names 'harmless,helpful' \
--training_steps 0 \
--online_training_steps 4000 \
--num_online_iterations 2 \
--wandb_name 'ric_assistant_harmlesshelpful_offline20000_onlineiter2' \
--batch_size 2 \
--load_in_8bit True
During the first few steps of online training, I observed the following logs:
{'loss': 1.0795, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9724, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9587, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 7.965, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.03}
{'loss': 7.8317, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.04}
As shown above, grad_norm becomes NaN within the first few steps, and the loss jumps from ~1.08 to ~7.97 and stays there.
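To try to narrow this down, I added a small check after the backward pass, a minimal sketch assuming the model is a standard PyTorch module returned by peft/transformers (the helper name is mine):

import torch

def log_nonfinite_grads(model, step):
    # Report any parameter whose gradient contains NaN or Inf after backward().
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"step {step}: non-finite gradient in {name}")

I call it right after the backward pass (e.g. accelerator.backward(loss)) and before the optimizer step, but I have not yet been able to pin down which layer produces the first NaN.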
Do you have any idea what might be causing this?
Any suggestions would be greatly appreciated!
Thanks again!