Hi, thank you for the great work on this project!
I encountered an issue when performing online training using the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 accelerate launch main.py \
--train_dataset_path datasets/train_harmhelp.hf \
--peft_name 'logs_trl/ric_assistant_harmlesshelpful_offline20000/model_iter0' \
--exp_type 'assistant' \
--reward_names 'harmless,helpful' \
--training_steps 0 \
--online_training_steps 4000 \
--num_online_iterations 2 \
--wandb_name 'ric_assistant_harmlesshelpful_offline20000_onlineiter2' \
--batch_size 2 \
--load_in_8bit True
During the first few steps of online training, I observed the following logs:
{'loss': 1.0795, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9724, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.01}
{'loss': 7.9587, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 7.965, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.03}
{'loss': 7.8317, 'grad_norm': nan, 'learning_rate': 1e-05, 'epoch': 0.04}
As shown above, grad_norm becomes NaN within the first few steps, and the loss jumps from ~1.08 to ~7.97 and stays there.
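To try to narrow this down, I added a small check after the backward pass, a minimal sketch assuming the model is a standard PyTorch module returned by peft/transformers (the helper name is mine):

import torch

def log_nonfinite_grads(model, step):
    # Report any parameter whose gradient contains NaN or Inf after backward().
    for name, param in model.named_parameters():
        if param.grad is not None and not torch.isfinite(param.grad).all():
            print(f"step {step}: non-finite gradient in {name}")

I call it right after the backward pass (e.g. accelerator.backward(loss)) and before the optimizer step, but I have not yet been able to pin down which layer produces the first NaN.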
Do you have any idea what might be causing this?
Any suggestions would be greatly appreciated!
Thanks again!