gradient explosion in TAPT DAPT pretraining #2

Hi, I am trying to reproduce the results of AdaptSum and ran into a problem when pretraining the model in the TAPT setting. It worked well for the science and debate datasets, where the data size is small. However, when I ran TAPT for social media, the loss exploded.

I ran pretraining with the following command:
python ./src/tapt_pretraining.py -path=./dataset/'social media'/TAPT-data/train.source \
    -dm='social media' \
    -visible_gpu=1 \
    -save_interval=1000 \
    -recadam \
    -logging_Euclid_dist

During training, the loss explodes to NaN:
(Epoch 0) LOSS: 2.291335 Euclid dist: 322.301648        13%  1999/15089  [17:55<1:47:14,    2.03it/s]
(Epoch 0) LOSS: 2.246833 Euclid dist: 959.653581        20%  2999/15089  [26:46<1:39:52,    2.01it/s]
(Epoch 0) LOSS: 9.272711 Euclid dist: 1541903563718079518205927655211008.00000        33%  3999/15089  [35:40<1:46:22,    1.74it/s]
(Epoch 0) LOSS: nan Euclid dist: nan        40%  4999/15089  [44:14<1:21:29,    2.16it/s]
(Epoch 0) LOSS: nan Euclid dist: nan        46%  5999/15089  [52:34<1:10:48,    1.80it/s]
(Epoch 0) LOSS: nan Euclid dist: nan        53%  6999/15089  [1:01:15<1:14:45,    1.49it/s]
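
To narrow down where the divergence starts, progress lines like the ones above can be parsed programmatically. The following is only a rough sketch: the regular expression is written against the log format shown here, and `parse_line`, `first_divergent_step`, and the `dist_limit` threshold are hypothetical names and values chosen for illustration, not part of the AdaptSum code.

```python
import math
import re

# Matches progress lines in the format shown in this issue, e.g.
# (Epoch 0) LOSS: 2.291335 Euclid dist: 322.301648   13%  1999/15089  [...]
LINE_RE = re.compile(
    r"\(Epoch (\d+)\) LOSS: (\S+) Euclid dist: (\S+)\s+\d+%\s+(\d+)/(\d+)"
)


def parse_line(line):
    """Return (epoch, step, loss, dist), or None if the line does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    epoch, loss, dist, step, _total = m.groups()
    # float() accepts "nan", so diverged lines parse without special handling.
    return int(epoch), int(step), float(loss), float(dist)


def first_divergent_step(lines, dist_limit=1e6):
    """Return the first logged step whose loss or distance is non-finite,
    or whose distance exceeds dist_limit (an arbitrary, assumed threshold)."""
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        _epoch, step, loss, dist = rec
        if not math.isfinite(loss) or not math.isfinite(dist) or dist > dist_limit:
            return step
    return None


log = [
    "(Epoch 0) LOSS: 2.291335 Euclid dist: 322.301648        13%  1999/15089  [17:55<1:47:14,    2.03it/s]",
    "(Epoch 0) LOSS: 2.246833 Euclid dist: 959.653581        20%  2999/15089  [26:46<1:39:52,    2.01it/s]",
    "(Epoch 0) LOSS: 9.272711 Euclid dist: 1541903563718079518205927655211008.00000        33%  3999/15089  [35:40<1:46:22,    1.74it/s]",
    "(Epoch 0) LOSS: nan Euclid dist: nan        40%  4999/15089  [44:14<1:21:29,    2.16it/s]",
]

print(first_divergent_step(log))
```

With the log above, the Euclidean distance is already far outside a plausible range at the step-3999 line, one logging interval before the loss itself becomes NaN.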

I tried lowering the learning rate to 0.01 and adjusting the gradient clipping value. This delayed the loss explosion but did not solve the problem. Am I missing something or doing something wrong? What should I do to keep training stable?
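
To be explicit about what I mean by gradient clipping: rescaling the gradients so that their global L2 norm stays below a limit. The snippet below is a plain-Python sketch of that idea, not the code in `tapt_pretraining.py`; `clip_by_global_norm` and `max_norm` are hypothetical names used only for illustration. Clipping only bounds the norm of finite gradients and cannot repair a gradient that is already inf or NaN, so the sketch reports that case separately so the caller can skip the update.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm.

    Returns (clipped_grads, original_norm). If the norm is not finite
    (some value is inf or NaN, or the sum of squares overflows), returns
    (None, original_norm) so the caller can skip the parameter update.
    """
    norm = math.sqrt(sum(g * g for g in grads))
    if not math.isfinite(norm):
        return None, norm
    if norm <= max_norm:
        # Already within the limit: return an unchanged copy.
        return list(grads), norm
    scale = max_norm / norm
    return [g * scale for g in grads], norm


clipped, norm = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
print(norm, clipped)

skipped, bad_norm = clip_by_global_norm([float("inf"), 1.0], max_norm=1.0)
print(skipped, bad_norm)
```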
