
HuggingFace warnings / errors on experiment runs - e.g., "... model 'OptimizedModule' is not supported ..." #670

Description

mmartin9684-sil (Collaborator)

HuggingFace is reporting an error at the start of the test step during an experiment run:

[ERROR|base.py:1149] 2025-02-28 12:05:52,437 >> The model 'OptimizedModule' is not supported for . Supported models are ['BartForConditionalGeneration', 'BigBirdPegasusForConditionalGeneration', 'BlenderbotForConditionalGeneration', 'BlenderbotSmallForConditionalGeneration', 'EncoderDecoderModel', 'FSMTForConditionalGeneration', 'GPTSanJapaneseForConditionalGeneration', 'LEDForConditionalGeneration', 'LongT5ForConditionalGeneration', 'M2M100ForConditionalGeneration', 'MarianMTModel', 'MBartForConditionalGeneration', 'MT5ForConditionalGeneration', 'MvpForConditionalGeneration', 'NllbMoeForConditionalGeneration', 'PegasusForConditionalGeneration', 'PegasusXForConditionalGeneration', 'PLBartForConditionalGeneration', 'ProphetNetForConditionalGeneration', 'Qwen2AudioForConditionalGeneration', 'SeamlessM4TForTextToText', 'SeamlessM4Tv2ForTextToText', 'SwitchTransformersForConditionalGeneration', 'T5ForConditionalGeneration', 'UMT5ForConditionalGeneration', 'XLMProphetNetForConditionalGeneration'].

However, the test step appears to work successfully for this experiment despite the error. The model is set to 'facebook/nllb-200-distilled-1.3B'.
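
For context, the message comes from the pipeline's supported-model check in transformers, which compares the model's class name against a list of architectures. When the model has been wrapped by `torch.compile` (presumably what happens here, since `OptimizedModule` is torch's compile wrapper class), the check sees the wrapper instead of the underlying `M2M100ForConditionalGeneration` (the NLLB architecture), which would explain why the test step still works. A minimal sketch of the wrapping, for illustration only:

```python
import torch
from transformers import AutoModelForSeq2SeqLM

# Illustration only: torch.compile wraps the model in torch._dynamo's
# OptimizedModule, which is the class name the pipeline check reports.
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
compiled = torch.compile(model)

print(type(compiled).__name__)            # OptimizedModule
print(type(compiled._orig_mod).__name__)  # M2M100ForConditionalGeneration

# The underlying model is unchanged, which is why generation still works;
# passing the unwrapped module to the pipeline would avoid the message.
```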

Activity

mmartin9684-sil commented on Mar 3, 2025

Another potential compatibility issue with the recent HuggingFace updates - this warning is reported at the start of training:

1740758254327 aqua-gpu-dallas:gpu2 DEBUG Encoding train dataset: 100% 9365/9365 [00:00<00:00, 13408.85 examples/s]
/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:449: FutureWarning:

`torch.cuda.amp.GradScaler(args...)` is deprecated. Please use `torch.amp.GradScaler('cuda', args...)` instead.
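
The deprecated call is inside accelerate itself, so there is nothing to change in the experiment config; for reference, the API migration the warning points at is just a rename to the unified `torch.amp` namespace. A minimal sketch:

```python
import torch

# Deprecated spelling that accelerate still calls internally:
scaler_old = torch.cuda.amp.GradScaler()   # emits the FutureWarning

# Replacement spelling recommended by the warning:
scaler_new = torch.amp.GradScaler("cuda")
```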

mmartin9684-sil commented on Mar 3, 2025

An additional warning reported by HF for recent experiments; it occurs at the end of preprocessing / start of training.

/usr/local/lib/python3.10/dist-packages/transformers/training_args.py:1568: FutureWarning:

`evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead

2025-02-28 15:57:25,900 - silnlp.common.environment - INFO - Uploading MT/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/effective-config-96ede8fa89.yml
=== Training (Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical) ===
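
This one is a straight rename of the training argument: `evaluation_strategy` became `eval_strategy` in recent transformers releases, with the old name slated for removal in v4.46 per the warning. A minimal sketch with placeholder values (the actual class and settings come from the silnlp config):

```python
from transformers import Seq2SeqTrainingArguments

# Old keyword (emits the FutureWarning):
# args = Seq2SeqTrainingArguments(output_dir="run", evaluation_strategy="steps", eval_steps=1000)

# Renamed keyword:
args = Seq2SeqTrainingArguments(output_dir="run", eval_strategy="steps", eval_steps=1000)
```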

mmartin9684-sil commented on Mar 3, 2025

ClearML warning at the start of the training step:

[INFO|integration_utils.py:1774] 2025-02-28 15:57:57,855 >> Automatic ClearML logging enabled.
[INFO|integration_utils.py:1802] 2025-02-28 15:57:57,856 >> ClearML Task has been initialized.
2025-02-28 15:57:57,856 - clearml.Task - WARNING - Parameters must be of builtin type (Transformers/accelerator_config[AcceleratorConfig])
  0% 0/5000 [00:00<?, ?it/s]

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
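
For context, the warning comes from ClearML's parameter capture: `Task.connect()` only serializes builtin types, and the `accelerator_config` entry in the connected Transformers arguments is an `AcceleratorConfig` dataclass, so that single parameter is skipped; nothing else appears to be affected. A small sketch of the behavior, using a hypothetical stand-in dataclass and placeholder project/task names:

```python
from dataclasses import dataclass, asdict
from clearml import Task

@dataclass
class AcceleratorConfigLike:
    # Hypothetical stand-in for transformers' AcceleratorConfig dataclass.
    split_batches: bool = False
    even_batches: bool = True

task = Task.init(project_name="demo", task_name="param-type-check")  # placeholder names

# A dataclass value triggers "Parameters must be of builtin type (...)":
task.connect({"accelerator_config": AcceleratorConfigLike()}, "Transformers")

# A plain dict of builtin values is recorded without the warning:
task.connect({"accelerator_config": asdict(AcceleratorConfigLike())}, "Transformers")
```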

mmartin9684-sil commented on Mar 3, 2025

Torch warning at the start of training:

1740758286198 aqua-gpu-dallas:gpu2 DEBUG [INFO|trainer.py:2314] 2025-02-28 15:58:01,402 >> ***** Running training *****
[INFO|trainer.py:2315] 2025-02-28 15:58:01,402 >>   Num examples = 9,365
[INFO|trainer.py:2316] 2025-02-28 15:58:01,402 >>   Num Epochs = 35
[INFO|trainer.py:2317] 2025-02-28 15:58:01,402 >>   Instantaneous batch size per device = 64
[INFO|trainer.py:2319] 2025-02-28 15:58:01,402 >>   Training with DataParallel so batch size has been adjusted to: 32
[INFO|trainer.py:2320] 2025-02-28 15:58:01,402 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:2321] 2025-02-28 15:58:01,402 >>   Gradient Accumulation steps = 2
[INFO|trainer.py:2322] 2025-02-28 15:58:01,402 >>   Total optimization steps = 5,000
[INFO|trainer.py:2323] 2025-02-28 15:58:01,404 >>   Number of trainable parameters = 1,370,638,336

  0% 0/5000 [00:01<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:295: FutureWarning:

`torch.cpu.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cpu', args...)` instead.
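
Like the GradScaler message above, this is emitted from library code (torch's activation checkpointing), so it is informational. The migration it refers to is the same move to the unified `torch.amp` namespace, with the device passed as the first argument:

```python
import torch

# Deprecated form referenced by the warning (called inside torch.utils.checkpoint):
# with torch.cpu.amp.autocast():
#     ...

# Unified replacement form:
with torch.amp.autocast("cpu", dtype=torch.bfloat16):
    x = torch.randn(8, 8)
    y = x @ x
```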

mmartin9684-sil commented on Mar 3, 2025

Warning at the end of training when the model is being saved:

1740763687635 aqua-gpu-dallas:gpu2 DEBUG {'loss': 3.0325, 'grad_norm': 0.11882693320512772, 'learning_rate': 0.0, 'epoch': 34.19}
100% 5000/5000 [1:29:58<00:00,  1.08it/s]2025-02-28 17:28:02,669 - silnlp.nmt.hugging_face_config - INFO - Saving model checkpoint to /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000 using custom _save function
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:2817: UserWarning:

Moving the following attributes in the config to the generation config: {'max_length': 200}. You are seeing this warning because you've set generation parameters in the model config, as opposed to in the generation config.

[INFO|configuration_utils.py:414] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/config.json
[INFO|configuration_utils.py:865] 2025-02-28 17:28:02,671 >> Configuration saved in /root/.cache/silnlp/experiments/Cameroon/Yamba/NLLB.1.3B.en-NIV84.yam-YAM2.canonical/run/checkpoint-5000/generation_config.json
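
Per the warning text, transformers now wants generation parameters on `model.generation_config` rather than on `model.config`, and it moves `max_length` over automatically at save time, so this is informational. A minimal sketch of setting the parameter in the preferred place, assuming the NLLB checkpoint:

```python
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

# Setting generation parameters on model.config (e.g. model.config.max_length = 200)
# is what triggers the warning at save time; the preferred location is:
model.generation_config.max_length = 200
model.save_pretrained("checkpoint-example")  # writes config.json and generation_config.json
```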


bhartmoore commented on Mar 3, 2025

I am seeing this warning during mid-training evals:

[WARNING|trainer.py:761] 2025-03-03 18:15:47,707 >> Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
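
This is another rename: the tokenizer is now passed to and read from the trainer as `processing_class`. A minimal sketch with placeholder setup, assuming a transformers version recent enough to accept the new keyword (silnlp builds its own trainer and arguments):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, Trainer, TrainingArguments

# Placeholder setup for illustration only.
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")
args = TrainingArguments(output_dir="run")

trainer = Trainer(
    model=model,
    args=args,
    processing_class=tokenizer,  # new keyword; tokenizer=... goes through the deprecation path
)
print(type(trainer.processing_class).__name__)  # replaces the deprecated trainer.tokenizer
```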

bhartmoore commented on Mar 3, 2025

Warning at the start of training, just before the `torch.cuda.amp.GradScaler(args...)` warning listed above:

2025-03-03 13:08:00  [WARNING|logging.py:328] 2025-03-03 18:07:56,218 >> The new embeddings will be initialized from a multivariate normal distribution that has old embeddings' mean and covariance. As described in this article: https://nlp.stanford.edu/~johnhew/vocab-expansion.html. To disable this, use `mean_resizing=False`
2025-03-03 13:09:19  Encoding train dataset: 100% 6400/6400 [00:00<00:00, 18518.27 examples/s]
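
This warning is emitted by `resize_token_embeddings` when the vocabulary is extended: recent transformers versions initialize the new embedding rows from the old embeddings' mean and covariance unless `mean_resizing=False` is passed, as the message says. A minimal sketch, with a placeholder added token:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-200-distilled-1.3B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/nllb-200-distilled-1.3B")

# Hypothetical example of extending the vocabulary; the symbol is a placeholder.
tokenizer.add_tokens(["<new_lang_tok>"])

# Default (mean_resizing=True) initializes new rows from the old embeddings'
# mean and covariance and logs the warning above; pass False to disable it.
model.resize_token_embeddings(len(tokenizer), mean_resizing=False)
```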

davidbaines commented on Mar 21, 2025

This warning occurs after logging a checkpoint.

[INFO|integration_utils.py:1934] 2025-03-21 14:48:21,216 >> Logging checkpoint artifact checkpoint-1000. This may take some time.
2025-03-21 14:49:10,742 - clearml.model - INFO - No output storage destination defined, registering local model /tmp/model_package.a7cazdje.zip
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:295: FutureWarning:
torch.cpu.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cpu', args...) instead.

It's nice to see how quickly checkpoints are saved.
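
On the "No output storage destination defined" line: ClearML registers the checkpoint from the local temp path because no upload destination is configured; if remote storage for model artifacts is wanted, the task's `output_uri` (or `sdk.development.default_output_uri` in clearml.conf) controls it. A minimal sketch with a placeholder bucket URI:

```python
from clearml import Task

# Hypothetical destination; any URI ClearML supports (s3://, gs://, azure://, file://) works.
task = Task.init(
    project_name="silnlp",            # placeholder names for illustration
    task_name="NLLB.1.3B.en-yam",
    output_uri="s3://my-bucket/clearml-models",
)
```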


Metadata

Labels: bug (Something isn't working)

Participants: @ddaspit, @davidbaines, @mmartin9684-sil, @bhartmoore