Description
Bug summary
When I run dp --pt train input.json --init-frz-model pt_model.pth to initialize training from a PyTorch model converted from a TensorFlow model (i.e., via dp convert-backend tf_model.pb pt_model.pth), I get an error about missing keys in the state_dict. Based on the error log, this should be a general issue for all kinds of models, not only the dipole model I tried.
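By default, PyTorch's load_state_dict is strict. It raises a RuntimeError if a parameter key that the target model expects is absent from the supplied dictionary ("missing") or if the dictionary contains keys the model does not have ("unexpected"). The pure-Python sketch below reproduces that comparison on toy key lists. It does not call DeePMD-kit or PyTorch. The helper name diff_state_dict_keys and both key lists are hypothetical and were chosen only to resemble the keys in the error log below.

```python
from typing import Iterable, NamedTuple


class KeyDiff(NamedTuple):
    missing: list[str]     # expected by the model, absent from the state dict
    unexpected: list[str]  # present in the state dict, unknown to the model


def diff_state_dict_keys(model_keys: Iterable[str], loaded_keys: Iterable[str]) -> KeyDiff:
    """Compare two key sets the way a strict load_state_dict call does (hypothetical helper)."""
    expected = set(model_keys)
    provided = set(loaded_keys)
    return KeyDiff(
        missing=sorted(expected - provided),
        unexpected=sorted(provided - expected),
    )


# Hypothetical example: the newly built model has a fitting network for two
# atom types, while the loaded state dict only provides the one for type 0.
model_keys = [
    "atomic_model.fitting_net.filter_layers.networks.0.layers.0.matrix",
    "atomic_model.fitting_net.filter_layers.networks.1.layers.0.matrix",
]
loaded_keys = [
    "atomic_model.fitting_net.filter_layers.networks.0.layers.0.matrix",
]

diff = diff_state_dict_keys(model_keys, loaded_keys)
print("missing:", diff.missing)
print("unexpected:", diff.unexpected)
```

In this example, only missing keys appear and there are no unexpected ones, which matches the shape of the error reported below.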
DeePMD-kit Version
v3.1.1-52-gd774edea
Backend and its version
TensorFlow v2.18.0-rc2-4-g6550e4bd802; PyTorch v2.5.1+cu121-ga8d6afb511a
How did you download the software?
Built from source
Input Files, Running Commands, Error Log, etc.
Input files:
Input files are adapted from examples/water_tensor/dipole
Running command:
dp --pt train input.json --init-frz-model ../00.tf_training/dw_model.pth 1>dp_train.stdout 2>dp_train.stderr
Error Log:
Traceback (most recent call last):
File "/home/jxzhu/apps/miniconda3/envs/deepmd-devel/bin/dp", line 7, in <module>
sys.exit(main())
^^^^^^
File "/home/jxzhu/apps/deepmd/devel/deepmd/main.py", line 1020, in main
deepmd_main(args)
File "/home/jxzhu/apps/miniconda3/envs/deepmd-devel/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/home/jxzhu/apps/deepmd/devel/deepmd/pt/entrypoints/main.py", line 536, in main
train(
File "/home/jxzhu/apps/deepmd/devel/deepmd/pt/entrypoints/main.py", line 346, in train
trainer = get_trainer(
^^^^^^^^^^^^
File "/home/jxzhu/apps/deepmd/devel/deepmd/pt/entrypoints/main.py", line 193, in get_trainer
trainer = training.Trainer(
^^^^^^^^^^^^^^^^^
File "/home/jxzhu/apps/deepmd/devel/deepmd/pt/train/training.py", line 617, in __init__
self.model.load_state_dict(frz_model.state_dict())
File "/home/jxzhu/apps/miniconda3/envs/deepmd-devel/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2584, in load_state_dict
raise RuntimeError(
RuntimeError: Error(s) in loading state_dict for DipoleModel:
Missing key(s) in state_dict: "atomic_model.fitting_net.filter_layers.networks.1.layers.0.matrix", "atomic_model.fitting_net.filter_layers.networks.1.layers.0.bias", "atomic_model.fitting_net.filter_layers.networks.1.layers.1.matrix", "atomic_model.fitting_net.filter_layers.networks.1.layers.1.bias", "atomic_model.fitting_net.filter_layers.networks.1.layers.1.idt", "atomic_model.fitting_net.filter_layers.networks.1.layers.2.matrix", "atomic_model.fitting_net.filter_layers.networks.1.layers.2.bias", "atomic_model.fitting_net.filter_layers.networks.1.layers.2.idt", "atomic_model.fitting_net.filter_layers.networks.1.layers.3.matrix", "atomic_model.fitting_net.filter_layers.networks.1.layers.3.bias", "atomic_model.fitting_net.filter_layers._networks.1.layers.0.matrix", "atomic_model.fitting_net.filter_layers._networks.1.layers.0.bias", "atomic_model.fitting_net.filter_layers._networks.1.layers.1.matrix", "atomic_model.fitting_net.filter_layers._networks.1.layers.1.bias", "atomic_model.fitting_net.filter_layers._networks.1.layers.1.idt", "atomic_model.fitting_net.filter_layers._networks.1.layers.2.matrix", "atomic_model.fitting_net.filter_layers._networks.1.layers.2.bias", "atomic_model.fitting_net.filter_layers._networks.1.layers.2.idt", "atomic_model.fitting_net.filter_layers._networks.1.layers.3.matrix", "atomic_model.fitting_net.filter_layers._networks.1.layers.3.bias".
Steps to Reproduce
tar -zxvf init_from_tf2pt_model.tar.gz
cd init_from_tf2pt_model/00.tf_training/
bash run.sh
cd ../01.init_from_frozen_pt/
bash run.sh
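In the error log above, every missing key belongs to the fitting sub-network with index 1, under both filter_layers.networks and filter_layers._networks. No descriptor keys are reported missing. The short Python sketch below is an illustrative helper and is not part of DeePMD-kit. It extracts the quoted keys from such a message and groups them by the sub-network that owns them, which makes this pattern easier to spot in longer logs. The function name group_missing_keys is hypothetical, and MESSAGE is a shortened excerpt of the real log line.

```python
import re
from collections import defaultdict

# Shortened excerpt of the "Missing key(s)" line from the error log above.
MESSAGE = (
    'Missing key(s) in state_dict: '
    '"atomic_model.fitting_net.filter_layers.networks.1.layers.0.matrix", '
    '"atomic_model.fitting_net.filter_layers.networks.1.layers.1.idt", '
    '"atomic_model.fitting_net.filter_layers._networks.1.layers.0.matrix", '
    '"atomic_model.fitting_net.filter_layers._networks.1.layers.3.bias".'
)


def group_missing_keys(message: str) -> dict[str, list[str]]:
    """Group quoted state_dict keys by the sub-network that owns them (hypothetical helper)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key in re.findall(r'"([^"]+)"', message):
        # "...filter_layers.networks.1.layers.0.matrix" splits into
        # owner "...filter_layers.networks.1" and parameter "0.matrix".
        owner, _, param = key.partition(".layers.")
        groups[owner].append(param)
    return dict(groups)


for owner, params in group_missing_keys(MESSAGE).items():
    print(f"{owner}: {len(params)} missing ({', '.join(params)})")
```

Grouping by the text before ".layers." assumes that every key follows the naming pattern shown in this log. Keys that do not contain that separator are kept whole as their own group.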