Description
Version
1.0.1
On which installation method(s) does this occur?
Pip
Describe the issue
Dear StormCast Team,
I am attempting to train StormCast on the HRRR and ERA5 datasets over the central US, following your paper.
With the help of the released training scripts, we trained the UNet model without problems, and its training loss decreased markedly. However, we have run into an issue when training the diffusion model: the training loss remains nearly constant over hundreds of epochs.
We used the same training scripts for both the UNet and diffusion models. But before training the diffusion model, we modified the following hyperparameters relative to the UNet configuration:
in $StormCast/config/training/default.yaml:
outdir: 'diffusion_model'
loss: 'edm'
in $StormCast/config/model/stormcast.yaml:
model_name: 'diffusion'
use_regression_net: True
regression_weights: $StormCast/UNet/checkpoints/StormCastUNet.0.520.mdlus
in $StormCast/config/diffusion.yaml:
# Diffusion model specific changes
model:
  use_regression_net: True
  regression_weights: "$StormCast/UNet /checkpoints/StormCastUNet.0.520.mdlus"
  previous_step_conditioning: True
  spatial_pos_embed: True
training:
  loss: 'edm'
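Because the same settings appear in more than one file above, one way to cross-check them is a short script like the one below. This is only a minimal sketch: the dictionaries copy the values quoted in this report by hand, and `has_whitespace` and `diff_settings` are hypothetical helpers, not part of StormCast. Whether any difference it reports is related to the flat loss is not established here.

```python
def has_whitespace(path):
    """Return True if the path string contains any whitespace character."""
    return any(ch.isspace() for ch in path)


def diff_settings(a, b):
    """Return {key: (value_in_a, value_in_b)} for shared keys whose values differ."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]}


# Values copied by hand from the settings quoted in this report (assumption:
# they match what is actually on disk).
stormcast_yaml = {
    "model_name": "diffusion",
    "use_regression_net": True,
    "regression_weights": "$StormCast/UNet/checkpoints/StormCastUNet.0.520.mdlus",
}
diffusion_yaml_model = {
    "use_regression_net": True,
    "regression_weights": "$StormCast/UNet /checkpoints/StormCastUNet.0.520.mdlus",
    "previous_step_conditioning": True,
    "spatial_pos_embed": True,
}

# Keys set in both files but with different values.
mismatches = diff_settings(stormcast_yaml, diffusion_yaml_model)
for key, (left, right) in mismatches.items():
    print(f"{key}: {left!r} != {right!r}")

# Flag checkpoint paths that contain whitespace, which is easy to miss by eye.
for name, cfg in (("stormcast.yaml", stormcast_yaml),
                  ("diffusion.yaml", diffusion_yaml_model)):
    if has_whitespace(cfg["regression_weights"]):
        print(f"{name}: regression_weights contains whitespace")
```

For the values as quoted, the two `regression_weights` strings are not identical, so it may be worth confirming which one the training run actually loads.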
Could you let us know whether we have missed any key steps or configuration settings when setting up the diffusion model training? We are happy to provide more information about our training process if needed.
Thank you very much for your time and guidance.