Refactor GroupNorm and log unmatched state_dict keys #989


Open · wants to merge 2 commits into main

Conversation

juliusberner (Contributor) commented on Jun 24, 2025:

PhysicsNeMo Pull Request

Description

  • Refactor GroupNorm and add get_group_norm to keep the state_dict consistent with previous versions.
  • Log missing and unexpected keys when loading checkpoints (see the sketch after this list).
  • Register deterministic, non-learnable positional embeddings with persistent=False (also sketched below).

Closes #1001 .
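Below is a minimal, hedged sketch of the two behaviors described above. The helper name load_state_dict_with_logging matches the diff later on this page, but this particular implementation, and the SinusoidalEmbedding module illustrating persistent=False, are editorial assumptions rather than the actual physicsnemo code.

```python
import logging

import torch
import torch.nn as nn

logger = logging.getLogger(__name__)


def load_state_dict_with_logging(model: nn.Module, state_dict, strict: bool = True):
    """Load a state_dict and log mismatched keys instead of dropping them silently."""
    result = model.load_state_dict(state_dict, strict=strict)
    if result.missing_keys:
        logger.warning("Missing keys in state_dict: %s", result.missing_keys)
    if result.unexpected_keys:
        logger.warning("Unexpected keys in state_dict: %s", result.unexpected_keys)
    return result


class SinusoidalEmbedding(nn.Module):
    """Deterministic, non-learnable positional embedding (hypothetical example)."""

    def __init__(self, num_channels: int):
        super().__init__()
        half = num_channels // 2
        freqs = (1.0 / 10_000) ** (torch.arange(half, dtype=torch.float32) / half)
        # persistent=False keeps this deterministic buffer out of the
        # state_dict, so checkpoints stay consistent across versions.
        self.register_buffer("freqs", freqs, persistent=False)

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: 1-D tensor of positions/timesteps.
        x = t.float().outer(self.freqs)
        return torch.cat([torch.cos(x), torch.sin(x)], dim=-1)
```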

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.

CharlelieLrt self-requested a review on Jun 24, 2025.
CharlelieLrt added the labels bug (Something isn't working) and 3 - Ready for Review (Ready for review by team) on Jun 24, 2025.
```diff
@@ -461,8 +483,7 @@ def from_checkpoint(
         local_path.joinpath("model.pt"), map_location=model.device
     )

     model_dict = convert_ckp_apex(ckp_args, model_args, model_dict)
-    model.load_state_dict(model_dict, strict=False)
+    load_state_dict_with_logging(model, model_dict, strict=False)
```
pzharrington (Collaborator) commented:

I'm generally uncomfortable with strict=False here. I realize it was injected in #809 rather than this PR, but since this is a widely used function affecting all trained models in physicsnemo, can we revert it to strict=True here? It seems like backwards-compat handling should be taken care of by the time this line is run. Thoughts, @CharlelieLrt?

CharlelieLrt (Collaborator) commented:

@pzharrington good point! I was actually wondering why we were setting strict=False here, but I thought it had been there since the beginning and never realized it was introduced by #809. We should definitely revert it to True. AFAIK strict=False is only useful when fine-tuning parts of the model, or similar cases.

@juliusberner does your application require strict=False there, or would it be okay to revert to True?

A collaborator commented:

Probably also worth checking with @LostnEkko -- was it introduced in #809 as part of the Apex checkpoint handling, and would you be able to test your use case off of this PR to see if strict=True works?

juliusberner (Contributor, author) commented on Jul 3, 2025:

I think the best option would be to expose strict (similar to the load method above). We can then default it to True if we want. If a user sets it to False, this PR would still log unexpected/missing keys, preventing the silent errors that currently occur because the checkpoint conversion has a bug in https://github.com/NVIDIA/physicsnemo/blob/d1c9391f0f594f7279c8990bff70b8227a6d1f93/physicsnemo/models/util_compatibility.py#L92C17-L92C50
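For concreteness, a sketch of this proposal, reusing the load_state_dict_with_logging helper sketched earlier; the simplified signature is an assumption, since the real from_checkpoint also reconstructs the model and converts Apex checkpoints:

```python
import torch


def from_checkpoint(model, path, strict: bool = True):
    """Hypothetical simplified version that exposes `strict`, defaulting to True.

    A caller who opts into strict=False (e.g. for partial fine-tuning) still
    gets missing/unexpected keys logged rather than silently dropped.
    """
    model_dict = torch.load(path, map_location="cpu")
    return load_state_dict_with_logging(model, model_dict, strict=strict)
```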

CharlelieLrt (Collaborator) commented on Jul 3, 2025:

> Probably also worth checking with @LostnEkko

@jialusui1102 for viz

CharlelieLrt (Collaborator) commented:

@juliusberner sounds good to me! Feel free to update your PR accordingly

juliusberner (Contributor, author) commented:

@CharlelieLrt I adapted it

From the new get_group_norm helper:

```python
    might be adjusted to satisfy the `min_channels_per_group` condition.
    """

    num_groups = min(num_groups, num_channels // min_channels_per_group)
```
A collaborator commented:

Should include the fix from #996 here as well

juliusberner (Contributor, author) commented:

Thanks, I rebased and included it now!
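For reference, a hedged sketch of what a get_group_norm helper in this spirit could look like; the clamping details and the plain nn.GroupNorm return are assumptions (the actual helper also handles the Apex variant, and the exact fix from #996 may differ):

```python
import torch.nn as nn


def get_group_norm(
    num_channels: int,
    num_groups: int = 32,
    min_channels_per_group: int = 4,
    eps: float = 1e-5,
) -> nn.GroupNorm:
    # Shrink num_groups so that each group has at least
    # min_channels_per_group channels; clamp to 1 so small channel
    # counts never produce zero groups.
    num_groups = max(min(num_groups, num_channels // min_channels_per_group), 1)
    # Assumes num_channels is divisible by the resulting num_groups,
    # as is typical for the power-of-two widths used in these models.
    return nn.GroupNorm(num_groups=num_groups, num_channels=num_channels, eps=eps)
```

For example, get_group_norm(num_channels=128) returns nn.GroupNorm(32, 128), keeping the weight and bias parameter keys exactly where a plain GroupNorm module would put them in the state_dict.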

Successfully merging this pull request may close: 🐛[BUG]: GroupNorm creates unused parameters when use_apex_gn=True (#1001)

4 participants