[2.6] Fix lightning api for existing NeMo examples. (#3518)

holgerroth · web-flow · commit 59c589a9a255 · 2025-06-18T22:43:46.000Z
Fixes # .

### Description

Fix the Lightning API for the existing NeMo examples.

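- Fixes a `TypeError` in the NeMo container, where `load_state_dict()` returns
`None` and the Lightning API tried to unpack it. The return value is now checked for `None` before it is unpacked, and the call is wrapped in a try/except block that logs the failure and re-raises it as a `RuntimeError`.
- Updates the NVFlare install instructions in the `peft`, `prompt_learning`, and `supervised_fine_tuning` example READMEs from `nvflare~=2.5.0rc` to `"nvflare>2.6"`.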

```
2025-05-29 19:54:41,781 - SubprocessLauncher - INFO -   File "/workspace/code/nvflare/app_opt/lightning/api.py", line 201, in _receive_and_update_model
2025-05-29 19:54:41,782 - SubprocessLauncher - INFO -     missing_keys, unexpected_keys = pl_module.load_state_dict(
2025-05-29 19:54:41,782 - SubprocessLauncher - INFO - TypeError: cannot unpack non-iterable NoneType object
```
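The traceback shows the root cause: PyTorch's `torch.nn.Module.load_state_dict()` returns a named tuple with `missing_keys` and `unexpected_keys` fields, but the `load_state_dict()` of the module used in the NeMo container returned `None`, so unpacking failed. The sketch below mirrors the patched logic outside NVFlare so both behaviours can be tried without PyTorch. `load_global_state`, `IncompatibleKeys`, and the three stub module classes are hypothetical names for illustration only, not part of the NVFlare or PyTorch APIs. One deliberate difference from the patch: the sketch chains the original exception with `raise ... from e` so its traceback is kept.

```python
import logging
from collections import namedtuple

# Hypothetical stand-in for the named tuple returned by
# torch.nn.Module.load_state_dict(), so the sketch runs without PyTorch.
IncompatibleKeys = namedtuple("IncompatibleKeys", ["missing_keys", "unexpected_keys"])

logger = logging.getLogger("lightning_api_sketch")


def load_global_state(module, params, strict=True):
    """Hypothetical helper mirroring the patched logic in _receive_and_update_model."""
    try:
        result = module.load_state_dict(params, strict=strict)
        # Some load_state_dict() implementations (as in the NeMo container)
        # return None instead of (missing_keys, unexpected_keys).
        if result is not None:
            missing_keys, unexpected_keys = result
            if len(missing_keys) > 0:
                logger.warning(f"There were missing keys when loading the global state_dict: {missing_keys}")
            if len(unexpected_keys) > 0:
                logger.warning(f"There were unexpected keys when loading the global state_dict: {unexpected_keys}")
        return result
    except Exception as e:
        logger.error(f"Failed to load state dict: {e}")
        # Chain the original exception so its cause stays in the traceback.
        raise RuntimeError(f"Failed to load model state dict: {e}") from e


class TorchStyleModule:
    """Stub that behaves like torch.nn.Module: returns the key report."""

    def load_state_dict(self, params, strict=True):
        return IncompatibleKeys(missing_keys=["bias"], unexpected_keys=[])


class NoneReturningModule:
    """Stub that behaves like the module seen in the NeMo traceback."""

    def load_state_dict(self, params, strict=True):
        return None


class FailingModule:
    """Stub whose load_state_dict() raises, to exercise the except branch."""

    def load_state_dict(self, params, strict=True):
        raise KeyError("weight")


if __name__ == "__main__":
    print(load_global_state(TorchStyleModule(), {"weight": 1.0}))
    print(load_global_state(NoneReturningModule(), {"weight": 1.0}))  # -> None
```

With the `None` check in place, the NeMo-style module no longer triggers `cannot unpack non-iterable NoneType object`, while a genuine loading error still surfaces, now as a `RuntimeError`.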

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.
diff --git a/integration/nemo/examples/peft/README.md b/integration/nemo/examples/peft/README.md
@@ -26,7 +26,7 @@ docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --
 
 Next, install NVFlare.
 ```
-pip install nvflare~=2.5.0rc
+pip install "nvflare>2.6"
 ```
 
 ## Examples
diff --git a/integration/nemo/examples/prompt_learning/README.md b/integration/nemo/examples/prompt_learning/README.md
@@ -26,7 +26,7 @@ docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --
 
 For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
 ```
-pip install nvflare~=2.5.0rc
+pip install "nvflare>2.6"
 pip install protobuf==3.20
 export PYTHONPATH=${PYTHONPATH}:/workspace
 ``` 
diff --git a/integration/nemo/examples/supervised_fine_tuning/README.md b/integration/nemo/examples/supervised_fine_tuning/README.md
@@ -25,7 +25,7 @@ docker run --runtime=nvidia -it --rm --shm-size=16g -p 8888:8888 -p 6006:6006 --
 
 For easy experimentation with NeMo, install NVFlare and mount the code inside the [nemo_nvflare](./nemo_nvflare) folder.
 ```
-pip install nvflare~=2.5.0rc
+pip install "nvflare>2.6"
 export PYTHONPATH=${PYTHONPATH}:/workspace
 ``` 
 
diff --git a/nvflare/app_opt/lightning/api.py b/nvflare/app_opt/lightning/api.py
@@ -198,15 +198,21 @@ def _receive_and_update_model(self, trainer, pl_module):
         model = self._receive_model(trainer)
         if model:
             if model.params:
-                missing_keys, unexpected_keys = pl_module.load_state_dict(
-                    model.params, strict=self._load_state_dict_strict
-                )
-                if len(missing_keys) > 0:
-                    self.logger.warning(f"There were missing keys when loading the global state_dict: {missing_keys}")
-                if len(unexpected_keys) > 0:
-                    self.logger.warning(
-                        f"There were unexpected keys when loading the global state_dict: {unexpected_keys}"
-                    )
+                try:
+                    result = pl_module.load_state_dict(model.params, strict=self._load_state_dict_strict)
+                    if result is not None:
+                        missing_keys, unexpected_keys = result
+                        if len(missing_keys) > 0:
+                            self.logger.warning(
+                                f"There were missing keys when loading the global state_dict: {missing_keys}"
+                            )
+                        if len(unexpected_keys) > 0:
+                            self.logger.warning(
+                                f"There were unexpected keys when loading the global state_dict: {unexpected_keys}"
+                            )
+                except Exception as e:
+                    self.logger.error(f"Failed to load state dict: {str(e)}")
+                    raise RuntimeError(f"Failed to load model state dict: {str(e)}")
             if model.current_round is not None:
                 self.current_round = model.current_round