Replies: 2 comments
I'm not sure what is causing this. Could you provide the client and server logs? You might find something in them.
@kun0906 What is your setup? Did you run both the server and the clients on the same machine? Are you using the NVFlare Simulator? Which version of NVFlare are you using?
Hello,
I’m currently testing how many clients I can run on a compute node with a single GPU (80 GiB total memory). Each client consumes ~5 GiB of GPU memory. When I run 8 clients, everything works. However, when I attempt to run 10 clients, they all start successfully, but after a few minutes, some of them stop unexpectedly.
Upon checking the logs for the affected clients, the last message reads:
ProcessExecutor - INFO - run (7d8dccfb-9342-4125-bf68-0742d8a33d29): child worker process finished with RC -9.
There are no other error messages.
While all 10 clients are running, the GPU usage remains around 50 GiB, so I'm unsure why this issue occurs.

Any insights or suggestions would be greatly appreciated.
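
For anyone reading the log line above: on POSIX systems, Python's subprocess module reports a return code of -N when a child process was terminated by signal N. Assuming ProcessExecutor follows that convention (an assumption, not confirmed from the NVFlare source), RC -9 would mean the worker received SIGKILL. SIGKILL cannot be caught, which would be consistent with the client logging nothing else before it stopped. One common sender of SIGKILL on Linux is the kernel's out-of-memory killer, which reacts to host RAM rather than GPU memory, but nothing in this thread confirms that is the cause here. The helper below is a hypothetical illustration and is not part of NVFlare:

```python
import signal


def describe_return_code(rc: int) -> str:
    """Translate a subprocess-style return code into a readable string.

    Hypothetical helper for illustration only. It assumes the POSIX
    convention used by Python's subprocess module, where a negative
    return code -N means the child was terminated by signal N.
    """
    if rc < 0:
        try:
            name = signal.Signals(-rc).name
        except ValueError:
            # Signal number is not known on this platform.
            name = f"signal {-rc}"
        return f"killed by {name}"
    if rc == 0:
        return "exited normally"
    return f"exited with status {rc}"


if __name__ == "__main__":
    # The return code reported in the client log above.
    print(describe_return_code(-9))
```

If the result is SIGKILL, the kernel log on the compute node (for example via dmesg) and the host memory usage during the run are reasonable places to look next.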