Hi @oded-byte, we found the root cause. It is these two timeouts:

peer_read_timeout (line 36) and heartbeat_timeout (line 40):
https://github.com/NVIDIA/NVFlare/blob/main/nvflare/app_opt/pt/client_api_launcher_executor.py#L36-L40

Setting both to 300 made the issue go away on my side. You can also test and adjust the values on your machine, since machine speed varies; as you noticed, a faster machine can work with the defaults.
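For anyone who wants to try this without editing the library source, here is a minimal sketch of applying the two values through a client-side job config. It assumes the launcher executor is declared in a dict shaped like a typical `config_fed_client.json` (an `executors` list whose entries carry an `executor` with `path` and `args`), and that both timeouts are accepted as executor arguments. The helper name `with_timeouts`, the class name in the example path, and the example config are hypothetical, so check them against your own job files.

```python
import copy

# Module taken from the linked source file; the config layout is an assumption.
LAUNCHER_MODULE = "nvflare.app_opt.pt.client_api_launcher_executor"


def with_timeouts(client_config: dict, value: float = 300.0) -> dict:
    """Return a copy of client_config with both timeouts set on launcher executors."""
    patched = copy.deepcopy(client_config)
    for entry in patched.get("executors", []):
        executor = entry.get("executor", {})
        # Only touch executors that come from the launcher module.
        if LAUNCHER_MODULE in executor.get("path", ""):
            args = executor.setdefault("args", {})
            args["peer_read_timeout"] = value
            args["heartbeat_timeout"] = value
    return patched


# Hypothetical client config, for illustration only.
example_config = {
    "executors": [
        {
            "tasks": ["train"],
            "executor": {
                "path": LAUNCHER_MODULE + ".PTClientAPILauncherExecutor",
                "args": {},
            },
        }
    ]
}

patched_config = with_timeouts(example_config)
```

The helper returns a patched copy rather than mutating its input, so the default config stays available for comparison while you tune the value for your machine.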

Thanks for noticing and raising this! We will update our APIs accordingly and figure out a good way to have these timeouts set properly.

Answer selected by ZiyueXu77