support binding client socket address to specific ifname #1782
Background
In NCCL, when NCCL_SOCKET_IFNAME is used to specify which IP interfaces to use for communication, only the listening (server-side) socket is bound to the address of the specified interface. The client socket uses the system's default source address, which can lead to unexpected problems when a link is faulty.
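To make the "address of the specified interface" concrete, here is a minimal sketch of resolving an interface name to an IPv4 address with the POSIX-style `getifaddrs` call. The helper name `ipv4_of_ifname` is hypothetical and this is not how NCCL itself is implemented; it only illustrates the kind of lookup that an ifname-based setting implies.

```c
#include <assert.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper, not NCCL code: copy the first IPv4 address assigned
 * to interface `ifname` into `out`. Returns 0 on success, -1 if the
 * interface is missing or has no IPv4 address. */
int ipv4_of_ifname(const char *ifname, struct sockaddr_in *out) {
  assert(ifname != NULL && out != NULL);
  struct ifaddrs *list = NULL;
  if (getifaddrs(&list) != 0) return -1;

  int rc = -1;
  for (struct ifaddrs *it = list; it != NULL; it = it->ifa_next) {
    if (it->ifa_addr == NULL) continue;                 /* no address attached */
    if (it->ifa_addr->sa_family != AF_INET) continue;   /* IPv4 only in this sketch */
    if (strcmp(it->ifa_name, ifname) != 0) continue;    /* different interface */
    memcpy(out, it->ifa_addr, sizeof(*out));
    rc = 0;
    break;
  }
  freeifaddrs(list);
  return rc;
}
```

The resulting `sockaddr_in` is what a socket would need to `bind` to in order to send from that interface's address.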
Solution
Add a peerAdd member to ncclSocket to store the socket's peer address; the local address is stored in addr. When a client connects to a remote socket (sock->peerAdd), its source address is also bound to sock->addr.
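The general technique described here, binding the client socket to a local address before calling `connect`, can be sketched with plain BSD sockets. This is an illustration under assumptions, not the code in this PR: `connect_from` and `make_addr` are hypothetical helpers, and `local` and `peer` stand in for `sock->addr` and `sock->peerAdd` respectively.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not NCCL code: open a TCP client socket whose source
 * address is pinned to `local` before connecting to `peer`. Returns the
 * connected fd, or -1 on failure. */
int connect_from(const struct sockaddr_in *local,
                 const struct sockaddr_in *peer) {
  assert(local != NULL && peer != NULL);
  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  /* Port 0 lets the kernel pick an ephemeral port; only the IP is pinned. */
  struct sockaddr_in src = *local;
  src.sin_port = 0;
  if (bind(fd, (const struct sockaddr *)&src, sizeof(src)) != 0 ||
      connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

/* Build an IPv4 address from dotted-quad text and a host-order port. */
struct sockaddr_in make_addr(const char *ip, unsigned short port) {
  struct sockaddr_in a;
  memset(&a, 0, sizeof(a));
  a.sin_family = AF_INET;
  a.sin_port = htons(port);
  inet_pton(AF_INET, ip, &a.sin_addr);
  return a;
}
```

Without the `bind` call, the kernel chooses the source address from its routing table, which is the default-address behavior described in the background.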
TODO
Initialize the rasnet client socket with the same specific address that rasNetListeningSocket uses. At present, ncclSocketInit is passed nullptr, so the socket is bound to the system's default address.