@baymaxhuang

Background
In NCCL, NCCL_SOCKET_IFNAME specifies which IP interfaces to use for communication, but only the listening (server-side) socket is bound to the address of the specified interface. The client socket is not bound and connects from the system's default address, so its traffic may leave through an interface other than the one specified. This causes unexpected problems when there are link problems.

Solution
Add a peerAdd field to ncclSocket to store the socket's peer address; the local address is stored in addr. When a client connects to a remote socket (sock->peerAdd), its source address is also bound to sock->addr.
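
For illustration only, the underlying socket-level idea (bind the client socket to a local address before connecting) can be sketched as below. This is not the code in this PR: `connect_from` is a hypothetical helper, it assumes IPv4 and a blocking socket, and NCCL's actual socket code may handle this differently.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper, not part of NCCL: connect to `peer` while pinning the
 * source address to `local`. Returns the connected fd, or -1 on error. */
int connect_from(const struct sockaddr_in *local, const struct sockaddr_in *peer) {
  assert(local != NULL && peer != NULL);

  int fd = socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;

  /* Binding before connect() fixes the source IP. A port of 0 in `local`
   * lets the kernel choose an ephemeral source port. */
  if (bind(fd, (const struct sockaddr *)local, sizeof(*local)) != 0 ||
      connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}
```

Without the `bind` call, the kernel picks the source address from the routing table at `connect` time, which is the default behavior described in the background above.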

TODO
Initialize the rasnet client socket with the same specific address that rasNetListeningSocket uses. At present, ncclSocketInit is given nullptr there, so the socket is bound to the system's default address.
