[Feature Preview] Introduce one-sided API with ncclPut #1904

marksantesson · 2025-11-08T04:00:36Z

This is an Early Access release. It contains new features and APIs that may change before the official General Access release. In particular, the host one-sided API is not frozen. Performance may also be lower than in the General Access release and should not be taken as representative of it.

Host One-Sided RMA API

- Introduces the one-sided ncclPut, ncclSignal and ncclWaitSignal APIs. Please refer to src/nccl.h.in for the API definitions.
- The one-sided API supports both intra-node (NVL) and inter-node (network) operations.
- The implementation of the one-sided APIs does not use any SMs (zero-SM).
- Put/WaitSignal over the network can achieve ~25% lower latency than send/recv.
- The API signatures, functionality and final performance are subject to change for the official release.

Support Requirements

- The one-sided API requires NCCL window registration of the user buffers.
- To enable the one-sided API over the network, set:
  - NCCL_GIN_TYPE=2
  - NCCL_NET=IB
- The current one-sided APIs do not work with CUDA graph capture.
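
As a minimal sketch, the two network settings above can also be exported from the application itself rather than from the job launcher. This assumes NCCL reads these variables when the communicator is initialized, so the call has to happen before ncclCommInitRankConfig. The helper name enable_one_sided_over_network is hypothetical and not part of NCCL.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for setenv() */
#endif
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of NCCL): export the two settings listed
 * above. Pass overwrite = 0 to keep values already set by the launcher.
 * Returns 0 on success, -1 if setenv() fails. */
static int enable_one_sided_over_network(int overwrite) {
    if (setenv("NCCL_GIN_TYPE", "2", overwrite) != 0) {
        perror("setenv NCCL_GIN_TYPE");
        return -1;
    }
    if (setenv("NCCL_NET", "IB", overwrite) != 0) {
        perror("setenv NCCL_NET");
        return -1;
    }
    return 0;
}
```

A program would call enable_one_sided_over_network(0) once per process, before creating the communicator. Setting the variables in the launch command is equivalent and avoids touching the code.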

Example

The example below demonstrates a ping-pong communication pattern between two ranks using ncclPut and ncclWaitSignal.

// Configure NCCL with one-sided RMA support
ncclConfig_t config = NCCL_CONFIG_INITIALIZER;
config.numRmaCtx = 1;  // Enable RMA with 1 context
config.blocking = 1;
NCCLCHECK(ncclCommInitRankConfig(&comm, nRanks, id, myRank, &config));

int ctx = 0;  // Use context 0 for RMA operations

// Allocate symmetric memory for RMA operations
void *sendbuff, *recvbuff;
NCCLCHECK(ncclMemAlloc((void**)&sendbuff, size));
NCCLCHECK(ncclMemAlloc((void**)&recvbuff, size));

// Register both send and receive buffers as symmetric windows for RMA operations
ncclWindow_t sendWindow, recvWindow;
// Register the same number of bytes that was allocated above
NCCLCHECK(ncclCommWindowRegister(comm, sendbuff, size, &sendWindow, NCCL_WIN_COLL_SYMMETRIC));
NCCLCHECK(ncclCommWindowRegister(comm, recvbuff, size, &recvWindow, NCCL_WIN_COLL_SYMMETRIC));

// Ensure all ranks have completed window registration before proceeding
MPICHECK(MPI_Barrier(MPI_COMM_WORLD));

if (myRank == 0) {  // Rank 0: wait then put
    // Wait for signal from peer
    int nsignals = 1;
    int peer = 1;
    NCCLCHECK(ncclWaitSignal(ctx, &peer, &nsignals, 1, NCCL_SIGNAL_DISTINCT, comm, stream));

    // Put data with signal to peer's receive buffer
    NCCLCHECK(ncclPut(ctx, sendbuff, nelems, ncclInt, peer, 0, recvWindow,
                        NCCL_SIGNAL_DISTINCT, comm, stream));

} else {   // Rank 1: put then wait
    // Put data with signal to peer's receive buffer
    int peer = 0;
    NCCLCHECK(ncclPut(ctx, sendbuff, nelems, ncclInt, peer, 0, recvWindow,
                        NCCL_SIGNAL_DISTINCT, comm, stream));

    // Wait for signal from peer
    int nsignals = 1;
    NCCLCHECK(ncclWaitSignal(ctx, &peer, &nsignals, 1, NCCL_SIGNAL_DISTINCT, comm, stream));

}
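
// Wait for the operations enqueued on the stream to finish before teardown
CUDACHECK(cudaStreamSynchronize(stream));

// Deregister the windows, free the buffers, then destroy the stream and communicator
NCCLCHECK(ncclCommWindowDeregister(comm, sendWindow));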
NCCLCHECK(ncclCommWindowDeregister(comm, recvWindow));
NCCLCHECK(ncclMemFree(sendbuff));
NCCLCHECK(ncclMemFree(recvbuff));
CUDACHECK(cudaStreamDestroy(stream));
NCCLCHECK(ncclCommFinalize(comm));
NCCLCHECK(ncclCommDestroy(comm));
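
The example relies on the NCCLCHECK, CUDACHECK and MPICHECK macros, which are not defined in this description. One possible way to define them is sketched below. The names report_failure and CHECK_CALL are hypothetical helpers introduced here for illustration, and the sketch assumes each library reports success through a single return value (ncclSuccess, cudaSuccess, MPI_SUCCESS).

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: print a message and return 1 when `rc` differs from
 * the library's success value `ok`; return 0 otherwise. */
static int report_failure(const char *lib, long rc, long ok,
                          const char *file, int line) {
    if (rc == ok) return 0;
    fprintf(stderr, "%s error %ld at %s:%d\n", lib, rc, file, line);
    return 1;
}

/* Exit the process when a call does not return its success value. */
#define CHECK_CALL(lib, call, ok)                                    \
    do {                                                             \
        if (report_failure((lib), (long)(call), (long)(ok),          \
                           __FILE__, __LINE__))                      \
            exit(EXIT_FAILURE);                                      \
    } while (0)

/* With <nccl.h>, <cuda_runtime.h> and <mpi.h> included, the three macros
 * used in the example could then be defined as:
 *   #define NCCLCHECK(call) CHECK_CALL("NCCL", call, ncclSuccess)
 *   #define CUDACHECK(call) CHECK_CALL("CUDA", call, cudaSuccess)
 *   #define MPICHECK(call)  CHECK_CALL("MPI",  call, MPI_SUCCESS)
 */
```

Exiting on the first failure keeps the example short. A real application may prefer to return the error code to the caller so that it can clean up the communicator and windows.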

NCCL Put API Preview

430081a

xiaofanl-nvidia changed the title ~~NCCL Put API Preview~~ [Feature Preview] Introduce one-sided API with ncclPut Nov 8, 2025
