
[RFE]: Reduce unnecessary CUDA memory allocation #1907

@visualxu

Description

I noticed that the code at https://github.com/NVIDIA/nccl/blob/master/src/init.cc#L1308 always runs, but `shared_net_buffer` is only used when `comm->nNodes > 1`. Is the allocation wasted for single-node communicators? If so, could it be skipped when `comm->nNodes <= 1`?
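To illustrate the request, here is a minimal sketch of guarding the allocation on the node count. It is not NCCL's actual code: `FakeComm`, `initSharedNetBuffer`, `freeSharedNetBuffer`, and the field names other than `nNodes` are hypothetical, and `std::malloc` stands in for the CUDA allocation so the example runs without a GPU.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the communicator. Only nNodes mirrors a field
// named in this issue; the other members are assumptions for illustration.
struct FakeComm {
  int nNodes;
  void* sharedNetBuffer;
  std::size_t sharedNetBufferSize;
};

// Allocate the shared network buffer only for multi-node communicators.
// std::malloc is a placeholder for the real device allocation.
// Returns 0 on success, -1 if the allocation fails.
int initSharedNetBuffer(FakeComm* comm, std::size_t bytes) {
  comm->sharedNetBuffer = nullptr;
  comm->sharedNetBufferSize = 0;
  if (comm->nNodes <= 1) {
    // Single node: per this issue the buffer would never be used, so skip it.
    return 0;
  }
  void* p = std::malloc(bytes);
  if (p == nullptr) return -1;
  comm->sharedNetBuffer = p;
  comm->sharedNetBufferSize = bytes;
  return 0;
}

// Release the buffer if one was allocated; safe to call when it was skipped,
// because std::free(nullptr) is a no-op.
void freeSharedNetBuffer(FakeComm* comm) {
  std::free(comm->sharedNetBuffer);
  comm->sharedNetBuffer = nullptr;
  comm->sharedNetBufferSize = 0;
}
```

With a guard like this, any code path that reads the buffer would also need to tolerate a null pointer when `nNodes <= 1`. Whether that holds everywhere in NCCL is something the maintainers would have to confirm.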
