[Question]: Why is the PXN graph preferred over crossNic? #1899

@fishautumn

Question

When NIC bandwidth is low, the channel search algorithm ends up with PXN channels, so ring alteration is never executed.

I prefer ring alteration for rail optimization. Also, compared with PXN, crossNic channels have fewer hops.

After some investigation, I found that ncclTopoCompareGraphs() does not check the hop count when the two graphs have different crossNic values:

```c
if (graph->pattern == refGraph->pattern && graph->crossNic == refGraph->crossNic && graph->nHops < refGraph->nHops) *copy = 1;
```

As a result, it prefers the PXN graph over the crossNic graph here.
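To make the effect of that condition concrete, here is a minimal sketch that models only the quoted hop tie-break. The struct `graph_sketch`, the enum `sketch_pattern`, and the function `copy_as_quoted` are hypothetical stand-ins, not NCCL types or APIs. The sketch assumes the two graphs already tie on any earlier criteria the real ncclTopoCompareGraphs() checks, and that the PXN graph was found with crossNic == 0.

```c
/* Hypothetical pattern ids; these are NOT NCCL's real constants. */
enum sketch_pattern { SKETCH_PATTERN_TREE, SKETCH_PATTERN_RING };

/* Hypothetical, simplified stand-in for a topology graph: only the three
 * fields used by the quoted condition are modeled. */
struct graph_sketch {
  enum sketch_pattern pattern;
  int crossNic; /* crossNic flag, compared exactly as in the quoted line */
  int nHops;    /* hop count of the graph */
};

/* Mirrors only the quoted hop tie-break. Returns 1 when the candidate
 * `graph` would replace the best-so-far `refGraph`, else 0. A candidate
 * with a different crossNic value is never copied here, however few
 * hops it has. */
int copy_as_quoted(const struct graph_sketch* graph,
                   const struct graph_sketch* refGraph) {
  int copy = 0;
  if (graph->pattern == refGraph->pattern &&
      graph->crossNic == refGraph->crossNic &&
      graph->nHops < refGraph->nHops)
    copy = 1;
  return copy;
}
```

With illustrative numbers, a PXN reference `{SKETCH_PATTERN_RING, 0, 6}` and a crossNic candidate `{SKETCH_PATTERN_RING, 1, 4}` give `copy_as_quoted(&candidate, &reference) == 0`, so the PXN graph is kept even though the candidate has fewer hops.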

So the questions are:

  1. Why is PXN preferred over crossNic?
  2. Should we prefer fewer hops here, especially for the ring pattern?
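A minimal sketch of the change question 2 suggests. This is a hypothetical variant, not current NCCL behavior. For rings it compares hop counts even when crossNic differs, and for other patterns it keeps the crossNic guard from the quoted condition. The names `graph_sketch`, `sketch_pattern` and `copy_prefer_fewer_hops` are made up for illustration. The sketch deliberately ignores whatever reason (the subject of question 1) might justify preferring PXN.

```c
/* Hypothetical pattern ids; these are NOT NCCL's real constants. */
enum sketch_pattern { SKETCH_PATTERN_TREE, SKETCH_PATTERN_RING };

/* Hypothetical, simplified stand-in for a topology graph. */
struct graph_sketch {
  enum sketch_pattern pattern;
  int crossNic;
  int nHops;
};

/* Hypothetical proposal: returns 1 when the candidate `graph` should
 * replace `refGraph` on the hop-count tie-break. */
int copy_prefer_fewer_hops(const struct graph_sketch* graph,
                           const struct graph_sketch* refGraph) {
  /* Graphs with different patterns are never compared on hops. */
  if (graph->pattern != refGraph->pattern) return 0;
  /* Non-ring patterns keep the quoted behavior: crossNic must match. */
  if (graph->pattern != SKETCH_PATTERN_RING &&
      graph->crossNic != refGraph->crossNic)
    return 0;
  /* Rings (or matching crossNic): fewer hops wins. */
  return graph->nHops < refGraph->nHops;
}
```

Under this rule, the ring example with a PXN reference `{SKETCH_PATTERN_RING, 0, 6}` and a crossNic candidate `{SKETCH_PATTERN_RING, 1, 4}` now returns 1. The same pair with the tree pattern still returns 0.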
