For one of our workloads, we are using a parallelism configuration of TP=4, CP=4.
That is, within one DP group, two CP ranks are on one node and the other two CP ranks are on another node.
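To make this layout concrete, here is a small sketch that computes the global ranks of one CP group and the node each one lands on. It is not taken from Megatron-LM. It assumes 8 GPUs per node, PP=1, and a rank order where TP varies fastest, then CP, then DP (I believe this matches Megatron-LM's default tp-cp-ep-dp-pp ordering, but treat that as an assumption). `cp_group_ranks` and `node_of` are hypothetical helpers, not library APIs.

```python
from typing import List

# Assumptions (not from the original post): 8 GPUs per node, PP=1,
# and global rank = tp_rank + TP * cp_rank + TP * CP * dp_rank.
GPUS_PER_NODE = 8
TP, CP = 4, 4


def cp_group_ranks(dp_rank: int, tp_rank: int) -> List[int]:
    """Global ranks of the CP group for a given (DP rank, TP rank) pair."""
    base = dp_rank * TP * CP + tp_rank
    # Consecutive CP ranks are TP ranks apart because TP varies fastest.
    return [base + cp_rank * TP for cp_rank in range(CP)]


def node_of(rank: int) -> int:
    """Node index of a global rank, assuming ranks fill nodes in order."""
    return rank // GPUS_PER_NODE


group = cp_group_ranks(dp_rank=0, tp_rank=0)
print(group, [node_of(r) for r in group])  # [0, 4, 8, 12] [0, 0, 1, 1]
```

Under these assumptions, each CP group has two ranks on node 0 (0 and 4) and two on node 1 (8 and 12), which is the two-plus-two split described above.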
Given this layout, we would like to try cp_comm_type="a2a+p2p" for CP communication. It uses NVLink for intra-node and IBLink for inter-node communication, which should give better throughput.
To enable it, Transformer Engine requires that we provide hierarchical CP groups (link). Hierarchical CP groups can be created by passing the appropriate group sizes to parallel_state (here).
The groups are created according to the logic here, but I don't quite understand how these groups are generated, what the level-1 and level-2 sub-groups mean, and how the sequence is split between these sub-groups.
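To reason about the level-1/level-2 question, here is a pure-Python sketch of the index arithmetic as I read the linked group-creation logic, with no torch.distributed calls. `hierarchical_subgroups` is a hypothetical helper: it takes the global ranks of one CP group and a list of per-level sizes whose product must equal the CP size. Level-1 sub-groups take adjacent entries of that rank list; each later level strides over the product of the earlier levels' sizes. This is my interpretation of the code, not a confirmed description of it.

```python
import math
from typing import List


def hierarchical_subgroups(ranks: List[int], sizes: List[int]) -> List[List[List[int]]]:
    """Return, for each level, every sub-group of `ranks` (index arithmetic only)."""
    if len(ranks) != math.prod(sizes):
        raise ValueError("product of hierarchical sizes must equal the group size")
    levels = []
    accumulated = 1  # product of sizes up to and including the current level
    processed = 1    # product of sizes of all earlier levels (the stride)
    for size in sizes:
        accumulated *= size
        groups = []
        for k in range(len(ranks) // accumulated):  # block of `accumulated` entries
            for j in range(processed):              # offset inside that block
                groups.append([ranks[j + i * processed + k * accumulated]
                               for i in range(size)])
        levels.append(groups)
        processed *= size
    return levels


# One CP group from the TP=4, CP=4 layout, with candidate sizes [2, 2].
for lvl, groups in enumerate(hierarchical_subgroups([0, 4, 8, 12], [2, 2]), start=1):
    print(f"level-{lvl}:", groups)
# level-1: [[0, 4], [8, 12]]
# level-2: [[0, 8], [4, 12]]
```

If ranks 0 and 4 share a node and 8 and 12 share the other, sizes [2, 2] would make every level-1 sub-group intra-node and every level-2 sub-group inter-node. Whether Transformer Engine then uses the first group for A2A and the second for P2P should be checked against its cp_group documentation.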
So my question is:
1. For the above parallelism setup, with two CP ranks on one node and two CP ranks on another node, what hierarchical group sizes do we need to pass to the parallel_state initialization in order to use cp_comm_type="a2a+p2p" communication?