Commit b818de3

Update mgpu GitLab commit SHA to incorporate fixes for NVL support (#3606)
Signed-off-by: Thien Nguyen <[email protected]>
1 parent 5dcffc7 commit b818de3

2 files changed: 4 additions and 4 deletions

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 nvidia-mgpu-repo: cuda-quantum/cuquantum-mgpu.git
-nvidia-mgpu-commit: 8d7646431c824f8a7bf88bf3d9ba02f42746a024
+nvidia-mgpu-commit: 438397cdc7529293c78a399243c63dc3f6c3886a

docs/sphinx/using/backends/sims/svsims.rst

Lines changed: 3 additions & 3 deletions
@@ -254,11 +254,11 @@ the multi-node multi-GPU configuration. Any environment variables must be set pr
     - positive integer
     - Specify the number of GPUs that can communicate by using GPUDirect P2P. Default value is 0 (P2P communication is disabled).
   * - ``CUDAQ_GPU_FABRIC``
-    - `MNNVL`, `NVL`, or `NONE`
-    - Automatically set the number of P2P device bits based on the total number of processes when multi-node NVLink (`MNNVL`) is selected; or the number of processes per node when NVLink (`NVL`) is selected; or disable P2P (with `NONE`).
+    - `MNNVL`, `NVL`, `NONE`, or NVLink domain size (power of 2 integer)
+    - Automatically set the number of P2P device bits based on the total number of processes when multi-node NVLink (`MNNVL`) is selected; or the number of processes per node when NVLink (`NVL`) is selected; or disable P2P (with `NONE`); or a specific NVLink domain size.
   * - ``CUDAQ_GLOBAL_INDEX_BITS``
     - comma-separated list of positive integers
-    - Specify the network structure (faster to slower). For example, assuming a 32 MPI processes simulation, whereby the network topology is divided into 4 groups of 8 processes, which have faster communication network between them. In this case, the `CUDAQ_GLOBAL_INDEX_BITS` environment variable can be set to `3,2`. The first `3` (`log2(8)`) represents **8** processes with fast communication within the group and the second `2` represents the **4** groups (8 processes each) in those total 32 processes. The sum of all elements in this list is `5`, corresponding to the total number of MPI processes (`2^5 = 32`). Default is an empty list (no customization based on network structure of the cluster).
+    - Specify the network structure (faster to slower). For example, assuming a 32 MPI processes simulation, whereby the network topology is divided into 4 groups of 8 processes, which have faster communication network between them. In this case, the `CUDAQ_GLOBAL_INDEX_BITS` environment variable can be set to `3,2`. The first `3` (`log2(8)`) represents **8** processes with fast communication within the group and the second `2` represents the **4** groups (8 processes each) in those total 32 processes. The sum of all elements in this list is `5`, corresponding to the total number of MPI processes (`2^5 = 32`). If none specified, the global index bits are set based on P2P device bits.
   * - ``CUDAQ_HOST_DEVICE_MIGRATION_LEVEL``
     - positive integer
     - Specify host-device memory migration w.r.t. the network structure. If provided, this setting determines the position to insert the number of migration index bits to the `CUDAQ_GLOBAL_INDEX_BITS` list. By default, if not set, the number of migration index bits (CPU-GPU data transfers) is appended to the end of the array of index bits (aka, state vector distribution scheme). This default behavior is optimized for systems with fast GPU-GPU interconnects (NVLink, InfiniBand, etc.)
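
The `CUDAQ_GLOBAL_INDEX_BITS` arithmetic described in the hunk above (`log2` of each group size, fastest level first, summing to `log2` of the total process count) can be sketched in Python. This is only an illustration of the documented example, not code from this commit or from CUDA-Q: the helper names `global_index_bits` and `total_processes` are hypothetical, and the sketch assumes every level of the topology has a power-of-two size.

```python
def global_index_bits(group_sizes):
    """Build a CUDAQ_GLOBAL_INDEX_BITS value from nested group sizes.

    group_sizes is ordered from the fastest network level to the slowest,
    e.g. [8, 4] for 4 groups of 8 tightly connected processes.
    Hypothetical helper: it only reproduces the arithmetic in the docs.
    """
    bits = []
    for size in group_sizes:
        # Each entry must be a power of two >= 2 so that log2 is a
        # positive integer, as the documented value type requires.
        if size < 2 or size & (size - 1):
            raise ValueError(f"group size {size} is not a power of two >= 2")
        bits.append(size.bit_length() - 1)  # exact log2 for powers of two
    return ",".join(str(b) for b in bits)


def total_processes(index_bits):
    """Return 2**(sum of bits): the process count the bit list describes."""
    return 2 ** sum(int(b) for b in index_bits.split(","))


value = global_index_bits([8, 4])
print(value)                   # 3,2
print(total_processes(value))  # 32
```

Checking that `total_processes(value)` matches the number of MPI processes actually launched is a cheap way to catch a mismatched list before the value is exported as `CUDAQ_GLOBAL_INDEX_BITS`.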
