Commit 39cd951

Update mgpu sha and gate fusion doc (#3344)
* Update mgpu sha and gate fusion doc

  Signed-off-by: Thien Nguyen <[email protected]>

* Spell check fixes

  Signed-off-by: Thien Nguyen <[email protected]>

---------

Signed-off-by: Thien Nguyen <[email protected]>
1 parent 0c57a46 commit 39cd951

2 files changed: +32 -3 lines changed

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 nvidia-mgpu-repo: cuda-quantum/cuquantum-mgpu.git
-nvidia-mgpu-commit: bfccb143f12b42be129ed2fbf16c39428eaba7b7
+nvidia-mgpu-commit: 5f1033f9efbe952633e567d676a64a237cb43ba7

docs/sphinx/using/backends/sims/svsims.rst

Lines changed: 31 additions & 2 deletions
@@ -107,7 +107,7 @@ It is worth drawing attention to gate fusion, a powerful tool for improving simu
     - Description
   * - ``CUDAQ_FUSION_MAX_QUBITS``
     - positive integer
-    - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
+    - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator as specified :ref:`here <gate-fusion-table>`.
   * - ``CUDAQ_FUSION_DIAGONAL_GATE_MAX_QUBITS``
     - integer greater than or equal to -1
     - The max number of qubits used for diagonal gate fusion. The default value is set to `-1` and the fusion size will be automatically adjusted for the better performance. If 0, the gate fusion for diagonal gates is disabled.
@@ -249,7 +249,7 @@ the multi-node multi-GPU configuration. Any environment variables must be set pr
     - The qubit count threshold where state vector distribution is activated. Below this threshold, simulation is performed as independent (non-distributed) tasks across all MPI processes for optimal performance. Default is 25.
   * - ``CUDAQ_MGPU_FUSE``
     - positive integer
-    - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
+    - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator as specified :ref:`here <gate-fusion-table>`.
   * - ``CUDAQ_MGPU_P2P_DEVICE_BITS``
     - positive integer
     - Specify the number of GPUs that can communicate by using GPUDirect P2P. Default value is 0 (P2P communication is disabled).
@@ -270,6 +270,35 @@ the multi-node multi-GPU configuration. Any environment variables must be set pr
 The :code:`nvidia-mgpu` backend, which is equivalent to the multi-node multi-GPU double-precision option (`mgpu,fp64`) of the :code:`nvidia`
 is deprecated and will be removed in a future release.
 
+.. |:spellcheck-disable:| replace:: \
+.. |:spellcheck-enable:| replace:: \
+
+
+.. _gate-fusion-table:
+
+.. list-table:: **Default Gate Fusion Size**
+  :widths: 20 30 50
+
+  * - Compute Capability
+    - GPU
+    - Default Gate Fusion Size
+  * - 8.0
+    - NVIDIA A100
+    - 4 (`fp32`) or 5 (`fp64`)
+  * - 9.0
+    - NVIDIA H100, H200, |:spellcheck-disable:| GH200 |:spellcheck-enable:|
+    - 5 (`fp32`) or 6 (`fp64`)
+  * - 10.0
+    - NVIDIA GB200, B200
+    - 5 (`fp32`) or 4 (`fp64`)
+  * - 10.3
+    - NVIDIA B300
+    - 5 (`fp32`) or 1 (`fp64`)
+  * - Others
+    -
+    - 4 (`fp32` and `fp64`)
+
+
 The above configuration options of the :code:`nvidia` backend
 can be tuned to reduce your simulation runtimes. One of the
 performance improvements is to fuse multiple gates together during runtime. For
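
As a rough illustration of the defaults table added in this commit, the sketch below transcribes the table into a Python lookup and builds the matching environment variable setting. Only the table values and the variable names ``CUDAQ_FUSION_MAX_QUBITS`` and ``CUDAQ_MGPU_FUSE`` come from the documentation in the diff; the names ``DEFAULT_FUSION_SIZE``, ``default_fusion_size``, and ``fusion_env`` are hypothetical helpers and not part of CUDA-Q. Detecting the compute capability is out of scope here, so the caller is assumed to pass it as a string such as ``"9.0"``.

```python
import os

# Values transcribed from the "Default Gate Fusion Size" table in this diff.
# Keys are (compute capability, precision). This dict is illustrative only;
# the simulator selects its own default internally.
DEFAULT_FUSION_SIZE = {
    ("8.0", "fp32"): 4, ("8.0", "fp64"): 5,    # NVIDIA A100
    ("9.0", "fp32"): 5, ("9.0", "fp64"): 6,    # NVIDIA H100, H200, GH200
    ("10.0", "fp32"): 5, ("10.0", "fp64"): 4,  # NVIDIA GB200, B200
    ("10.3", "fp32"): 5, ("10.3", "fp64"): 1,  # NVIDIA B300
}
FALLBACK_FUSION_SIZE = 4  # "Others" row: 4 for both fp32 and fp64


def default_fusion_size(compute_capability: str, precision: str) -> int:
    """Return the documented default fusion size for a CC/precision pair."""
    precision = precision.lower()
    if precision not in ("fp32", "fp64"):
        raise ValueError(f"unknown precision: {precision!r}")
    return DEFAULT_FUSION_SIZE.get(
        (compute_capability, precision), FALLBACK_FUSION_SIZE
    )


def fusion_env(max_qubits: int, multi_gpu: bool = False) -> dict:
    """Build the environment variable entry that overrides the fusion size.

    Both options are documented as taking a positive integer.
    """
    if max_qubits < 1:
        raise ValueError("fusion size must be a positive integer")
    name = "CUDAQ_MGPU_FUSE" if multi_gpu else "CUDAQ_FUSION_MAX_QUBITS"
    return {name: str(max_qubits)}


if __name__ == "__main__":
    # Example: start from the documented default for an H100 in fp64 and
    # export it explicitly. Set the variable before the simulator is set up.
    size = default_fusion_size("9.0", "fp64")
    os.environ.update(fusion_env(size))
    print(os.environ["CUDAQ_FUSION_MAX_QUBITS"])
```

Keeping the table as plain data makes it easy to compare an explicit override against the documented default for a given GPU before benchmarking other fusion sizes.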
