Describe the bug
Building Transformer Engine from source fails with the following compiler errors, all at line 1075 of group_quantize_fp8.cuh:
/TransformerEngine/transformer_engine/common/activation/./../cast/dispatch/../fp8/group_quantize_fp8.cuh:1075: error: template argument 2 is invalid
/TransformerEngine/transformer_engine/common/activation/./../cast/dispatch/../fp8/group_quantize_fp8.cuh:1075: error: ‘IVecT’ has not been declared
/TransformerEngine/transformer_engine/common/activation/./../cast/dispatch/../fp8/group_quantize_fp8.cuh:1075: error: ‘OVecT’ has not been declared
Steps/Code to reproduce bug
NCCL_HOME=/usr/local/miniconda3/lib/python3.12/site-packages/nvidia/nccl NVTE_BUILD_WITH_NCCL_EP=1 USE_NCCL=1 NVTE_CUDA_ARCHS="100a" NVTE_BUILD_THREADS_PER_JOB=8 NVTE_FRAMEWORK=pytorch VERBOSE=1 pip install -vvv --no-build-isolation --no-cache-dir -e .
Environment details
- B200
- PyTorch 2.11
- Python 3.12
- Transformer Engine commit 4cd244e
- CUDA 13.0
- cuDNN 9.19