Open
NVIDIA/nccl
#1864
Labels: actionable, high priority, module: build, module: ci, module: nccl, module: regression, module: third_party, triaged
Description
🐛 Describe the bug
Building PyTorch locally against NCCL 2.28.3 causes the following simple script to fail with cudaErrorNoKernelImageForDevice:
```python
import os
import torch

os.environ["MASTER_PORT"] = "12345"
os.environ["MASTER_ADDR"] = "localhost"
os.environ["RANK"] = "0"

print(torch.__version__, torch.cuda.nccl.version())
torch.distributed.init_process_group(backend='nccl', world_size=1)
torch.distributed.barrier()
model = torch.nn.Linear(128, 128).cuda()
torch.cuda.synchronize()
x = torch.randn((32, 128), device="cuda")
```

The errors look as follows:
```
2.10.0a0+gitb103378 (2, 28, 3)
/home/dev/git/pytorch/pytorch/torch/distributed/distributed_c10d.py:4879: UserWarning: barrier(): using the device under current context. You can specify `device_id` in `init_process_group` to mute this warning.
  warnings.warn( # warn only once
[rank0]:[W1001 20:22:34.218303536 ProcessGroupNCCL.cpp:5092] Guessing device ID based on global rank. This can cause a hang if rank to GPU mapping is heterogeneous. You can specify device_id in init_process_group()
[rank0]: Traceback (most recent call last):
[rank0]:   File "/home/dev/foo.py", line 13, in <module>
[rank0]:     x = torch.randn((32, 128), device="cuda")
[rank0]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: torch.AcceleratorError: CUDA error: no kernel image is available for execution on the device
[rank0]: Search for `cudaErrorNoKernelImageForDevice' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1
[rank0]: Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```
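For context on why this error surfaces here: cudaErrorNoKernelImageForDevice means the loaded binary (in this case the NCCL kernels linked into the build) carries no SASS image matching the running GPU's compute capability and no PTX that could be JIT-compiled for it. A minimal sketch of that coverage rule, assuming the usual `sm_XX` / `compute_XX` arch naming (the helper name and simplified logic are my own illustration, not a PyTorch or NCCL API):

```python
def arch_is_covered(device_arch: str, built: list[str]) -> bool:
    """Illustrative check: can a binary built for `built` run on `device_arch`?

    A device is covered by an exact SASS match (e.g. 'sm_90' on an sm_90 GPU)
    or by embedded PTX for an equal-or-older arch ('compute_90' can be
    JIT-compiled for sm_120). Simplified sketch; real CUDA compatibility
    rules have more cases (e.g. arch-specific 'sm_90a' variants).
    """
    dev = int(device_arch.removeprefix("sm_"))
    for entry in built:
        if entry == device_arch:
            return True  # exact SASS match
        if entry.startswith("compute_") and int(entry.removeprefix("compute_")) <= dev:
            return True  # PTX present, driver can JIT it forward
    return False

print(arch_is_covered("sm_90", ["sm_80", "sm_90"]))        # covered: SASS match
print(arch_is_covered("sm_120", ["sm_80", "sm_90"]))       # not covered: no image, no PTX
print(arch_is_covered("sm_120", ["sm_90", "compute_90"]))  # covered via PTX JIT
```

If no entry covers the device, any launch of those kernels fails exactly as in the traceback above. Comparing `torch.cuda.get_device_capability()` against the arch set the local NCCL 2.28.3 was compiled for (its `NVCC_GENCODE` settings) is one way to confirm this is the failure mode.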
Versions
nightly
cc @ezyang @gchanan @zou3519 @kadeng @msaroufim @seemethere @pytorch/pytorch-dev-infra