CUDA error: device-side assert triggered #221

Open
Wangwang99999 opened this issue May 19, 2025 · 5 comments

Comments

@Wangwang99999

File "…/lib/python3.11/site-packages/torch/functional.py", line 1335, in cdist
    return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

@SkalskiP
Collaborator

@Wangwang99999 could you share the annotations file you're using? Just the JSON.

@Wangwang99999
Author

_annotations.coco.json
I modified line 47 in detr.py as:
class_names = [c["name"] for c in anns["categories"]]
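That line builds `class_names` purely from the order of `anns["categories"]`, so position `i` in the list is only the right name for label `i` if the category ids are exactly 0, 1, …, N-1 in file order. A minimal stdlib sketch for checking that, assuming the standard COCO layout (`categories` entries with `id` and `name`); the helper name `category_id_mismatches` is hypothetical and not part of detr.py:

```python
import json


def category_id_mismatches(anns: dict) -> list[tuple[int, int, str]]:
    """Return (list_position, category_id, name) for every category whose id
    differs from its position in anns["categories"].

    Hypothetical helper: an empty result means the ids are 0-indexed,
    contiguous and in file order, so class_names[label] names each label.
    """
    mismatches = []
    for position, category in enumerate(anns["categories"]):
        if category["id"] != position:
            mismatches.append((position, category["id"], category["name"]))
    return mismatches


if __name__ == "__main__":
    # Made-up 1-indexed data: ids 1..3, but the list only has slots 0..2
    anns = json.loads(
        '{"categories": [{"id": 1, "name": "cat"}, {"id": 2, "name": "dog"},'
        ' {"id": 3, "name": "bird"}]}'
    )
    class_names = [c["name"] for c in anns["categories"]]
    print(len(class_names))              # 3
    print(category_id_mismatches(anns))  # [(0, 1, 'cat'), (1, 2, 'dog'), (2, 3, 'bird')]
```

With the 1-indexed toy data, label 3 has no slot in a three-entry class list. Indexing a CUDA tensor with an out-of-range value like that generally shows up as a device-side assert rather than a readable IndexError, which is one plausible explanation for the traceback above.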

@ThierryDeruyttere

Can you check that your categories are 0-indexed? I used 1-indexing and that was messing things up for me; switching to 0-indexing made it work. By the way, to debug this type of thing, run your code on CPU, as the errors will be a lot clearer.
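Along the same lines, before launching training it can help to scan the annotation file for labels that fall outside the range the model expects. A minimal sketch, assuming the standard COCO keys (`categories`, `annotations`, `category_id`) and assuming the number of classes equals `len(anns["categories"])`, as with the modified `class_names` line; the function names are placeholders, not part of the project:

```python
import json
import sys
from pathlib import Path


def out_of_range_labels(anns: dict) -> list[int]:
    """Return the sorted, de-duplicated category_ids used by annotations that
    fall outside [0, num_classes), where num_classes = len(anns["categories"])."""
    num_classes = len(anns["categories"])
    bad = {
        a["category_id"]
        for a in anns["annotations"]
        if not 0 <= a["category_id"] < num_classes
    }
    return sorted(bad)


def check_file(path: str) -> None:
    # Load one COCO annotation file and report any out-of-range labels
    anns = json.loads(Path(path).read_text(encoding="utf-8"))
    bad = out_of_range_labels(anns)
    if bad:
        print(f"{path}: {len(anns['categories'])} categories, out-of-range labels {bad}")
    else:
        print(f"{path}: all labels within range")


if __name__ == "__main__":
    # Pass one or more annotation files, e.g. one per dataset split
    for p in sys.argv[1:]:
        check_file(p)
```

Running it over every split (train, valid, test) is worthwhile, since a single stray label in any of them is enough to trigger the assert once that split is loaded.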

@JackGUO-boy


The IDs for "categories" in my JSON start from 0, and I have also modified the line in detr.py to class_names = [c["name"] for c in anns["categories"]]. However, when I run train.py, I still get this error:
return _VF.cdist(x1, x2, p, None)  # type: ignore[attr-defined]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

How did you solve it?

@ThierryDeruyttere

Please run it on CPU; you will get a better idea of what's going on. Also, my earlier comment was not 100% correct in the end: I had to add a dummy class at index 0 and then put my other classes from index 1 onward. Only then did everything work, including inference.
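The workaround described here (a placeholder class at index 0, real classes from index 1 onward) can be applied to a COCO file with a small remapping script. This is a sketch of that idea under assumptions: standard COCO keys, an arbitrary dummy name (`"background"`), and real categories keeping their relative order by original id. The helper name `add_dummy_class_at_zero` is hypothetical, and whether the training code should then count the dummy in its number of classes depends on the project.

```python
import copy


def add_dummy_class_at_zero(anns: dict, dummy_name: str = "background") -> dict:
    """Return a copy of a COCO dict whose categories are
    [dummy (id 0), real classes (ids 1..N)], with annotation labels remapped.

    Hypothetical helper illustrating the workaround from this thread;
    the input dict is left unmodified.
    """
    out = copy.deepcopy(anns)
    real = sorted(out["categories"], key=lambda c: c["id"])
    # Map each original id to its new 1-based position
    id_map = {c["id"]: new_id for new_id, c in enumerate(real, start=1)}
    for c in real:
        c["id"] = id_map[c["id"]]
    dummy = {"id": 0, "name": dummy_name, "supercategory": "none"}
    out["categories"] = [dummy] + real
    for a in out["annotations"]:
        a["category_id"] = id_map[a["category_id"]]
    return out


if __name__ == "__main__":
    import json
    import sys

    # Usage (hypothetical script name): python add_dummy_class.py in.json out.json
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as f:
            anns = json.load(f)
        with open(sys.argv[2], "w", encoding="utf-8") as f:
            json.dump(add_dummy_class_at_zero(anns), f)
```

Writing to a new file rather than overwriting the original keeps the untouched export around for comparison if the remapped labels still fail.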
