
Fix CUDA 13 build error in gqa_unfused_attention.cu #28309

Merged
yuslepukhin merged 1 commit into main from tlwu/20260501/fix_cuda13_build on May 1, 2026



@tianleiwu
Contributor

Description

Fix CUDA 13 build failure introduced by PR #28198 (commit 997c479).

Root cause: `gqa_unfused_attention.cu` includes `<cub/cub.cuh>` directly. On CUDA 13.x, that header transitively pulls in CCCL's `tcgen05_ld.h`, which uses `__out` as a function parameter name and references it in inline PTX assembly. When building with MSVC on Windows, the SAL annotation headers define `__out` as an empty macro, so the preprocessor rewrites `__out[0]` as `[0]` and the compiler reports a parse error.
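
The failure mode can be reproduced in plain C++ without CUDA or MSVC. This minimal sketch, assuming any C++11 compiler, uses a hypothetical macro `DEMO_OUT` as a stand-in for SAL's empty `__out` (names that begin with a double underscore are reserved for the implementation). It stringizes the same expression before and after macro expansion to show what the compiler actually receives:

```cpp
// Hypothetical stand-in for SAL's `__out`: an object-like macro with an
// empty replacement list, as implied by the `"=r"( [0])` in the build log.
#define DEMO_OUT

// DEMO_STR stringizes its argument as written (no macro expansion);
// DEMO_XSTR expands macros in the argument first, then stringizes it.
#define DEMO_STR(x) #x
#define DEMO_XSTR(x) DEMO_STR(x)

// Compile-time C-string equality, written as a single return (C++11 constexpr).
constexpr bool same(const char* a, const char* b) {
  return *a == *b && (*a == '\0' || same(a + 1, b + 1));
}

// What the header author wrote versus what the compiler sees after expansion.
constexpr const char* kAsWritten = DEMO_STR(DEMO_OUT[0]);
constexpr const char* kAfterExpansion = DEMO_XSTR(DEMO_OUT[0]);

static_assert(same(kAsWritten, "DEMO_OUT[0]"), "stringized without expansion");
static_assert(same(kAfterExpansion, "[0]"), "the identifier vanished");
```

In `tcgen05_ld.h` the same rewrite turns the output operand `"=r"(__out[0])` into `"=r"( [0])`, which matches the line quoted in the build log.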

Fix: Changed `#include <cub/cub.cuh>` to `#include "core/providers/cuda/cu_inc/cub.cuh"`, the existing ONNX Runtime (ORT) wrapper header, which does `#undef __out` before including CUB.
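
A wrapper header of this kind might look like the hypothetical sketch below. It is not the contents of `core/providers/cuda/cu_inc/cub.cuh`; it again substitutes `DEMO_OUT` for `__out` and an inline function for the CUB include. As a design choice of this sketch, it uses `#pragma push_macro` / `#pragma pop_macro` (supported by MSVC, GCC, and Clang) so the macro is restored for any code that follows the wrapper:

```cpp
// Hypothetical stand-in for SAL's empty `__out` annotation macro.
#define DEMO_OUT

// --- wrapper header sketch begins ---
#pragma push_macro("DEMO_OUT")  // remember the current definition
#undef DEMO_OUT                 // let third-party code use the name freely

// A real wrapper would `#include <cub/cub.cuh>` here. This stand-in uses the
// name as a parameter, like the `__out[0]` output operand in tcgen05_ld.h.
// With DEMO_OUT still defined, `return DEMO_OUT[0];` would become
// `return [0];` and fail to parse.
constexpr int first_element(const int (&DEMO_OUT)[1]) {
  return DEMO_OUT[0];
}

#pragma pop_macro("DEMO_OUT")   // restore the definition for later code
// --- wrapper header sketch ends ---

#ifdef DEMO_OUT
constexpr bool kMacroRestored = true;
#else
constexpr bool kMacroRestored = false;
#endif
```

Restoring the macro keeps later code that relies on the SAL annotation unaffected; whether the ORT wrapper also restores `__out` is not stated in this PR.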

Motivation and Context

The CUDA 13 packaging pipeline (`py-cuda13-packaging-pipeline.yml`) has been failing since PR #28198 was merged, with errors like:

E:/_work/_temp/v13.0/include/cccl/cuda/__ptx/instructions/generated/tcgen05_ld.h(20): error : expected an identifier
      asm("tcgen05.ld.sync.aligned.16x64b.x1.b32 {%0}, [%1];" : "=r"( [0]) : "r"(__taddr) : "memory");

Member

@yuslepukhin left a comment


LGTM

@yuslepukhin merged commit 6e19374 into main on May 1, 2026
89 checks passed
@yuslepukhin deleted the tlwu/20260501/fix_cuda13_build branch on May 1, 2026 at 16:10
tianleiwu added a commit that referenced this pull request on May 3, 2026
