fix(dflash): auto-detect GPU arch to prevent sm_120a on consumer Blackwell #48

Open
easel wants to merge 1 commit into Luce-Org:main from easel:fix/consumer-blackwell-auto-detect

@easel
Contributor

@easel easel commented Apr 27, 2026

Problem

On CUDA 13.2+ with consumer Blackwell GPUs (for example, RTX 5090, SM 12.0),
leaving CMAKE_CUDA_ARCHITECTURES unset or setting it to native can resolve to
sm_120a instead of sm_120, which can trigger CUDA_ERROR_ILLEGAL_INSTRUCTION at
runtime on consumer hardware.

Fix (auto-detect only)

  • At configure time, if CMAKE_CUDA_ARCHITECTURES is unset or native, run
    nvidia-smi --query-gpu=compute_cap --format=csv,noheader.
  • Parse the compute capability (for example 12.0) and set
    CMAKE_CUDA_ARCHITECTURES explicitly (for example 120).
  • Keep the change isolated to dflash/CMakeLists.txt so it can be reviewed and
    merged independently from consumer-specific workaround behavior.
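
The detection steps above can be sketched outside CMake. The following is a minimal Python illustration, not the code in dflash/CMakeLists.txt: the function names compute_cap_to_arch and detect_cuda_arch are hypothetical, and using only the first GPU that nvidia-smi reports is an assumption made for this sketch. The nvidia-smi invocation is the one quoted in the description.

```python
import re
import subprocess
from typing import Optional


def compute_cap_to_arch(compute_cap: str) -> Optional[str]:
    """Convert a compute capability such as '12.0' to an arch value such as '120'.

    Returns None when the text does not look like '<major>.<minor>'.
    """
    match = re.fullmatch(r"\s*(\d+)\.(\d+)\s*", compute_cap)
    if match is None:
        return None
    return match.group(1) + match.group(2)


def detect_cuda_arch(current: str = "") -> str:
    """Return an explicit arch value when `current` is unset ('') or 'native'.

    Hypothetical helper: falls back to `current` unchanged if nvidia-smi is
    missing, fails, times out, or prints something unparseable.
    """
    # Respect an explicit user choice such as "86" or "89;120".
    if current not in ("", "native"):
        return current
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (OSError, subprocess.SubprocessError):
        return current
    # nvidia-smi prints one line per GPU; this sketch assumes the first one.
    lines = result.stdout.splitlines()
    arch = compute_cap_to_arch(lines[0] if lines else "")
    return arch if arch is not None else current
```

With this sketch, a reported capability of 12.0 maps to the plain value 120 rather than an architecture-specific variant, and an explicitly configured value is passed through untouched.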

Test plan

  • cmake -B build -S dflash/ prints dflash27b: GPU compute_cap 12.0 → CUDA_ARCHITECTURES=120 on Blackwell hardware.
  • cmake --build build succeeds without CUDA-arch-related compiler/runtime errors on a consumer Blackwell system.
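
The first check can be automated by scanning the configure output for the status line quoted above. This is a hedged Python sketch: check_status_line is a hypothetical helper, and the pattern assumes the message keeps exactly the wording shown in the test plan.

```python
import re
from typing import Optional

# Pattern for the status line from the test plan, e.g.
# "dflash27b: GPU compute_cap 12.0 → CUDA_ARCHITECTURES=120".
STATUS_RE = re.compile(r"GPU compute_cap (\d+)\.(\d+) → CUDA_ARCHITECTURES=(\S+)")


def check_status_line(configure_output: str) -> Optional[bool]:
    """Return True if the reported arch equals major+minor of the compute cap.

    Returns None when no status line is found, and False when the arch does
    not match (for example, a suffixed value such as '120a').
    """
    match = STATUS_RE.search(configure_output)
    if match is None:
        return None
    major, minor, arch = match.groups()
    return arch == major + minor
```

One way to use it is to capture the output of cmake -B build -S dflash/ and pass the text to check_status_line; a None result means the auto-detect message was never printed.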

@davide221
Contributor

@easel thanks for the contribution! Is the speed problem still present?

@easel
Contributor Author

easel commented Apr 28, 2026

> @easel thanks for the contribution! Is the speed problem still present?

Yes. I think it's related to the workflow -- I'm putting together a small benchmark script to compare.

@easel easel force-pushed the fix/consumer-blackwell-auto-detect branch from e6dc0cd to 858b84b on May 4, 2026 20:24
@easel
Contributor Author

easel commented May 4, 2026

This may not be necessary if the expectation is to always build a multi-arch binary. I ran into it because Claude got excited about optimizing and ended up with a slightly incompatible build.
