
fix(hip): dflash compatibility aliases + TQ3_0 FA guard #8

Merged
davide221 merged 2 commits into Luce-Org:luce-dflash from smpurkis:strix-halo-rocm-dflash-latest
May 8, 2026


smpurkis commented on May 7, 2026

Note: This PR was vibe-coded with AI assistance (Claude Code). The fixes are mechanical (alias additions and HIP-gated guards), so please review accordingly.

Summary

  • Three small HIP/ROCm compatibility fixes needed to build the luce-dflash branch under -DGGML_HIP=ON for downstream dflash builds (Strix Halo / gfx1151, ROCm 7.2 / clang 21).
  • No behavior change for CUDA builds: all changes are HIP-gated or alias-only.

Changes

  1. ggml/src/ggml-cuda/vendors/hip.h — add four missing CUDA→HIP symbol aliases that newer dflash code paths depend on:
    • cublasSgemmStridedBatched → hipblasSgemmStridedBatched
    • cudaStreamCaptureStatus → hipStreamCaptureStatus
    • cudaStreamCaptureStatusNone → hipStreamCaptureStatusNone
    • cudaStreamIsCapturing → hipStreamIsCapturing
  2. ggml/src/ggml-cuda/gated_delta_net.cu: guard the <cuda_fp16.h> include under #ifndef GGML_USE_HIP (the HIP path already gets the FP16 types from hip_fp16.h, which the HIP runtime header pulls in).
  3. ggml/src/ggml-cuda/fattn.cu — wrap the TQ3_0 FATTN_VEC_CASES_ALL_D template instantiations under #ifndef GGML_USE_HIP. The TQ3_0 vec_dot kernel isn't ported to HIP yet, so instantiating these on HIP fails to link.
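To make the three patterns concrete, here is a minimal, self-contained C++ sketch that simulates a HIP build by forcing GGML_USE_HIP on. It is not the PR's diff. The HIP runtime types and hipStreamIsCapturing are replaced by hypothetical stubs so it compiles without ROCm, and fa_type, fattn_vec_case, fa_cases and has_vec_case are hypothetical stand-ins for the FATTN_VEC_CASES_ALL_D instantiations in fattn.cu.

```cpp
// Sketch only: simulates a HIP build. Real builds get GGML_USE_HIP from the
// build system (-DGGML_HIP=ON); it is forced here for illustration.
#ifndef GGML_USE_HIP
#define GGML_USE_HIP
#endif

#include <cassert>

// (2) Include guard: on HIP the CUDA-only header is skipped entirely.
#ifndef GGML_USE_HIP
#include <cuda_fp16.h>
#endif

// Hypothetical stand-ins for the HIP runtime so this compiles without ROCm.
// In a real build these come from <hip/hip_runtime.h>.
typedef void * hipStream_t;
enum hipError_t { hipSuccess = 0 };
enum hipStreamCaptureStatus {
    hipStreamCaptureStatusNone   = 0,
    hipStreamCaptureStatusActive = 1,
};
static hipError_t hipStreamIsCapturing(hipStream_t, hipStreamCaptureStatus * status) {
    *status = hipStreamCaptureStatusNone;  // stub: never capturing
    return hipSuccess;
}

// (1) Alias pattern in the style of vendors/hip.h: the preprocessor rewrites
// CUDA spellings to HIP spellings.
#define cudaStream_t                hipStream_t
#define cudaError_t                 hipError_t
#define cudaSuccess                 hipSuccess
#define cudaStreamCaptureStatus     hipStreamCaptureStatus
#define cudaStreamCaptureStatusNone hipStreamCaptureStatusNone
#define cudaStreamIsCapturing       hipStreamIsCapturing
#define cublasSgemmStridedBatched   hipblasSgemmStridedBatched  // hipBLAS not stubbed; unused here

// CUDA-spelled code (as in fattn-chunked.cu) now compiles on the HIP path.
static bool stream_is_capturing(cudaStream_t stream) {
    cudaStreamCaptureStatus status;
    const cudaError_t err = cudaStreamIsCapturing(stream, &status);
    assert(err == cudaSuccess);  // real code would use an error-check macro
    (void) err;
    return status != cudaStreamCaptureStatusNone;
}

// (3) HIP-gated instantiation. All names below are hypothetical.
enum class fa_type { F16, Q8_0, TQ3_0 };

template <int D, fa_type type_K>
static int fattn_vec_case() { return D; }  // stand-in for a kernel launcher

struct fa_case {
    int     d;
    fa_type type_K;
    int   (*launch)();
};

// Taking each specialization's address instantiates it, so the #ifndef
// decides which specializations exist in the build at all.
static const fa_case fa_cases[] = {
    {128, fa_type::F16,  fattn_vec_case<128, fa_type::F16>},
    {128, fa_type::Q8_0, fattn_vec_case<128, fa_type::Q8_0>},
#ifndef GGML_USE_HIP
    // TQ3_0 vec_dot has no HIP port yet, so only instantiate it for CUDA.
    {128, fa_type::TQ3_0, fattn_vec_case<128, fa_type::TQ3_0>},
#endif
};

static bool has_vec_case(int d, fa_type type_K) {
    for (const fa_case & c : fa_cases) {
        if (c.d == d && c.type_K == type_K) {
            return true;
        }
    }
    return false;
}
```

Because the aliases are plain macros, CUDA-spelled call sites need no edits. The trade-off is that any CUDA symbol without an alias only surfaces as an "undeclared identifier" error when compiling for HIP.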

Why

Without these, a HIP build of luce-dflash fails while compiling the ggml-hip target, with errors such as:

  • 'cuda_fp16.h' file not found (gated_delta_net.cu)
  • use of undeclared identifier 'cublasSgemmStridedBatched' / 'cudaStreamCaptureStatus' (fattn-chunked.cu)
  • TQ3_0 vec-FA template instantiation failure (fattn.cu)

Test plan

  • Builds cleanly on Strix Halo (gfx1151) with ROCm 7.2 + -DGGML_HIP=ON -DCMAKE_HIP_ARCHITECTURES=gfx1151
  • Used end-to-end as the submodule for the lucebox-hub HIP/ROCm support PR: "Add HIP/ROCm support for Strix Halo (gfx1151)" (lucebox-hub#119)
  • Existing CUDA paths untouched (no instantiations removed; only HIP-gated)

davide221 merged commit c79573c into Luce-Org:luce-dflash on May 8, 2026