
Fix DeepSeek-V4 CUTLASS DSL LoadCacheMode path #2

Draft
HanHan009527 wants to merge 10000 commits into main from codex/pr/dsv4-loadcachemode-path



HanHan009527 commented Jun 2, 2026


Summary

This PR fixes a narrow CUTLASS DSL API path mismatch in the DeepSeek-V4 dequant_gather_k_cutedsl kernel.

The kernel already imports cpasync:

```python
from cutlass.cute.nvgpu import cpasync
```

but the load-cache mode enum was referenced through a different namespace:

```python
cpasync.CopyG2SOp(cute.nvgpu.LoadCacheMode.GLOBAL)
```

This PR updates the reference to use the same cpasync namespace as CopyG2SOp:

```python
cpasync.CopyG2SOp(cpasync.LoadCacheMode.GLOBAL)
```

The enum member (GLOBAL) is unchanged, so the requested load-cache behavior stays the same; only the module path used to reach LoadCacheMode changes.

CUTLASS DSL API source

The CUTLASS DSL cpasync module exports both LoadCacheMode and CopyG2SOp:

Source: https://github.com/NVIDIA/cutlass/blob/25e252bdce504932d83f43f07c4b8cc7f9b8e2b6/python/CuTeDSL/cutlass/cute/nvgpu/cpasync/__init__.py#L17-L23

```python
__all__ = [
    "LoadCacheMode",
    "CopyG2SOp",
    ...
]
```
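The link above pins a specific CUTLASS commit, and an installed nvidia-cutlass-dsl may differ from it. A minimal sketch like the one below can check whether a locally installed cpasync module lists both names in its `__all__`. `missing_exports` is a hypothetical helper written for this illustration, not part of CUTLASS. The commented call assumes nvidia-cutlass-dsl is installed and importable.

```python
import importlib


def missing_exports(module_name: str, names: list[str]) -> list[str]:
    """Return the entries of `names` that `module_name` does not export.

    Uses the module's __all__ when it defines one; otherwise falls back to
    checking for plain attributes on the module.
    """
    module = importlib.import_module(module_name)
    exported = getattr(module, "__all__", None)
    if exported is None:
        return [n for n in names if not hasattr(module, n)]
    return [n for n in names if n not in exported]


# Hypothetical usage (requires nvidia-cutlass-dsl); an empty list means both are exported:
# missing_exports("cutlass.cute.nvgpu.cpasync", ["LoadCacheMode", "CopyG2SOp"])
```

The helper reports missing names instead of raising, so one call shows every gap at once.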

CopyG2SOp accepts the cpasync.LoadCacheMode enum (as well as the lower-level LoadCacheMode_ type named in its signature) as input and normalizes it internally:

Source: https://github.com/NVIDIA/cutlass/blob/25e252bdce504932d83f43f07c4b8cc7f9b8e2b6/python/CuTeDSL/cutlass/cute/nvgpu/cpasync/copy.py#L43-L89

```python
class LoadCacheMode(enum.Enum):
    GLOBAL = _cute_nvgpu_ir.LoadCacheMode.global_

class CopyG2SOp(CopyOp):
    def __init__(
        self, cache_mode: LoadCacheMode_ | LoadCacheMode = LoadCacheMode_.ALWAYS
    ):
        ...
```

Scope

This is intentionally a single-line compatibility fix. It does not attempt to solve other DeepSeek-V4/CUTLASS DSL startup issues, such as broader nvidia-cutlass-dsl API/package compatibility problems.

Testing

Tests were not run on this PR branch. The change is limited to the symbol path used when constructing the cpasync.CopyG2SOp load atom.
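Because the change only touches a symbol path, a lightweight import-time check is one way to exercise it without running the kernel or needing a GPU launch. The sketch below resolves a dotted path by importing the longest importable module prefix and then walking the remaining attributes. `resolve_symbol` is a hypothetical helper, not part of CUTLASS. The commented calls assume nvidia-cutlass-dsl is installed and that the kernel's `cute` name refers to `cutlass.cute`. Whether the old path resolves depends on the installed version.

```python
import importlib


def resolve_symbol(path: str):
    """Resolve a dotted path such as "pkg.module.Class.ATTR" to the object it names.

    Imports the longest prefix of `path` that is a module, then applies getattr
    for each remaining component. Raises AttributeError if an attribute is missing.
    """
    parts = path.split(".")
    for i in range(len(parts), 0, -1):
        module_name = ".".join(parts[:i])
        try:
            obj = importlib.import_module(module_name)
        except ModuleNotFoundError as exc:
            missing = exc.name or ""
            # Skip this prefix only if it (or one of its own parents) is the missing
            # module; a missing third-party dependency is re-raised unchanged.
            if missing and (module_name == missing or module_name.startswith(missing + ".")):
                continue
            raise
        for attr in parts[i:]:
            obj = getattr(obj, attr)
        return obj
    raise ImportError(f"no importable module prefix in {path!r}")


# Hypothetical usage (requires nvidia-cutlass-dsl; not run for this PR):
# resolve_symbol("cutlass.cute.nvgpu.cpasync.LoadCacheMode.GLOBAL")  # path after this change
# resolve_symbol("cutlass.cute.nvgpu.LoadCacheMode.GLOBAL")          # path before this change
```

Checking `exc.name` keeps the helper from masking a genuinely broken install behind a misleading "attribute not found" error.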
