@Qubitium Qubitium commented Sep 26, 2025

What does this PR do?

Fix the disk_offload() API causing torch.cuda.empty_cache() to be called when the module's origin device is cpu and it is offloaded to meta (disk).

Secondarily, this also resolves a performance issue: torch.cuda.empty_cache() is slow, and calling it to no effect in a forwarding env where modules are dynamically (manually) offloaded is suboptimal.
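
For context, the guard this PR is after looks roughly like the following. This is a minimal sketch, not the actual accelerate diff; `maybe_clear_device_cache` and `old_device` are illustrative names: the idea is simply to skip the accelerator cache clear when the tensor being moved never lived on an accelerator.

    import torch

    def maybe_clear_device_cache(old_device: torch.device) -> None:
        # Illustrative helper, not the real accelerate code path:
        # moving a cpu (or meta) tensor to "meta" frees nothing on the GPU,
        # so calling torch.cuda.empty_cache() is pure overhead and needlessly
        # touches CUDA state the caller may never have initialized.
        if old_device.type in ("cpu", "meta"):
            return
        if torch.cuda.is_available():
            torch.cuda.empty_cache()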

# nn.Module (Linear)
model: TritonV2QuantLinear  (P=0 B=2.25M) [cpu | mixed[int32, float16] | ~8.32MB]
      buffer: g_idx  shape=(2048,) dtype=int32 device=cpu ~8.00KB
      buffer: scales  shape=(16, 8192) dtype=float16 device=cpu ~256.00KB
      buffer: qweight  shape=(256, 8192) dtype=int32 device=cpu ~8.00MB
      buffer: qzeros  shape=(16, 1024) dtype=int32 device=cpu ~64.00KB

Given the above nn.Module (linear), which is on cpu, and the following call, I did not expect accelerate code paths to call anything cuda related. This (indirectly) triggered a cuda assert error in my GIL=0 env with multiple gpus and threads. I probably have a thread-context bug somewhere above this code, but the main point is that in this scenario, torch.cuda.empty_cache() should never be called by the disk_offload paths.

    _ = disk_offload(
        module, # <--- see above ascii print of module
        offload_dir=f"{disk_path}/{name}",
        offload_buffers=True,  # needed for buffers
        execution_device=torch.device("cpu"),
    )
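
A rough way to verify the cpu-only case is to stub out torch.cuda.empty_cache and check it is never reached. This is a sketch under assumptions: a plain nn.Linear stands in for the quantized module above, and offload_dir is a throwaway path.

    from unittest.mock import patch

    import torch
    from accelerate import disk_offload

    # Stand-in for the cpu-resident quantized module in the report above.
    module = torch.nn.Linear(8, 8)

    # Replace torch.cuda.empty_cache with a mock so we can observe whether
    # disk_offload() touches the CUDA allocator for a cpu-only module.
    with patch("torch.cuda.empty_cache") as spy:
        disk_offload(
            module,
            offload_dir="/tmp/offload_check",  # throwaway path
            offload_buffers=True,
            execution_device=torch.device("cpu"),
        )
    assert not spy.called, "disk_offload() cleared the CUDA cache for a cpu-only module"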

Stacktrace: note that the crash itself is not caused by accelerate; the stack only shows the path that triggered the invalid torch.cuda.empty_cache() call.

Traceback (most recent call last):
 File "/root/GPTQModel/gptqmodel/utils/threads.py", line 33, in _runner
   return fn()
 File "/root/GPTQModel/gptqmodel/looper/module_looper.py", line 553, in finalize_module
   offload_to_disk(
   ~~~~~~~~~~~~~~~^
       model=self.gptq_model.model,
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       module=self.gptq_model.model.get_submodule(module.full_name),
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       disk_path=self.gptq_model.quantize_config.offload_to_disk_path,
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   )
   ^
 File "/root/GPTQModel/gptqmodel/utils/offload.py", line 80, in offload_to_disk
   _offload_disk(module=module, name=full_name, disk_path=disk_path)
   ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/GPTQModel/gptqmodel/utils/offload.py", line 108, in _offload_disk
   _ = disk_offload(
       module,
   ...<3 lines>...
       execution_device=m_device,
   )
 File "/root/accelerate/src/accelerate/big_modeling.py", line 297, in disk_offload
   attach_align_device_hook(
   ~~~~~~~~~~~~~~~~~~~~~~~~^
       model,
       ^^^^^^
   ...<4 lines>...
       preload_module_classes=preload_module_classes,
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   )
   ^
 File "/root/accelerate/src/accelerate/hooks.py", line 521, in attach_align_device_hook
   add_hook_to_module(module, hook, append=True)
   ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
 File "/root/accelerate/src/accelerate/hooks.py", line 166, in add_hook_to_module
   module = hook.init_hook(module)
 File "/root/accelerate/src/accelerate/hooks.py", line 111, in init_hook
   module = hook.init_hook(module)
 File "/root/accelerate/src/accelerate/hooks.py", line 313, in init_hook
   set_module_tensor_to_device(module, name, "meta")
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
 File "/root/accelerate/src/accelerate/utils/modeling.py", line 408, in set_module_tensor_to_device
   clear_device_cache()
   ~~~~~~~~~~~~~~~~~~^^
 File "/root/accelerate/src/accelerate/utils/memory.py", line 65, in clear_device_cache
   torch.cuda.empty_cache()
   ~~~~~~~~~~~~~~~~~~~~~~^^
 File "/root/vm313t/lib/python3.13t/site-packages/torch/cuda/memory.py", line 224, in empty_cache
   torch._C._cuda_emptyCache()
   ~~~~~~~~~~~~~~~~~~~~~~~~~^^
torch.AcceleratorError: CUDA error: an illegal memory access was encountered

Who can review?

@SunMarc @zach-huggingface @BenjaminBossan

@SunMarc SunMarc left a comment


Thanks a lot and nice report btw!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
