@divakar-amd
This fix resolves a `triton.runtime.errors.OutOfResources` error on AMD GPUs (MI300).

Here's the error log without this fix:

  File "/Projects/VLLM_DIR/vllm/vllm/v1/worker/gpu_model_runner.py", line 2934, in sample_tokens
    apply_grammar_bitmask(
  File "/Projects/VLLM_DIR/vllm/vllm/v1/structured_output/utils.py", line 126, in apply_grammar_bitmask
    xgr.apply_token_bitmask_inplace(logits, grammar_bitmask, indices=index_tensor)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/matcher.py", line 147, in apply_token_bitmask_inplace
    apply_token_bitmask_inplace_triton(logits, bitmask, vocab_size, indices)
  File "/usr/local/lib/python3.12/dist-packages/xgrammar/kernels/apply_token_bitmask_inplace_triton.py", line 106, in apply_token_bitmask_inplace_triton
    apply_token_bitmask_inplace_kernel[grid](
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 393, in <lambda>
    return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/runtime/jit.py", line 623, in run
    kernel.run(grid_0, grid_1, grid_2, stream, kernel.function, kernel.packed_metadata, launch_metadata,
    ^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 467, in __getattribute__
    self._init_handles()
  File "/usr/local/lib/python3.12/dist-packages/triton/compiler/compiler.py", line 461, in _init_handles
    raise OutOfResources(self.metadata.num_warps * warp_size, self.n_max_threads, "threads")
triton.runtime.errors.OutOfResources: out of resource: threads, Required: 2048, Hardware limit: 1024. Reducing block sizes or `num_stages` may help.

Copilot AI review requested due to automatic review settings November 20, 2025 20:53

Copilot AI left a comment


Pull Request Overview

This PR fixes a `triton.runtime.errors.OutOfResources` error that occurs on AMD GPUs (specifically MI300) by setting the correct warp size for AMD's architecture. AMD GPUs use a warp (wavefront) size of 64, while NVIDIA GPUs use 32. The fix detects the GPU vendor at runtime and sets the appropriate warp size, which is then used to calculate the number of warps for the Triton kernel launch.

Key changes:

  • Added conditional logic to detect AMD GPUs via torch.version.hip
  • Set WARP_SIZE to 64 for AMD GPUs and 32 for NVIDIA GPUs
  • Updated num_warps calculation to use the dynamically determined WARP_SIZE
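The logic described above can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the function names (`pick_warp_size`, `num_warps_for_block`) are made up here, and where the real fix checks `torch.version.hip` directly, this sketch takes the ROCm flag as a parameter so it runs without a GPU or a ROCm build of PyTorch.

```python
def pick_warp_size(is_hip: bool) -> int:
    """Return the hardware warp (wavefront) size.

    The actual fix checks `torch.version.hip is not None` (non-None only
    on ROCm builds of PyTorch); here the flag is passed in explicitly.
    """
    return 64 if is_hip else 32


def num_warps_for_block(block_size: int, is_hip: bool) -> int:
    """Warps to request for a Triton launch of `block_size` threads per block."""
    warp_size = pick_warp_size(is_hip)
    return max(block_size // warp_size, 1)
```

With a 1024-thread block, this requests 16 warps on AMD (16 × 64 = 1024 hardware threads) and 32 on NVIDIA (32 × 32 = 1024). Hard-coding the NVIDIA value of 32 into the divisor instead yields 32 warps, which on MI300 launches 32 × 64 = 2048 hardware threads, matching the "Required: 2048, Hardware limit: 1024" figures in the error log above.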


@divakar-amd
Copy link
Author

@Ubospica @mgorny Looking for a review.

Signed-off-by: Divakar Verma <[email protected]>