[BugFix][Relax] Select target-specific pipeline in tvm.compile when GPU target is provided #19384

Merged: tlopex merged 3 commits into apache:main from swjng:fix/relax-compile-default-pipeline on Apr 11, 2026

@swjng swjng commented Apr 10, 2026

Problem

relax.build() (exposed as tvm.compile) with relax_pipeline="default" always
resolved to default_build_pipeline, regardless of the target.
default_build_pipeline does not include DLight scheduling — it is a
target-agnostic lowering pipeline. On CUDA, this left TIR functions generated
from ops like Clip/ReLU6 without thread bindings, causing VerifyMemory to fail:

Memory verification failed: Variable `X` is directly accessed by host memory
(it is not contained in a thread environment or in the function arguments).
Did you forget to bind?

Fix

When relax_pipeline="default" and the target is a GPU target
("gpu" in target.keys), use relax.get_default_pipeline(target) which
includes target-aware DLight scheduling. Fall back to default_build_pipeline
if no target-specific pipeline is registered.

CPU targets (llvm, c) continue to use default_build_pipeline unchanged.
The CPU-specific pipeline adds FuseOps/FuseTIR/FoldConstant on top,
which can DCE call_pure_packed calls whose results are unused — correct
per pure semantics, but a separate concern from this fix.
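The selection rule described above can be sketched in plain Python. This is a minimal model, not the code in vm_build.py: the registry `_TARGET_PIPELINES`, the string-returning pipelines, and the helper `select_pipeline` are hypothetical stand-ins, and the target keys shown are only illustrative. It assumes, as the commit message states, that a missing target-specific pipeline surfaces as a ValueError.

```python
from typing import Callable, Dict, Optional, Sequence

Pipeline = Callable[[str], str]

# Hypothetical registry standing in for TVM's per-target default pipelines.
_TARGET_PIPELINES: Dict[str, Pipeline] = {
    "cuda": lambda mod: f"dlight_scheduled({mod})",
}


def default_build_pipeline(mod: str) -> str:
    """Stand-in for the target-agnostic lowering pipeline (no DLight)."""
    return f"generic_lowering({mod})"


def get_default_pipeline(target_kind: str) -> Pipeline:
    """Stand-in lookup; assumed to raise ValueError when nothing is registered."""
    if target_kind not in _TARGET_PIPELINES:
        raise ValueError(f"no default pipeline for target kind {target_kind!r}")
    return _TARGET_PIPELINES[target_kind]


def select_pipeline(
    relax_pipeline: str,
    target_kind: Optional[str],
    target_keys: Sequence[str],
) -> Pipeline:
    """Model of the rule: GPU targets get the target-aware pipeline."""
    if relax_pipeline != "default":
        raise NotImplementedError("only relax_pipeline='default' is modelled")
    if target_kind is not None and "gpu" in target_keys:
        try:
            return get_default_pipeline(target_kind)
        except ValueError:
            pass  # no target-specific pipeline registered: fall back below
    # CPU targets, missing targets, and unregistered GPU targets end up here.
    return default_build_pipeline
```

In this model a CUDA-like target (`select_pipeline("default", "cuda", ["cuda", "gpu"])`) gets the scheduled pipeline, while an llvm-like target with keys `["cpu"]` keeps the generic one.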

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the Relax VM build process to prefer target-specific default pipelines when the "default" pipeline string is provided and a target is available. A review comment correctly identified that the implementation incorrectly attempts to access the pipeline submodule directly, which would likely trigger an AttributeError and cause an unintended fallback to the generic pipeline; a suggestion was provided to use the properly exported relax.get_default_pipeline function instead.

Comment thread python/tvm/relax/vm_build.py Outdated
…arget is provided

relax.build() with relax_pipeline="default" always resolved to
default_build_pipeline, which omits FuseOps, FuseTIR, and DLight
scheduling. On CUDA this left individual TIR functions (e.g. maximum,
minimum from Clip/ReLU6) without thread bindings, causing VerifyMemory
to fail:

  Memory verification failed: Variable X is directly accessed by host
  memory (it is not contained in a thread environment or in the
  function arguments).

When relax_pipeline="default" and a target is provided, prefer
relax.pipeline.get_default_pipeline(target), which includes the full
legalization + fusion + DLight scheduling pipeline. Falls back to
default_build_pipeline if no target-specific pipeline is registered
(e.g. ValueError or AttributeError from get_default_pipeline).
@swjng swjng force-pushed the fix/relax-compile-default-pipeline branch from e715582 to 3203e19 Compare April 10, 2026 15:17
@swjng swjng changed the title [BugFix][Relax] Select target-specific pipeline in tvm.compile when target is provided [BugFix][Relax] Select target-specific pipeline in tvm.compile when target is provided Apr 10, 2026
…pipeline

`cpu_generic.get_default_pipeline` was missing `DispatchSampling` and
`DispatchSortScan` from its `library_dispatch_passes`, causing ops like
`relax.cumsum` and `relax.topk` to reach CodeGenVM without being
dispatched, resulting in "CodeGenVM cannot handle this intrinsic" errors
on CPU/llvm targets.
@swjng swjng force-pushed the fix/relax-compile-default-pipeline branch from 32d3457 to e6d872a Compare April 11, 2026 03:17
The previous fix applied get_default_pipeline(target) whenever a target
was provided, including CPU (llvm). The CPU-specific pipeline includes
FoldConstant and FuseOps/FuseTIR which DCE unused call_pure_packed
calls -- correct per the pure semantics, but it broke existing tests
that relied on their side effects.

Narrow the scope: only use get_default_pipeline for GPU targets
(identified by 'gpu' in target.keys). CPU targets continue to use
get_pipeline('default'), which is the previous behaviour.
@swjng swjng changed the title [BugFix][Relax] Select target-specific pipeline in tvm.compile when target is provided [BugFix][Relax] Select target-specific pipeline in tvm.compile when GPU target is provided Apr 11, 2026
swjng commented Apr 11, 2026

While investigating the CPU test failures, I noticed that test_vm_compile_simple uses R.call_pure_packed for test.vm.identity, which modifies y's buffer in-place and relies on that side effect:

z = R.call_pure_packed(
    "test.vm.identity", x, y, sinfo_args=(R.Tensor(ndim=2, dtype="float32"))
)
return y  # z unused — relies on y being modified as a side effect

call_pure_packed declares no side effects, so passes like FoldConstant are correct to DCE the call when z is unused. The test should use R.call_packed instead. This didn't surface before because default_build_pipeline doesn't include DCE-capable passes.

I'll send a follow-up PR to fix the test after this merges.
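
As a rough illustration of why the pure call can be dropped, here is a toy dead-code elimination over a list of bindings. It is a simplified model, not TVM's pass: `Binding` and `eliminate_dead_bindings` are hypothetical names, and the `pure` flag stands in for the difference between R.call_pure_packed and R.call_packed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Binding:
    var: str                # variable bound to the call result
    callee: str             # name of the packed function being called
    args: Tuple[str, ...]   # variables passed as arguments
    pure: bool              # True models call_pure_packed, False call_packed


def eliminate_dead_bindings(
    bindings: List[Binding], outputs: List[str]
) -> List[Binding]:
    """Drop pure bindings whose result is never used; keep impure ones."""
    live = set(outputs)
    kept: List[Binding] = []
    # Walk backwards so uses are seen before the bindings that define them.
    for binding in reversed(bindings):
        if binding.var in live or not binding.pure:
            kept.append(binding)
            live.update(binding.args)
    kept.reverse()
    return kept


# Mirrors the test: z is unused and the function returns y.
pure_call = Binding("z", "test.vm.identity", ("x", "y"), pure=True)
impure_call = Binding("z", "test.vm.identity", ("x", "y"), pure=False)

after_pure = eliminate_dead_bindings([pure_call], outputs=["y"])
after_impure = eliminate_dead_bindings([impure_call], outputs=["y"])
```

With the pure binding, `after_pure` is empty, so the in-place write to y never happens; with the impure binding the call survives, which is the behaviour the test actually depends on.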

@tlopex tlopex left a comment

LGTM

@tlopex tlopex merged commit b9ced1a into apache:main Apr 11, 2026
9 checks passed

2 participants