[BUGFIX][TIR] Skip wholly-constant expressions in CSE#19496
[BUGFIX][TIR] Skip wholly-constant expressions in CSE#19496tqchen wants to merge 1 commit intoapache:mainfrom
Conversation
The TIR CSE pass currently lifts expressions like Cast(int32, 1) or Min(1, 2) into a cse_v binding when they appear twice, even though they are compile-time constants and provide no CSE benefit. This pollutes the IR with bindings of the form cse_v = 1. Extend CSEPlanner::IsEligible to reject any expression whose subtree contains no Var.
There was a problem hiding this comment.
Code Review
This pull request modifies the Common Subexpression Elimination (CSE) pass to exclude wholly-constant expressions, preventing the hoisting of subtrees that do not contain variables and are better handled by the constant folder. Corresponding test cases for constant casts and min operations were added. Feedback was provided regarding the performance of the IsEligible predicate, which currently performs multiple tree traversals, potentially leading to O(N^2) complexity, and contains a redundant check.
| if (IsForbiddenNode(expr)) return false; | ||
| if (expr.as<RampNode>() || expr.as<BroadcastNode>()) return false; | ||
| if (CheckContains::ExprContains(expr, IsForbiddenNode)) return false; | ||
| // Reject wholly-constant expressions (no Var anywhere in the tree). | ||
| // BufferLoad is already filtered above by IsForbiddenNode, so | ||
| // "contains no Var" is sufficient to declare the expression a | ||
| // compile-time constant. Hoisting it adds noise; the constant | ||
| // folder will collapse it. | ||
| auto contains_var = [](const PrimExpr& e) { return e.as<VarNode>() != nullptr; }; | ||
| if (!CheckContains::ExprContains(expr, contains_var)) return false; |
There was a problem hiding this comment.
The IsEligible predicate now performs two separate full traversals of the expression tree (lines 284 and 291) for every node visited during the CSE planning phase. This leads to
Furthermore, the check at line 282 is redundant because CheckContains::ExprContains at line 284 already checks the root node. Additionally, since IsEligible is only invoked from RecordExpr within specific VisitExpr_ overrides (which exclude Call and BufferLoad), the root-level check at line 282 will always be false.
Consider combining the 'forbidden node' and 'contains var' checks into a single pass or, ideally, caching these properties during the bottom-up traversal of CSEPlanner to maintain linear complexity.
The TIR CSE pass currently lifts wholly-constant expressions like
Cast(int32, 1)orT.min(1, 2)intocse_vbindings when theyappear more than once. These expressions are compile-time constants
that the constant folder collapses anyway, so hoisting them adds
noise of the form
cse_v = 1without providing any CSE benefit.The eligibility predicate
CSEPlanner::IsEligiblealready rejectsbare
IntImm/FloatImm/StringImm/Varat the top level, butcompound nodes with registered visitors (
Cast,Min,Max,arithmetic,
Select, ...) pass the leaf-only check while stillbeing wholly-constant. Extend the predicate with a transitive
"contains no Var" check using the existing
CheckContainshelper.BufferLoadis already filtered upstream byIsForbiddenNode, so"no Var" is sufficient at this point to declare the expression a
compile-time constant.
Adds two regression tests to
tests/python/tirx-transform/test_tir_transform_common_subexpr_elim.pycovering
Cast-of-constant andMin-of-constants.