fix(isthmus): handle subqueries with outer field references #426

Open: 1 commit to merge into main

Conversation

@nielspardon (Contributor) commented Jun 27, 2025

Changes SubstraitRelNodeConverter and ExpressionRexConverter to support subqueries with outer field references.

fixes #382

replaces #383

@vbarua (Member) commented Jun 27, 2025

For the sake of reviewability, could I ask you to split this into 3-ish PRs? One for your initial cleanups/improvements, one for just the visitor changes, and one for the subquery changes.

@nielspardon (Contributor Author)

> For the sake of reviewability, could I ask you to split this into 3-ish PRs? One for your initial cleanups/improvements, one for just the visitor changes, and one for the subquery changes.

Sure, I already thought that this might be necessary.

@nielspardon (Contributor Author)

First extracted PR for the visitor changes: #427

@nielspardon force-pushed the par-tpch-17 branch 5 times, most recently from 7b6ef99 to 764206a (June 27, 2025 19:24)

@nielspardon (Contributor Author)

I removed the cleanup from this PR and rebased it on #427

@nielspardon force-pushed the par-tpch-17 branch 2 times, most recently from a2f0c8e to 86545b6 (July 1, 2025 17:33)

@nielspardon (Contributor Author)

updated to use the new VisitationContext interface from #427

@vbarua (Member) left a comment

Did a very cursory first pass.

Something that I think would be helpful would be some tests based directly on Substrait plans, and not just the TPC based tests. This would help contributors, myself included, understand what exactly the structure of these subqueries looks like in Substrait, and how your processing is being applied on top of it.

     return applyRemap(node, project.getRemap());
   }

   @Override
-  public RelNode visit(Cross cross, EmptyVisitationContext context) throws RuntimeException {
+  public RelNode visit(Cross cross, Context context) throws RuntimeException {
     RelNode left = cross.getLeft().accept(this, context);
     RelNode right = cross.getRight().accept(this, context);
     // Calcite represents CROSS JOIN as the equivalent INNER JOIN with true condition

@vbarua (Member)

If we're going to handle subqueries, don't we need to propagate the parents through every relation type?

@nielspardon (Contributor Author)

Not entirely sure what you mean by that. We need to propagate the context through every relation type, but we do not need to track every relation as a parent, only those relations that can contain subqueries in their expressions.
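
The distinction between propagating the context and tracking a parent can be illustrated with a minimal sketch in plain Java. Everything in it (ContextPropagationSketch, Rel, Ctx, Kind) is a hypothetical stand-in invented for illustration and is not taken from the PR; it assumes that only Filter, Project and Join carry expressions. Each relation records the parent stack it sees, innermost parent first.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-ins for illustration; none of these are classes from the PR.
final class ContextPropagationSketch {

  enum Kind { FILTER, PROJECT, JOIN, CROSS, READ }

  /** Toy relation: its inputs, plus subqueries embedded in its expressions. */
  static final class Rel {
    final String name;
    final Kind kind;
    final List<Rel> inputs;
    final List<Rel> subqueries;

    Rel(String name, Kind kind, List<Rel> inputs, List<Rel> subqueries) {
      this.name = name;
      this.kind = kind;
      this.inputs = inputs;
      this.subqueries = subqueries;
    }

    /** Assumption: only relations configured with expressions can contain a subquery. */
    boolean hasExpressions() {
      return kind == Kind.FILTER || kind == Kind.PROJECT || kind == Kind.JOIN;
    }
  }

  /** Context passed to every visit call; parents holds the innermost parent first. */
  static final class Ctx {
    final Deque<String> parents = new ArrayDeque<>();
    final List<String> seen = new ArrayList<>();
  }

  static void visit(Rel rel, Ctx ctx) {
    ctx.seen.add(rel.name + " sees " + ctx.parents);
    // Every relation type, including Cross and Read, forwards the same context.
    for (Rel input : rel.inputs) {
      visit(input, ctx);
    }
    if (!rel.hasExpressions()) {
      return; // never tracked as a parent
    }
    // Tracked as a parent only while its own subqueries are being converted.
    ctx.parents.push(rel.name);
    for (Rel subquery : rel.subqueries) {
      visit(subquery, ctx);
    }
    ctx.parents.pop();
  }

  static Rel read(String name) {
    return new Rel(name, Kind.READ, List.of(), List.of());
  }

  /** Builds a small tree, walks it, and returns what each relation saw. */
  static List<String> demo() {
    Rel f1 = new Rel("F1", Kind.FILTER, List.of(read("R2")), List.of(read("R3")));
    Rel c1 = new Rel("C1", Kind.CROSS, List.of(read("R1"), f1), List.of());
    Rel f0 = new Rel("F0", Kind.FILTER, List.of(read("R0")), List.of(c1));
    Ctx ctx = new Ctx();
    visit(f0, ctx);
    return ctx.seen;
  }

  public static void main(String[] args) {
    demo().forEach(System.out::println);
  }
}
```

In this toy tree the Cross relation C1 sits between the two Filters, yet R3 only sees F1 and F0 as parents: C1 forwards the context but never pushes itself.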

@nielspardon (Contributor Author) commented Jul 7, 2025

> Something that I think would be helpful would be some tests based directly on Substrait plans, and not just the TPC based tests. This would help contributors, myself included, understand what exactly the structure of these subqueries looks like in Substrait, and how your processing is being applied on top of it.

I drew this diagram for myself to reason about how outer field references work in Substrait vs. Calcite:
[image: diagram of outer field references in Substrait vs. Calcite]

  • Subqueries are an expression element, so they only appear within relations that are configured using expressions, e.g. FilterRel, ProjectRel, JoinRel.
  • Substrait addresses a field of a parent relation by the number of subquery boundaries one has to cross to reach the relation providing that field, which means we need to keep track of the subquery depth.
  • Calcite uses correlation variables, e.g. $cor0, to address a field from a parent relation, e.g. $cor0.P_PARTKEY. The correlation variable needs to be added to the variablesSet of the parent relation the field is coming from, e.g. variablesSet=[$cor0]. Additionally, creating the correlation variable requires the row type of the parent relation providing the field. So when creating the field access in the subquery we need access to the input row type of the parent relation, and we need to ensure that all correlation variables pointing to this parent relation are added to its variablesSet.
  • We do not need to track all parent relations, only the parent relations which support expressions. In the example above we need to know the 3 FilterRels and don't really care about the other relations in the tree.
  • Calcite only supports providing a variablesSet for Project, Filter and Join, as you can see by searching for variablesSet in the Calcite RelBuilder.
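
The bookkeeping described in these points can be sketched in plain Java. This is a hypothetical model only: OuterReferenceSketch, Parent and Context are invented names, field names stand in for row types, and no Substrait or Calcite API is called. The stepsOut parameter mirrors the steps_out value of a Substrait outer reference. Reusing one correlation variable per parent is a simplifying assumption of the sketch, not something the PR is known to do.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model for illustration only: it mimics the bookkeeping with
// plain strings and calls no Substrait or Calcite API.
final class OuterReferenceSketch {

  /** A tracked parent relation (Filter, Project or Join). */
  static final class Parent {
    final String name;
    final List<String> inputFields; // stand-in for the input row type
    final Set<String> variablesSet = new LinkedHashSet<>();
    String correlationVariable; // allocated on the first outer reference

    Parent(String name, List<String> inputFields) {
      this.name = name;
      this.inputFields = inputFields;
    }
  }

  /** Parents of the subquery currently being converted, outermost first. */
  static final class Context {
    private final List<Parent> parents = new ArrayList<>();
    private int nextCorrelationId = 0;

    void enterSubquery(Parent parent) {
      parents.add(parent);
    }

    void leaveSubquery() {
      parents.remove(parents.size() - 1);
    }

    /** Maps (stepsOut, fieldIndex) to a Calcite-style "$corN.FIELD" string. */
    String outerReference(int stepsOut, int fieldIndex) {
      if (stepsOut < 1 || stepsOut > parents.size()) {
        throw new IllegalArgumentException("no parent " + stepsOut + " subquery boundaries out");
      }
      // Crossing stepsOut subquery boundaries selects the parent counted from the innermost.
      Parent parent = parents.get(parents.size() - stepsOut);
      if (parent.correlationVariable == null) {
        // Assumption: one correlation variable per parent, reused by later references.
        parent.correlationVariable = "$cor" + nextCorrelationId++;
      }
      // The variable must be registered on the parent that provides the field.
      parent.variablesSet.add(parent.correlationVariable);
      return parent.correlationVariable + "." + parent.inputFields.get(fieldIndex);
    }
  }

  public static void main(String[] args) {
    Parent outer = new Parent("outer Filter", List.of("P_PARTKEY", "P_BRAND"));
    Parent inner = new Parent("inner Filter", List.of("L_PARTKEY", "L_QUANTITY"));
    Context ctx = new Context();
    ctx.enterSubquery(outer); // now inside the subquery of the outer Filter
    ctx.enterSubquery(inner); // now inside the nested subquery of the inner Filter
    System.out.println(ctx.outerReference(2, 0));
    System.out.println(ctx.outerReference(1, 1));
    System.out.println(outer.name + " variablesSet=" + outer.variablesSet);
    System.out.println(inner.name + " variablesSet=" + inner.variablesSet);
  }
}
```

The table and column names in main are placeholders. The point of the sketch is that resolving a reference needs both the subquery depth (to pick the parent) and the parent's input row type (to build the field access), which is why the context has to carry the tracked parents.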

I hope this helps.

@nielspardon (Contributor Author)

> Something that I think would be helpful would be some tests based directly on Substrait plans, and not just the TPC based tests.

Will spend some time on tests towards the end of the week when I have some spare time.

@nielspardon force-pushed the par-tpch-17 branch 2 times, most recently from f574de2 to 1e09ea6 (July 11, 2025 06:25)
protected final SubstraitToCalcite converter = new SubstraitToCalcite(extensions, typeFactory);

@Test
void testOuterFieldReferenceOneStep() {

@nielspardon (Contributor Author)

@vbarua what do you think about this testing approach?

@nielspardon (Contributor Author)

I do have tests for scalar subqueries, IN predicates and set predicates with various levels of depth. I guess the test coverage should be good enough. Let me know if there's anything else I should add.

@nielspardon force-pushed the par-tpch-17 branch 7 times, most recently from 6481a08 to 8d8efb8 (July 11, 2025 19:18)
Development

Successfully merging this pull request may close these issues.

[isthmus] subqueries/set predicates with field references outside of the subquery fail