[PoC] Push simple filter conditions into $lookup stage. #345
base: main
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -129,25 +129,12 @@ def extra_where(self, compiler, connection): # noqa: ARG001 | |
raise NotSupportedError("QuerySet.extra() is not supported on MongoDB.") | ||
|
||
|
||
def join(self, compiler, connection): | ||
lookup_pipeline = [] | ||
lhs_fields = [] | ||
rhs_fields = [] | ||
# Add a join condition for each pair of joining fields. | ||
parent_template = "parent__field__" | ||
for lhs, rhs in self.join_fields: | ||
lhs, rhs = connection.ops.prepare_join_on_clause( | ||
self.parent_alias, lhs, compiler.collection_name, rhs | ||
) | ||
lhs_fields.append(lhs.as_mql(compiler, connection)) | ||
# In the lookup stage, the reference to this column doesn't include | ||
# the collection name. | ||
rhs_fields.append(rhs.as_mql(compiler, connection)) | ||
# Handle any join conditions besides matching field pairs. | ||
extra = self.join_field.get_extra_restriction(self.table_alias, self.parent_alias) | ||
if extra: | ||
def join(self, compiler, connection, pushed_expressions=None): | ||
def _get_reroot_replacements(expressions): | ||
if not expressions: | ||
return None | ||
columns = [] | ||
for expr in extra.leaves(): | ||
for expr in expressions: | ||
# Determine whether the column needs to be transformed or rerouted | ||
# as part of the subquery. | ||
for hand_side in ["lhs", "rhs"]: | ||
|
@@ -165,18 +152,47 @@ def join(self, compiler, connection): | |
# based on their rerouted positions in the join pipeline. | ||
replacements = {} | ||
for col, parent_pos in columns: | ||
column_target = Col(compiler.collection_name, expr.output_field.__class__()) | ||
target = col.target.clone() | ||
target.remote_field = col.target.remote_field | ||
Is this not cloned from
I would like to answer yes, but the ManyToMany field isn't cloned properly because of some initialization, so I have to copy the remote_field. |
||
column_target = Col(compiler.collection_name, target) | ||
if parent_pos is not None: | ||
target_col = f"${parent_template}{parent_pos}" | ||
column_target.target.db_column = target_col | ||
column_target.target.set_attributes_from_name(target_col) | ||
else: | ||
column_target.target = col.target | ||
replacements[col] = column_target | ||
# Apply the transformed expressions in the extra condition. | ||
return replacements | ||
|
||
lookup_pipeline = [] | ||
lhs_fields = [] | ||
rhs_fields = [] | ||
# Add a join condition for each pair of joining fields. | ||
parent_template = "parent__field__" | ||
for lhs, rhs in self.join_fields: | ||
lhs, rhs = connection.ops.prepare_join_on_clause( | ||
self.parent_alias, lhs, compiler.collection_name, rhs | ||
) | ||
lhs_fields.append(lhs.as_mql(compiler, connection)) | ||
# In the lookup stage, the reference to this column doesn't include | ||
# the collection name. | ||
rhs_fields.append(rhs.as_mql(compiler, connection)) | ||
# Handle any join conditions besides matching field pairs. | ||
Comment on lines +167 to +180
Is this the portion of code that generates the code shown below?
{
$lookup: {
...
$pipeline: [
...
{ $and: [
...,
{$eq: ['$$parent_field_0', '$_id']} # This line?
]
}
]
}
}
Yes, it is the same code as before. I only created a function because the process applied to |
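For reference, the stage shape discussed in this thread can be sketched in plain Python using MongoDB's documented `$lookup` form with `let` and `pipeline`. This is only an illustration: `build_lookup_stage` and its parameters are hypothetical names, not functions from this PR, and the conditions are passed in as already-compiled MQL dictionaries.

```python
def build_lookup_stage(
    from_collection,
    lhs_fields,
    rhs_fields,
    extra_conditions=(),
    pushed_conditions=(),
    parent_template="parent__field__",
):
    """Hypothetical helper: assemble a $lookup stage from compiled MQL parts."""
    # Expose each parent-side field to the sub-pipeline as a `let` variable.
    let_vars = {f"{parent_template}{i}": lhs for i, lhs in enumerate(lhs_fields)}
    # One equality per joining pair; `$$name` reads a `let` variable.
    join_conditions = [
        {"$eq": [f"$${parent_template}{i}", rhs]} for i, rhs in enumerate(rhs_fields)
    ]
    conditions = join_conditions + list(extra_conditions) + list(pushed_conditions)
    return {
        "$lookup": {
            "from": from_collection,
            "let": let_vars,
            "pipeline": [{"$match": {"$expr": {"$and": conditions}}}],
            "as": from_collection,
        }
    }


# Example with made-up collection and field names.
stage = build_lookup_stage(
    "author",
    lhs_fields=["$author_id"],
    rhs_fields=["$_id"],
    pushed_conditions=[{"$eq": ["$name", "Ann"]}],
)
```

In this sketch the pushed filter simply becomes one more entry in the `$and` list, next to the join equality.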
||
extra = self.join_field.get_extra_restriction(self.table_alias, self.parent_alias) | ||
|
||
if extra: | ||
replacements = _get_reroot_replacements(extra.leaves()) | ||
extra_condition = [extra.replace_expressions(replacements).as_mql(compiler, connection)] | ||
else: | ||
extra_condition = [] | ||
Comment on lines +183 to 187
I'm assuming these are the expressions that we aren't able to easily convert yet.
No, that was old code. Some lookups have extra conditions and others don't; I did nothing new with |
||
if self.join_type == INNER: | ||
rerooted_replacement = _get_reroot_replacements(pushed_expressions) | ||
resolved_pushed_expressions = [ | ||
expr.replace_expressions(rerooted_replacement).as_mql(compiler, connection) | ||
for expr in pushed_expressions | ||
] | ||
else: | ||
resolved_pushed_expressions = [] | ||
|
||
lookup_pipeline = [ | ||
{ | ||
|
@@ -204,6 +220,7 @@ def join(self, compiler, connection): | |
for i, field in enumerate(rhs_fields) | ||
] | ||
+ extra_condition | ||
+ resolved_pushed_expressions | ||
} | ||
} | ||
} | ||
|
Out of scope for this PR, but we can definitely have two options:
Yes, that's the idea.
Now the conditions are pushed up (not sure if that's the right term, but I mean passing the filter into the `$lookup`) only if the Django lookup looks like one of the following:
- Column operator value
- Column operator Column ← this one might actually be broken in the current code 😬

What the code does now is:
It filters out all conditions that involve `having`, subqueries, or composed conditions like `A and (B or C)`.

Why?
For subqueries, we need to promote or isolate certain `$lookup`s in the pipeline.
For composed conditions, we need to analyze the expression tree to determine which parts can be pushed up. For example, in `A and (B or C)`, if `A`, `B`, and `C` are atomic and `B` and `C` can filter the outer collection, then we can push up the filter `B or C`.
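The tree analysis described above could look roughly like the following. This is a minimal sketch, not code from this PR: `Atom`, `And`, `Or`, `referenced_collections`, and `split_pushable` are hypothetical stand-ins for Django's expression nodes, and it assumes a conjunct is pushable when every atom inside it references only the single collection the filter is being pushed to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    name: str
    collections: frozenset  # collections this condition references


@dataclass(frozen=True)
class And:
    children: tuple


@dataclass(frozen=True)
class Or:
    children: tuple


def referenced_collections(node):
    """Collect every collection referenced anywhere under `node`."""
    if isinstance(node, Atom):
        return node.collections
    result = frozenset()
    for child in node.children:
        result |= referenced_collections(child)
    return result


def split_pushable(node, target):
    """Split the top-level conjuncts into (pushable, remaining) lists."""
    conjuncts = node.children if isinstance(node, And) else (node,)
    pushable, remaining = [], []
    for conjunct in conjuncts:
        # A whole subtree (e.g. `B or C`) moves only if all of it fits.
        if referenced_collections(conjunct) <= frozenset({target}):
            pushable.append(conjunct)
        else:
            remaining.append(conjunct)
    return pushable, remaining


# A references "book"; B and C reference only "author" (made-up names).
A = Atom("A", frozenset({"book"}))
B = Atom("B", frozenset({"author"}))
C = Atom("C", frozenset({"author"}))
pushable, remaining = split_pushable(And((A, Or((B, C)))), "author")
```

Here `B or C` is pushed as one unit and `A` stays in the outer query. Only top-level conjuncts are split: pulling a single branch out of an `Or` would change the result. The diff in this PR additionally restricts pushing to `INNER` joins, which this sketch does not model.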