-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Support simplify not for physical expr #18970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Thanks for the review @martin-g |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @xudong963 and @martin-g
I think prior to releasing this code we should make the following changes:
- Avoid explicit recursion in not simplification (and instead use the tree node API)
- Avoid the potential infinite loop
It would be nice to make them as part of this PR, but I also think it would be ok to do it as a follow on
Other things I think would be good to do but not strictly necessary:
- Move the tests to the public interface (PhysicalExprSimplifier)
- Avoid duplications of
Operator::negate
| ); | ||
| Ok(unwrapped) | ||
| // Combine transformation results | ||
| let final_transformed = transformed || unwrapped.transformed; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use transform_data here instead: https://docs.rs/datafusion/latest/datafusion/common/tree_node/struct.Transformed.html#method.transform_data
So something like
// Apply NOT expression simplification first
let rewritten =
simplify_not_expr(&node, self.schema)?.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;That handles combining the transformed flag for you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
neat, done in be077db.
Need to catch up on the tree node APIs, haha
|
|
||
| // Apply unwrap cast optimization | ||
| #[cfg(test)] | ||
| let original_type = node.data_type(self.schema).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to move the original type check before applying the not and the verification after the not
Somethg like
impl<'a> TreeNodeRewriter for PhysicalExprSimplifier<'a> {
type Node = Arc<dyn PhysicalExpr>;
fn f_up(&mut self, node: Self::Node) -> Result<Transformed<Self::Node>> {
#[cfg(test)]
let original_type = node.data_type(self.schema).unwrap();
// Apply NOT expression simplification first
let rewritten =
simplify_not_expr(&node, self.schema)?.transform_data(|node| {
unwrap_cast::unwrap_cast_in_comparison(node, self.schema)
})?;
#[cfg(test)]
assert_eq!(
rewritten.data.data_type(self.schema).unwrap(),
original_type,
"Simplified expression should have the same data type as the original"
);
Ok(rewritten)
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done in be077db
| ]) | ||
| } | ||
|
|
||
| #[test] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this function is not really run in isolation (it is always run as part of PhysicalExprSimplifier) I think it would be better if these tests were in datafusion/physical-expr/src/simplifier/mod.rs rather than here.
I don't think this change is required
| } | ||
|
|
||
| /// Returns the negated version of a comparison operator, if possible | ||
| fn negate_operator(op: &Operator) -> Option<Operator> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks the same as Operator::negate: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Operator.html#method.negate
I recommend using that code, or if there is a reason not to use Operator::negate add a comment explaining why it is not used
| Arc::new(NotExpr::new(Arc::clone(binary_expr.right()))); | ||
|
|
||
| // Recursively simplify the NOT expressions | ||
| let simplified_left = simplify_not_expr(¬_left, schema)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since simplify_not_expr is used in PhysicalExprSimplifier which is doing a walk up the plan (using f_up) there is no reason to also recurse explicitly in this rule.
You should be able to apply the rewrite rules only NOT(A OR B) --> NOT A AND NOT B rather than also changing the exprs
| let mut current_expr = Arc::clone(expr); | ||
| let mut overall_transformed = false; | ||
|
|
||
| loop { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also somewhat concerned about this loop -- it seems like it may be trying to handle a non complete recursion and it could potentially lead to infinite recursion if a rewrite flip/flopped
If the loop is needed I suggest:
- Put it in the higher level PhysicalExprSimplifier
- add a counter that breaks after some number of iterations (e.g. 5 or something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put it into PhysicalExprSimplifier with a counter to limit the loop
|
|
||
| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for using to_string()?
I tried comparing the two directly and it seems to work
assert_eq!(&result.data, &inner_expr);There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I didn't notice the expr is comparable
| assert!(result.transformed); | ||
| // Should be simplified back to the original b > 5 | ||
| assert_eq!(result.data.to_string(), inner_expr.to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also make these queries easier to read if you made a helper like
fn assert_transformed(expr, expected_expr);and
fn assert_not_transformed(expr, expected_expr);| if let Some(literal) = result.data.as_any().downcast_ref::<Literal>() { | ||
| assert_eq!(literal.value(), &ScalarValue::Boolean(Some(false))); | ||
| } else { | ||
| panic!("Expected literal result"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could potentially make this a function to make the tests easier to read - something like
let literal = as_literal(&result);And handle the panic internally in as_literal
That way the mechics of the test wouldn't obscure the test logic so much
I think you could do something similar with BinaryExpr
| let schema = test_schema(); | ||
|
|
||
| // Create a deeply nested NOT expression: NOT(NOT(NOT(...NOT(b > 5)...))) | ||
| // This tests that we don't get stack overflow with many nested NOTs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
d749da7 to
be077db
Compare
e9731f3 to
27c4ef1
Compare
Thanks @alamb. I addressed all suggestions in the latest commit 27c4ef1 |
Part of the #18868