…pache#20878)

- Part of apache#20855
- Closes apache#19947 on branch-52

This PR:
- Backports apache#19948 from @Jefffrey to the branch-52 line

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…Unions (apache#20146) (apache#20879)

- Part of apache#20855
- Closes apache#20123 on branch-52

This PR:
- Backports apache#20146 from @nuno-faria to the branch-52 line

---------

Co-authored-by: Nuno Faria <nunofpfaria@gmail.com>
…t inner filter proves zero selectivity (apache#20743) (apache#20880)

- Part of apache#20855
- Closes apache#20742 on branch-52

This PR:
- Backports apache#20743 from @haohuaijin to the branch-52 line

Co-authored-by: Huaijin <haohuaijin@gmail.com>
…) queries (apache#20710) (apache#20881)

- Part of apache#20855
- Closes apache#20669 on branch-52

This PR:
- Backports apache#20710 from @jonathanc-n to the branch-52 line

Co-authored-by: Jonathan Chen <chenleejonathan@gmail.com>
…egates (apache#20279) (apache#20877)

- Part of apache#20855
- Closes apache#20267 on branch-52

This PR:
- Backports apache#20279 from @notashes to the branch-52 line

Co-authored-by: notashes <edgerunnergit@riseup.net>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…ache#20724) (apache#20858) (apache#20917)

- Part of apache#20855
- Closes apache#20724 on branch-52

This PR:
- Backports apache#20858 from @gboucher90 to the branch-52 line

Co-authored-by: gboucher90 <gboucher90@users.noreply.github.com>
…filter (apache#20231) (apache#20931)

- Part of apache#19692
- Closes apache#20194 on branch-52

This PR:
- Backports apache#20231 from @EeshanBembi to the branch-52 line

---------

Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com>
)

- Closes #.

```
Crate: generational-arena
Version: 0.2.9
Warning: unmaintained
Title: `generational-arena` is unmaintained
Date: 2024-02-11
ID: RUSTSEC-2024-0014
URL: https://rustsec.org/advisories/RUSTSEC-2024-0014
```
…che#20962) (apache#20997)

- Part of apache#20855
- Closes apache#20997 on branch-52

This PR:
- Backports apache#20962 from @erratic-pattern to the branch-52 line
- Backports the related tests from apache#20960

Co-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
…ache#21009)

## Which issue does this PR close?

- Part of apache#20855

## Rationale for this change

`cargo audit` is failing on branch-52 like this:

```
...
Crate: lz4_flex
Version: 0.12.0
Warning: yanked
error: 2 vulnerabilities found!
warning: 4 allowed warnings found
```

Here is an example of that happening on CI: https://github.com/apache/datafusion/actions/runs/23209529148/job/67454157529?pr=21004

## What changes are included in this PR?

- Update lz4_flex to 0.12.1 (not yanked)
Pull request overview
Upgrade internal branch 52 to DataFusion 52.4.0, including version bumps, release documentation, upstream fixes, and added regression coverage.
Changes:
- Bump workspace/package versions to 52.4.0 and update associated docs/changelog.
- Pull in multiple engine fixes (joins, filter/statistics analysis, array_sort nullability, dynamic filter behavior, dictionary IN-list handling).
- Add regression tests across Substrait + sqllogictest + Rust unit/integration tests.
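The "dictionary IN-list handling" item concerns evaluating `IN` when the needle is dictionary-encoded. One general technique for that case is to test each distinct dictionary value once and then map the answers back to rows through the keys. The sketch below is a dependency-free model under that assumption: `DictColumn` and `dict_in_list` are hypothetical names, `Vec` and `HashSet` stand in for Arrow's `DictionaryArray`, and it does not reproduce the actual `in_list.rs` change.

```rust
use std::collections::HashSet;

/// Hypothetical, dependency-free model of a dictionary-encoded column:
/// each row holds an index into `values`, and `None` marks a null row.
struct DictColumn {
    keys: Vec<Option<usize>>,
    values: Vec<String>,
}

/// Evaluates `col IN (list)` for a list without NULLs by testing each
/// dictionary value once, then mapping those results back to rows via keys.
/// A null row yields `None`, i.e. SQL NULL.
fn dict_in_list(col: &DictColumn, list: &HashSet<&str>) -> Vec<Option<bool>> {
    // One membership test per distinct dictionary value, not per row.
    let value_hits: Vec<bool> = col
        .values
        .iter()
        .map(|v| list.contains(v.as_str()))
        .collect();
    // Unwrap each key to the precomputed answer for the value it points at.
    col.keys
        .iter()
        .map(|&key| key.map(|k| value_hits[k]))
        .collect()
}

fn main() {
    let col = DictColumn {
        keys: vec![Some(0), Some(1), None, Some(0), Some(2)],
        values: vec!["a".to_string(), "b".to_string(), "c".to_string()],
    };
    let list: HashSet<&str> = ["a", "c"].iter().copied().collect();
    // Prints [Some(true), Some(false), None, Some(true), Some(true)]
    println!("{:?}", dict_in_list(&col, &list));
}
```

The point of unwrapping through the keys is that the membership test runs once per dictionary entry rather than once per row, which pays off when many rows share few distinct values.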
Reviewed changes
Copilot reviewed 23 out of 24 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| docs/source/user-guide/configs.md | Update documented created_by default version to 52.4.0 |
| dev/changelog/52.4.0.md | Add 52.4.0 changelog for branch-52 |
| datafusion/substrait/tests/testdata/test_plans/duplicate_name_in_union.substrait.json | Add Substrait test plan fixture for union name handling |
| datafusion/substrait/tests/cases/logical_plans.rs | Add snapshot + execution regression test for duplicate names in union |
| datafusion/sqllogictest/test_files/window.slt | Add regression SLT for window ORDER BY + NVL filter sanity-check failure |
| datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt | Add regressions for struct-schema pushdown and dictionary dynamic filter pushdown |
| datafusion/sqllogictest/test_files/joins.slt | Add regression for right semi/anti count(*) row counts |
| datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt | Add regression ensuring no dynamic filter for mixed aggregates |
| datafusion/sqllogictest/test_files/array.slt | Add regression ensuring array_sort preserves inner list nullability |
| datafusion/physical-plan/src/joins/utils.rs | Fix empty-schema row_count handling for right semi/anti; plumb join_type through filter/batch build |
| datafusion/physical-plan/src/joins/symmetric_hash_join.rs | Pass join_type through join filter + batch build helpers |
| datafusion/physical-plan/src/joins/sort_merge_join/stream.rs | Cache num_output_rows to avoid O(n) recount; refactor append capacity logic |
| datafusion/physical-plan/src/joins/hash_join/stream.rs | Pass join_type through join filter + batch build helpers |
| datafusion/physical-plan/src/filter.rs | Fix constant/equivalence extraction; preserve typed null stats for zero-selectivity; add regressions |
| datafusion/physical-plan/src/aggregates/row_hash.rs | Recreate group_values after spill merge for multi-column grouping order correctness |
| datafusion/physical-plan/src/aggregates/mod.rs | Disable dynamic filter init when aggregates aren’t all min/max (avoid mixed-agg behavior) |
| datafusion/physical-expr/src/expressions/in_list.rs | Fix dictionary-needle unwrapping logic; add extensive regression tests |
| datafusion/optimizer/tests/optimizer_integration.rs | Update plan snapshots for changed projection aliasing behavior |
| datafusion/functions-nested/src/sort.rs | Preserve list inner nullability via return type + null buffer handling in array_sort |
| datafusion/expr/src/expr_rewriter/mod.rs | Adjust coercion/aliasing to avoid unintended qualified names |
| datafusion/core/tests/physical_optimizer/partition_statistics.rs | Update expected typed-null stats in tests |
| Cargo.toml | Bump workspace and crate versions to 52.4.0 |
| Cargo.lock | Update locked versions + dependency bumps (lz4_flex, quinn-proto, etc.) |
| .github/workflows/audit.yml | Ignore two RUSTSEC advisories in cargo-audit workflow |
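The `aggregates/mod.rs` row gates dynamic-filter initialization on every aggregate being MIN or MAX. A toy model shows why such a gate is needed, assuming the pushed-down filter has the form `v < running_min`: that filter only drops rows that cannot change a MIN, but COUNT(*) must still see them. `AggKind`, `dynamic_filter_allowed`, and `scan_with_min_filter` are hypothetical names, and the two-batch scan is a simplification, not DataFusion's execution path.

```rust
/// Hypothetical aggregate kinds for illustration; not DataFusion's types.
#[derive(Debug, Clone, Copy)]
enum AggKind {
    Min,
    Max,
    Count,
}

/// Gate: only build a dynamic filter when every aggregate is MIN or MAX.
/// With no aggregates there is no running bound to derive a filter from.
fn dynamic_filter_allowed(aggs: &[AggKind]) -> bool {
    !aggs.is_empty() && aggs.iter().all(|a| matches!(a, AggKind::Min | AggKind::Max))
}

/// Toy two-batch scan: after `first`, the running minimum is pushed down as
/// the filter `v < running_min` and applied to `second`.
/// Returns (MIN, COUNT(*)) over the rows that survive the filter.
fn scan_with_min_filter(first: &[i64], second: &[i64]) -> (Option<i64>, usize) {
    let running_min = first.iter().copied().min();
    let survivors: Vec<i64> = second
        .iter()
        .copied()
        .filter(|&v| running_min.map_or(true, |m| v < m))
        .collect();
    let min = first.iter().chain(survivors.iter()).copied().min();
    (min, first.len() + survivors.len())
}

fn main() {
    // All rows are [5, 3, 8, 1]: the true MIN is 1 and the true COUNT is 4.
    let (min, count) = scan_with_min_filter(&[5, 3], &[8, 1]);
    // The filter skipped 8, so MIN stays correct (Some(1)) but COUNT drops to 3.
    println!("min = {:?}, count = {}", min, count);
    println!("MIN + MAX: {}", dynamic_filter_allowed(&[AggKind::Min, AggKind::Max]));
    println!("MIN + COUNT: {}", dynamic_filter_allowed(&[AggKind::Min, AggKind::Count]));
}
```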
```rust
self.streamed_batch.append_output_pair(
    scanning_batch_idx,
    scanning_idx,
    1,
    0,
    self.batch_size,
);
```
In the join_buffered == false path, append_output_pair is called once per streamed row, but it now passes self.batch_size. Because output_indices is empty and num_output_rows is 0 at that point, each call makes append_output_pair allocate a pair of UInt64Builder instances with capacity of roughly batch_size, which can be a significant memory and performance regression. Consider passing 1 here (or otherwise sizing the capacity to the number of pairs this branch is expected to append) to avoid large per-row allocations.
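The per-row cost the comment describes can be approximated with a toy model. Here `Vec<u64>` stands in for arrow's `UInt64Builder`, and `reserve_for_row` and `total_reserved_slots` are hypothetical helpers that assume every per-row call starts a fresh pair of builders sized by the capacity hint, as the comment suggests happens when output_indices is empty.

```rust
/// Hypothetical helper: allocates the two index builders (streamed side and
/// buffered side) with the given capacity hint. `Vec<u64>` replaces arrow's
/// `UInt64Builder` so the sketch has no dependencies.
fn reserve_for_row(capacity_hint: usize) -> (Vec<u64>, Vec<u64>) {
    (Vec::with_capacity(capacity_hint), Vec::with_capacity(capacity_hint))
}

/// Hypothetical helper: sums the u64 slots reserved when each of `rows`
/// streamed rows starts a fresh builder pair and appends a single pair.
fn total_reserved_slots(rows: usize, capacity_hint: usize) -> usize {
    (0..rows)
        .map(|row| {
            let (mut streamed, mut buffered) = reserve_for_row(capacity_hint);
            // Exactly one (streamed_idx, buffered_idx) pair per call.
            streamed.push(row as u64);
            buffered.push(0);
            // Both Vecs are dropped after this closure returns, so peak memory
            // stays small; the cost is repeated allocation of unused capacity.
            streamed.capacity() + buffered.capacity()
        })
        .sum()
}

fn main() {
    let rows = 1_000;
    let batch_size = 8192; // DataFusion's default execution batch size
    let with_batch_size = total_reserved_slots(rows, batch_size);
    let with_one = total_reserved_slots(rows, 1);
    println!("capacity hint = batch_size: {} u64 slots reserved", with_batch_size);
    println!("capacity hint = 1: {} u64 slots reserved", with_one);
}
```

With a batch_size of 8192, the first configuration reserves at least 16,384,000 u64 slots (about 131 MB of allocations spread over 1,000 rows), while a hint of 1 reserves on the order of 2,000 slots for the same output.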
Upgrade to 52.4.0 for our internal branch 52