Upgrade 52.4#38

Merged
zhuqi-lucas merged 13 commits into branch-52 from upgrade-52.4
Mar 24, 2026
@zhuqi-lucas
Collaborator

Upgrade our internal branch-52 to DataFusion 52.4.

alamb and others added 13 commits March 12, 2026 06:56
…pache#20878)

- Part of apache#20855
- Closes apache#19947 on branch-52

This PR:
- Backports apache#19948 from
@Jefffrey to the branch-52 line

Co-authored-by: Jeffrey Vo <jeffrey.vo.australia@gmail.com>
…Unions (apache#20146) (apache#20879)

- Part of apache#20855
- Closes apache#20123 on branch-52

This PR:
- Backports apache#20146 from
@nuno-faria to the branch-52 line

---------

Co-authored-by: Nuno Faria <nunofpfaria@gmail.com>
…t inner filter proves zero selectivity (apache#20743) (apache#20880)

- Part of apache#20855
- Closes apache#20742 on branch-52

This PR:
- Backports apache#20743 from
@haohuaijin to the branch-52 line

Co-authored-by: Huaijin <haohuaijin@gmail.com>
…) queries (apache#20710) (apache#20881)

- Part of apache#20855
- Closes apache#20669 on branch-52

This PR:
- Backports apache#20710 from
@jonathanc-n to the branch-52 line

Co-authored-by: Jonathan Chen <chenleejonathan@gmail.com>
…egates (apache#20279) (apache#20877)

- Part of apache#20855
- Closes apache#20267 on branch-52

This PR:
- Backports apache#20279 from
@notashes to the branch-52 line

Co-authored-by: notashes <edgerunnergit@riseup.net>
Co-authored-by: Adrian Garcia Badaracco <1755071+adriangb@users.noreply.github.com>
…ache#20724) (apache#20858) (apache#20917)

- Part of apache#20855
- Closes apache#20724 on branch-52

This PR:
- Backports apache#20858 from
@gboucher90 to the branch-52 line

Co-authored-by: gboucher90 <gboucher90@users.noreply.github.com>
…filter (apache#20231) (apache#20931)

- Part of apache#19692
- Closes apache#20194 on branch-52

This PR:
- Backports apache#20231 from
@EeshanBembi to the branch-52 line

---------

Co-authored-by: EeshanBembi <33062610+EeshanBembi@users.noreply.github.com>
)

```
Crate:     generational-arena
Version:   0.2.9
Warning:   unmaintained
Title:     `generational-arena` is unmaintained
Date:      2024-02-11
ID:        RUSTSEC-2024-0014
URL:       https://rustsec.org/advisories/RUSTSEC-2024-0014
```

…che#20962) (apache#20997)

- Part of apache#20855
- Closes apache#20997 on branch-52

This PR:
- Backports apache#20962 from
@erratic-pattern to the branch-52 line
- Backports the related tests from
apache#20960

Co-authored-by: Adam Curtis <adam.curtis.dev@gmail.com>
…ache#21009)

## Which issue does this PR close?

- Part of apache#20855

## Rationale for this change

`cargo audit` is failing on branch-52 like this:

```
...
Crate:     lz4_flex
Version:   0.12.0
Warning:   yanked

error: 2 vulnerabilities found!
warning: 4 allowed warnings found
```

Here is an example of that happening on CI:
https://github.com/apache/datafusion/actions/runs/23209529148/job/67454157529?pr=21004



## What changes are included in this PR?


- Update lz4_flex to 0.12.1 (non-yanked)


Copilot AI left a comment

Pull request overview

Upgrade internal branch 52 to DataFusion 52.4.0, including version bumps, release documentation, upstream fixes, and added regression coverage.

Changes:

  • Bump workspace/package versions to 52.4.0 and update associated docs/changelog.
  • Pull in multiple engine fixes (joins, filter/statistics analysis, array_sort nullability, dynamic filter behavior, dictionary IN-list handling).
  • Add regression tests across Substrait + sqllogictest + Rust unit/integration tests.
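The dynamic filter fix in this list is described in the per-file summary as disabling dynamic filter initialization when the aggregates are not all min/max. A minimal sketch of that kind of guard follows; `AggKind` and `can_use_dynamic_filter` are hypothetical names for illustration and are not DataFusion's actual types or functions.

```rust
/// Hypothetical, simplified aggregate kinds; not DataFusion's real types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AggKind {
    Min,
    Max,
    Count,
    Sum,
}

/// Returns true only when there is at least one aggregate and every
/// aggregate is a min or a max. A filter derived from the current min/max
/// bound could otherwise discard rows that a count or sum still needs.
fn can_use_dynamic_filter(aggs: &[AggKind]) -> bool {
    !aggs.is_empty()
        && aggs
            .iter()
            .all(|agg| matches!(agg, AggKind::Min | AggKind::Max))
}
```

Under this sketch, a query with only `min` and `max` passes the guard, while mixing in a `count` makes it return false.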

Reviewed changes

Copilot reviewed 23 out of 24 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| `docs/source/user-guide/configs.md` | Update documented created_by default version to 52.4.0 |
| `dev/changelog/52.4.0.md` | Add 52.4.0 changelog for branch-52 |
| `datafusion/substrait/tests/testdata/test_plans/duplicate_name_in_union.substrait.json` | Add Substrait test plan fixture for union name handling |
| `datafusion/substrait/tests/cases/logical_plans.rs` | Add snapshot + execution regression test for duplicate names in union |
| `datafusion/sqllogictest/test_files/window.slt` | Add regression SLT for window ORDER BY + NVL filter sanity-check failure |
| `datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt` | Add regressions for struct-schema pushdown and dictionary dynamic filter pushdown |
| `datafusion/sqllogictest/test_files/joins.slt` | Add regression for right semi/anti count(*) row counts |
| `datafusion/sqllogictest/test_files/dynamic_filter_pushdown_config.slt` | Add regression ensuring no dynamic filter for mixed aggregates |
| `datafusion/sqllogictest/test_files/array.slt` | Add regression ensuring array_sort preserves inner list nullability |
| `datafusion/physical-plan/src/joins/utils.rs` | Fix empty-schema row_count handling for right semi/anti; plumb join_type through filter/batch build |
| `datafusion/physical-plan/src/joins/symmetric_hash_join.rs` | Pass join_type through join filter + batch build helpers |
| `datafusion/physical-plan/src/joins/sort_merge_join/stream.rs` | Cache num_output_rows to avoid O(n) recount; refactor append capacity logic |
| `datafusion/physical-plan/src/joins/hash_join/stream.rs` | Pass join_type through join filter + batch build helpers |
| `datafusion/physical-plan/src/filter.rs` | Fix constant/equivalence extraction; preserve typed null stats for zero-selectivity; add regressions |
| `datafusion/physical-plan/src/aggregates/row_hash.rs` | Recreate group_values after spill merge for multi-column grouping order correctness |
| `datafusion/physical-plan/src/aggregates/mod.rs` | Disable dynamic filter init when aggregates aren’t all min/max (avoid mixed-agg behavior) |
| `datafusion/physical-expr/src/expressions/in_list.rs` | Fix dictionary-needle unwrapping logic; add extensive regression tests |
| `datafusion/optimizer/tests/optimizer_integration.rs` | Update plan snapshots for changed projection aliasing behavior |
| `datafusion/functions-nested/src/sort.rs` | Preserve list inner nullability via return type + null buffer handling in array_sort |
| `datafusion/expr/src/expr_rewriter/mod.rs` | Adjust coercion/aliasing to avoid unintended qualified names |
| `datafusion/core/tests/physical_optimizer/partition_statistics.rs` | Update expected typed-null stats in tests |
| `Cargo.toml` | Bump workspace and crate versions to 52.4.0 |
| `Cargo.lock` | Update locked versions + dependency bumps (lz4_flex, quinn-proto, etc.) |
| `.github/workflows/audit.yml` | Ignore two RUSTSEC advisories in cargo-audit workflow |
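The `sort_merge_join/stream.rs` entry mentions caching `num_output_rows` to avoid an O(n) recount. The sketch below illustrates the general idea with plain `Vec`s. `OutputBatch`, `append`, `start_chunk`, and `recount` are hypothetical names, and the real code in DataFusion is structured differently.

```rust
/// Hypothetical stand-in for a batch whose output indices are split into chunks.
#[derive(Default)]
struct OutputBatch {
    chunks: Vec<Vec<u64>>,
    /// Running total of indices across all chunks, updated on every append.
    num_output_rows: usize,
}

impl OutputBatch {
    /// Appends one index to the last chunk, creating a chunk if none exists,
    /// and keeps the cached total in sync.
    fn append(&mut self, index: u64) {
        if self.chunks.is_empty() {
            self.chunks.push(Vec::new());
        }
        self.chunks
            .last_mut()
            .expect("at least one chunk exists")
            .push(index);
        self.num_output_rows += 1;
    }

    /// Starts a new, empty chunk.
    fn start_chunk(&mut self) {
        self.chunks.push(Vec::new());
    }

    /// Recounts by walking every chunk: cost grows with the number of chunks.
    fn recount(&self) -> usize {
        self.chunks.iter().map(Vec::len).sum()
    }

    /// Returns the cached total in constant time.
    fn num_output_rows(&self) -> usize {
        self.num_output_rows
    }
}
```

Both accessors return the same value as long as every mutation goes through `append`; the cached version simply avoids re-walking the chunks each time the count is needed.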

Comment on lines 1168 to 1172
self.streamed_batch.append_output_pair(
    scanning_batch_idx,
    scanning_idx,
    1,
    0,
    self.batch_size,
);
Copilot AI Mar 24, 2026

In the join_buffered == false path, append_output_pair is called once per streamed row, but it now passes self.batch_size. This makes append_output_pair allocate UInt64Builder/UInt64Builder with capacity ~batch_size for each row (since output_indices is empty and num_output_rows is 0), which can be a significant memory/perf regression. Consider passing 1 here (or otherwise sizing capacity to the expected number of appended pairs in this branch) to avoid large per-row allocations.
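The allocation concern in this comment can be illustrated with a small sketch that uses `Vec` as a stand-in for the index builders. `IndexChunk` and `reserved_slots` are hypothetical names, and the model assumes (as the comment describes) that a fresh chunk is created for every streamed row and receives exactly one pair.

```rust
/// Hypothetical stand-in for a chunk holding two index builders.
struct IndexChunk {
    streamed: Vec<u64>,
    buffered: Vec<u64>,
}

impl IndexChunk {
    /// Pre-reserves `capacity` slots in each of the two index vectors.
    fn with_capacity(capacity: usize) -> Self {
        Self {
            streamed: Vec::with_capacity(capacity),
            buffered: Vec::with_capacity(capacity),
        }
    }
}

/// Total slots reserved when a fresh chunk is created for each of `rows`
/// rows and each chunk ends up holding exactly one index pair.
fn reserved_slots(rows: usize, capacity_per_chunk: usize) -> usize {
    (0..rows as u64)
        .map(|row| {
            let mut chunk = IndexChunk::with_capacity(capacity_per_chunk);
            chunk.streamed.push(row);
            chunk.buffered.push(row);
            chunk.streamed.capacity() + chunk.buffered.capacity()
        })
        .sum()
}
```

In this model, `reserved_slots(100, 8192)` reserves at least 100 × 2 × 8192 slots to store 200 values, whereas `reserved_slots(100, 1)` reserves far fewer, which is the gap the comment suggests closing by passing a smaller capacity in this branch.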

@zhuqi-lucas zhuqi-lucas merged commit cd6aaa6 into branch-52 Mar 24, 2026
68 of 69 checks passed

6 participants