fix: Avoid panicing when stats are not available for a file group split#23277

Open
mkleen wants to merge 1 commit into apache:main from mkleen:missing_statistics_partitioned

@mkleen mkleen commented Jul 1, 2026

Contributor

Which issue does this PR close?

Rationale for this change

The query from the issue:

SELECT   (((CAST(id AS BIGINT) % 1024) + 1024) % 1024) AS computed_bucket
FROM     profile
ORDER BY computed_bucket,
         CAST(id AS BIGINT)
LIMIT    10;

panics:

thread 'main' panicked at .../datafusion-datasource-54.0.0/src/statistics.rs:100:48:
index out of bounds: the len is 0 but the index is 0

The underlying issue is that the current code panics when file groups are split by statistics but no statistics are available for the column that defines the sort order (in this case, computed_bucket).

What changes are included in this PR?

  • Fix in MinMaxStatistics: check whether statistics are available for a given column
  • A test for the fix
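
For illustration only, here is a minimal sketch of the kind of guard described above. It is not the actual patch: `ColumnStats`, `FileStats`, `min_max_for_column`, and `can_split_by_statistics` are hypothetical names invented for this example and are not DataFusion APIs. The idea shown is simply to use a checked lookup instead of direct indexing, so that a column without statistics yields `None` instead of an index-out-of-bounds panic.

```rust
// Hypothetical stand-in for per-column min/max statistics.
// This is NOT the real DataFusion type.
#[derive(Debug, Clone)]
struct ColumnStats {
    min: i64,
    max: i64,
}

// Hypothetical per-file statistics: one entry per column that has stats.
// A computed column such as `computed_bucket` would have no entry.
struct FileStats {
    columns: Vec<ColumnStats>,
}

/// Returns the (min, max) pair for `column_index`, or `None` when no
/// statistics exist for that column. `get` is a checked lookup, so an
/// empty vector yields `None` rather than panicking like `columns[i]`.
fn min_max_for_column(stats: &FileStats, column_index: usize) -> Option<(i64, i64)> {
    stats.columns.get(column_index).map(|c| (c.min, c.max))
}

/// Files can only be split by statistics on a sort column when every
/// file actually has statistics for that column.
fn can_split_by_statistics(files: &[FileStats], column_index: usize) -> bool {
    !files.is_empty()
        && files
            .iter()
            .all(|f| min_max_for_column(f, column_index).is_some())
}

fn main() {
    let with_stats = FileStats {
        columns: vec![ColumnStats { min: 0, max: 1023 }],
    };
    let without_stats = FileStats { columns: vec![] };

    println!("{:?}", min_max_for_column(&with_stats, 0));
    println!("{:?}", min_max_for_column(&without_stats, 0));
    println!("{}", can_split_by_statistics(&[with_stats, without_stats], 0));
}
```

In this sketch the caller would fall back to a plan that does not rely on statistics whenever `can_split_by_statistics` returns `false`; how the real fix handles that case is not shown in this description.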

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions Bot added the datasource label (Changes to the datasource crate) Jul 1, 2026
@mkleen mkleen force-pushed the missing_statistics_partitioned branch from f8c0e85 to cba5503 July 1, 2026 10:26
@mkleen mkleen changed the title from "fix: Avoid panicing when min/max stats are not available for a group split" to "fix: Avoid panicing when stats are not available for a group split" Jul 1, 2026
@mkleen mkleen changed the title from "fix: Avoid panicing when stats are not available for a group split" to "fix: avoid panicing when stats are not available for a group split" Jul 1, 2026
@mkleen mkleen changed the title from "fix: avoid panicing when stats are not available for a group split" to "fix: Avoid panicing when stats are not available for a group split" Jul 1, 2026
@mkleen mkleen marked this pull request as ready for review July 1, 2026 10:54
@mkleen mkleen force-pushed the missing_statistics_partitioned branch from 56c9cef to 3490ba9 July 1, 2026 12:41
@mkleen mkleen changed the title from "fix: Avoid panicing when stats are not available for a group split" to "fix: Avoid panicing when stats are not available for a file group split" Jul 1, 2026

@comphead comphead left a comment

Contributor

Thanks @mkleen. WDYT, would it be possible to add the test as an SLT test as well?

Development

Successfully merging this pull request may close these issues.

Panic in DataFusion 54.0.0 when ordering Parquet scan by computed projection alias