feat: add array_first higher-order array function #23267

Open
EdsonPetry wants to merge 1 commit into apache:main from EdsonPetry:array-first

@EdsonPetry

Which issue does this PR close?

  • N/A. No dedicated tracking issue; this adds a single self-contained higher-order array function. Happy to file one if preferred.

Rationale for this change

DataFusion already provides higher-order array functions such as array_any_match, array_filter, and array_transform, but there is no direct way to retrieve the first element of an array that satisfies a predicate. Today this requires array_filter followed by array_element(..., 1), which materializes an intermediate filtered array. array_first expresses this directly and rounds out the set of lambda-based array functions.
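To make the cost difference concrete, here is a minimal plain-Rust sketch of the two approaches on a single list. It is a scalar model only: `first_via_filter` and `first_direct` are hypothetical names, not DataFusion APIs, and the early exit in `first_direct` is a property of this model rather than a claim about how the PR evaluates the lambda over columnar data.

```rust
/// Hypothetical model of `array_element(array_filter(arr, pred), 1)`:
/// build the whole filtered list, then take its first element.
fn first_via_filter(values: &[i64], pred: impl Fn(i64) -> bool) -> Option<i64> {
    // Allocates an intermediate Vec holding every matching element.
    let filtered: Vec<i64> = values.iter().copied().filter(|v| pred(*v)).collect();
    filtered.first().copied()
}

/// Hypothetical model of `array_first(arr, pred)`:
/// return the first match without building a filtered list.
fn first_direct(values: &[i64], pred: impl Fn(i64) -> bool) -> Option<i64> {
    values.iter().copied().find(|v| pred(*v))
}

fn main() {
    let values = [1, 4, 9, 16];
    // Both approaches agree on the result; only the intermediate work differs.
    assert_eq!(first_via_filter(&values, |v| v > 3), Some(4));
    assert_eq!(first_direct(&values, |v| v > 3), Some(4));
    // No match yields None (null in SQL terms) in both cases.
    assert_eq!(first_via_filter(&values, |v| v > 100), None);
    assert_eq!(first_direct(&values, |v| v > 100), None);
}
```
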

What changes are included in this PR?

  • New higher-order function array_first(array, predicate) (alias list_first) in datafusion-functions-nested, returning the first element for which the lambda predicate returns true:
    • returns null when the array is empty or no element matches;
    • a predicate that evaluates to null for an element is treated as not matching;
    • a matched element that is itself null is returned as null.
  • Implemented as a HigherOrderUDFImpl following the existing array-lambda functions, including the standard fast paths (fully-null input) and correct handling of sliced lists, null sublists, and captured outer columns.
  • Registration in functions-nested (expr_fn re-export and the default higher-order function list).
  • Unit tests, sqllogictest coverage, and regenerated SQL function documentation.
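The null rules listed above for array_first can be modeled per row in plain Rust. This is a sketch of the described semantics, not the PR's `HigherOrderUDFImpl` code: `array_first_model` is a hypothetical name, `Option` stands in for SQL null, and treating a null array as producing null is an assumption based on the "fully-null input" fast path mentioned above.

```rust
/// Hypothetical scalar model of the null rules described for `array_first`.
/// `array` is None for a null list; each element is None for a null element.
/// The predicate returns None when it evaluates to null for an element.
fn array_first_model<T: Clone>(
    array: Option<&[Option<T>]>,
    predicate: impl Fn(&Option<T>) -> Option<bool>,
) -> Option<T> {
    // Assumption: a null array yields null.
    let elements = array?;
    for element in elements {
        // Only an explicit `true` counts as a match; null and false do not.
        if predicate(element) == Some(true) {
            // A matched element that is itself null is returned as null.
            return element.clone();
        }
    }
    // Empty array or no matching element yields null.
    None
}

fn main() {
    let xs = [Some(1), None, Some(5)];
    let empty: [Option<i32>; 0] = [];

    // Models `x -> x > 3`: the comparison is null for the null element.
    let gt3 = |e: &Option<i32>| (*e).map(|v| v > 3);
    // Models `x -> x IS NULL`: never null, true only for the null element.
    let is_null = |e: &Option<i32>| Some(e.is_none());

    assert_eq!(array_first_model(Some(&xs[..]), gt3), Some(5));
    assert_eq!(array_first_model(Some(&xs[..]), is_null), None::<i32>);
    assert_eq!(array_first_model(Some(&empty[..]), gt3), None::<i32>);
    assert_eq!(array_first_model(None, gt3), None::<i32>);
}
```

In this model a matched null element and "no match" both come back as `None`, which mirrors the SQL result where both cases are null.
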

Are these changes tested?

Yes:

  • Unit tests in array_first.rs covering match/no-match, empty and null arrays, null-predicate handling, matched-null elements, sliced lists, captured outer columns, and non-primitive element types.
  • sqllogictest cases in test_files/array/array_first.slt, including LargeList and the list_first alias.

Are there any user-facing changes?

Yes. A new array function array_first (alias list_first) is available in SQL, with generated documentation under the Array Functions section. There are no breaking changes to existing public APIs.

Add `array_first(array, predicate)`, a higher-order function that returns
the first element of an array for which the lambda predicate returns true.
It returns null when the array is empty or no element matches; a predicate
that returns null for an element is treated as not matching, and a matched
null element is returned as null.

Implemented as a `HigherOrderUDFImpl` alongside the existing array lambda
functions (`array_any_match`, `array_filter`, `array_transform`), with
`list_first` as an alias. Includes unit tests, sqllogictest coverage, and
generated documentation.
github-actions bot added the documentation (Improvements or additions to documentation), sqllogictest (SQL Logic Tests (.slt)), and functions (Changes to functions implementation) labels on Jun 30, 2026.