feat: add array_first higher-order array function #23267
Open
EdsonPetry wants to merge 1 commit into
Add `array_first(array, predicate)`, a higher-order function that returns the first element of an array for which the lambda predicate returns true. It returns null when the array is empty or no element matches; a predicate that returns null for an element is treated as not matching, and a matched null element is returned as null. Implemented as a `HigherOrderUDFImpl` alongside the existing array lambda functions (`array_any_match`, `array_filter`, `array_transform`), with `list_first` as an alias. Includes unit tests, sqllogictest coverage, and generated documentation.
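The null-handling rules above can be modeled in a few lines of plain Rust. This is only a sketch of the described semantics, not the `HigherOrderUDFImpl` in this PR: the name `array_first_model` is hypothetical, `None` stands in for SQL `null`, and returning `null` for a null array is an assumption.

```rust
/// Plain-Rust model of the described semantics (hypothetical helper,
/// not the DataFusion implementation). `None` stands in for SQL NULL
/// at every level: the array, an element, and the predicate result.
fn array_first_model<T: Clone>(
    array: Option<&[Option<T>]>,
    predicate: impl Fn(&Option<T>) -> Option<bool>,
) -> Option<T> {
    // Assumption: a null array yields null.
    let elements = array?;
    elements
        .iter()
        // Only a predicate result of exactly `Some(true)` is a match;
        // `None` (null) and `Some(false)` are both skipped.
        .find(|element| predicate(*element) == Some(true))
        // The matched element may itself be null; then null is returned.
        .and_then(|element| element.clone())
}

fn main() {
    let values = [Some(1), None, Some(4), Some(6)];
    // SQL-style comparison: a null element makes the comparison null.
    let greater_than_three = |e: &Option<i32>| e.as_ref().map(|v| *v > 3);
    // prints Some(4)
    println!("{:?}", array_first_model(Some(&values[..]), greater_than_three));
}
```

In this model, "no match" and "matched a null element" both come back as `None`, which mirrors the description: both cases return null.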
Which issue does this PR close?
Rationale for this change
DataFusion already provides higher-order array functions such as `array_any_match`, `array_filter`, and `array_transform`, but there is no direct way to retrieve the first element of an array that satisfies a predicate. Today this requires `array_filter` followed by `array_element(..., 1)`, which materializes an intermediate filtered array. `array_first` expresses this directly and rounds out the set of lambda-based array functions.
What changes are included in this PR?
- `array_first(array, predicate)` (alias `list_first`) in `datafusion-functions-nested`, returning the first element for which the lambda predicate returns `true`:
  - Returns `null` when the array is empty or no element matches;
  - A predicate that returns `null` for an element is treated as not matching;
  - A matched `null` element is returned as `null`.
- Implemented as a `HigherOrderUDFImpl` following the existing array-lambda functions, including the standard fast paths (fully-null input) and correct handling of sliced lists, null sublists, and captured outer columns.
- Registration in `functions-nested` (`expr_fn` re-export and the default higher-order function list).
Are these changes tested?
Yes:
- Unit tests in `array_first.rs` covering match/no-match, empty and null arrays, null-predicate handling, matched-null elements, sliced lists, captured outer columns, and non-primitive element types.
- sqllogictest coverage in `test_files/array/array_first.slt`, including `LargeList` and the `list_first` alias.
Are there any user-facing changes?
Yes. A new array function `array_first` (alias `list_first`) is available in SQL, with generated documentation under the Array Functions section. There are no breaking changes to existing public APIs.
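For readers weighing the rationale, the difference between the filter-then-index pattern and a direct lookup can be sketched in plain Rust. Both helper names are hypothetical, nulls are left out for brevity, and the early exit of `Iterator::find` is a property of this sketch only; the PR description does not say how the vectorized implementation evaluates the predicate.

```rust
/// Model of `array_element(array_filter(arr, pred), 1)`:
/// the whole filtered vector is built before the first item is taken.
fn first_via_filter(values: &[i64], pred: impl Fn(i64) -> bool) -> Option<i64> {
    let filtered: Vec<i64> = values.iter().copied().filter(|v| pred(*v)).collect();
    filtered.first().copied()
}

/// Model of `array_first(arr, pred)`: no intermediate vector is allocated.
fn first_direct(values: &[i64], pred: impl Fn(i64) -> bool) -> Option<i64> {
    values.iter().copied().find(|v| pred(*v))
}

fn main() {
    let values = [2, 9, 4, 11];
    let is_large = |v: i64| v > 5;
    // Both formulations agree on the result; only the intermediate work differs.
    assert_eq!(first_via_filter(&values, is_large), first_direct(&values, is_large));
    // prints Some(9)
    println!("{:?}", first_direct(&values, is_large));
}
```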