Fix unreachable else branch in gather bitmask logic #21946

Open
eternallyproud wants to merge 2 commits into rapidsai:main from eternallyproud:gather_bitmask_fix

Conversation

@eternallyproud eternallyproud commented Mar 30, 2026

Description

Fix unreachable else branch in gather() bitmask logic.

In the original code, needs_new_bitmask is already true when entering
the outer if block, so the reassignment
needs_new_bitmask = needs_new_bitmask || cudf::has_nested_nulls(source_table)
is always true (short-circuit), making the else branch that calls
set_all_valid_null_masks unreachable.

This PR introduces a separate variable has_possible_nulls to correctly
distinguish between:

  1. Source table has actual nulls (or NULLIFY policy) → call gather_bitmask
  2. Source table columns are nullable but contain no nulls → call set_all_valid_null_masks
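
The reachability argument above can be checked with a small standalone model. This is only a sketch: the three boolean parameters are hypothetical stand-ins for `bounds_policy == out_of_bounds_policy::NULLIFY`, `cudf::has_nested_nullable_columns(source_table)`, and `cudf::has_nested_nulls(source_table)`, and `mask_path` is an illustrative enum, not a libcudf type.

```cpp
#include <cassert>

// Illustrative only: which null-mask step the gather logic would take.
enum class mask_path { none, gather_bitmask, set_all_valid };

// Model of the decision logic before this PR.
constexpr mask_path original_path(bool nullify, bool nullable, bool has_nulls)
{
  bool needs_new_bitmask = nullify || nullable;
  if (needs_new_bitmask) {
    // needs_new_bitmask is already true here, so || short-circuits and
    // has_nulls is never consulted.
    needs_new_bitmask = needs_new_bitmask || has_nulls;
    if (needs_new_bitmask) { return mask_path::gather_bitmask; }
    return mask_path::set_all_valid;  // never reached
  }
  return mask_path::none;
}

// Model of the decision logic after this PR.
constexpr mask_path fixed_path(bool nullify, bool nullable, bool has_nulls)
{
  bool const needs_new_bitmask = nullify || nullable;
  if (needs_new_bitmask) {
    bool const has_possible_nulls = nullify || has_nulls;
    if (has_possible_nulls) { return mask_path::gather_bitmask; }
    return mask_path::set_all_valid;
  }
  return mask_path::none;
}

// Walks all eight input combinations and checks that the original model
// never selects set_all_valid, while the fixed model selects it for
// "nullable, no nulls, not NULLIFY".
inline void check_reachability()
{
  for (int bits = 0; bits < 8; ++bits) {
    bool const nullify   = (bits & 1) != 0;
    bool const nullable  = (bits & 2) != 0;
    bool const has_nulls = (bits & 4) != 0;
    assert(original_path(nullify, nullable, has_nulls) != mask_path::set_all_valid);
  }
  assert(fixed_path(false, true, false) == mask_path::set_all_valid);
}
```

Calling `check_reachability()` from a `main` function exercises both models. The model says nothing about the results `gather_bitmask` and `set_all_valid_null_masks` produce, only about which one is selected.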

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@eternallyproud eternallyproud requested a review from a team as a code owner March 30, 2026 13:45
copy-pr-bot bot commented Mar 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Mar 30, 2026
@davidwendt
Contributor

Would it be possible to add a test for this?

@davidwendt davidwendt added bug Something isn't working 3 - Ready for Review Ready for review by team non-breaking Non-breaking change labels Mar 31, 2026
@eternallyproud
Author

Added a test that exercises the set_all_valid_null_masks path (nullable columns with no actual nulls + DONT_CHECK policy). Note that this bug doesn't affect correctness, since gather_bitmask produces the same result in this case, but the fix avoids unnecessary bitmask computation by taking the lighter set_all_valid_null_masks path when possible.

Contributor

@wence- wence- left a comment

I think the change to the logic looks good, but I have a question about whether this whole function is doing more work than it strictly needs to.

Comment on lines -649 to 664
-  auto needs_new_bitmask = bounds_policy == out_of_bounds_policy::NULLIFY ||
-                           cudf::has_nested_nullable_columns(source_table);
+  auto const needs_new_bitmask = bounds_policy == out_of_bounds_policy::NULLIFY ||
+                                 cudf::has_nested_nullable_columns(source_table);
   if (needs_new_bitmask) {
-    needs_new_bitmask = needs_new_bitmask || cudf::has_nested_nulls(source_table);
-    if (needs_new_bitmask) {
+    auto const has_possible_nulls =
+      bounds_policy == out_of_bounds_policy::NULLIFY || cudf::has_nested_nulls(source_table);
+    if (has_possible_nulls) {
       auto const op = bounds_policy == out_of_bounds_policy::NULLIFY
                         ? gather_bitmask_op::NULLIFY
                         : gather_bitmask_op::DONT_CHECK;
       gather_bitmask(source_table, gather_map_begin, destination_columns, op, stream, mr);
     } else {
       for (size_type i = 0; i < source_table.num_columns(); ++i) {
         set_all_valid_null_masks(source_table.column(i), *destination_columns[i], stream, mr);
       }
     }
   }

I think this change to the logic makes sense, but I am somewhat confused by it (and partially the original logic).

If we are to nullify out of bounds map indices, I can see that we need a new bitmask for each column.

Similarly, if a column has nulls (i.e. its null count is greater than zero) then we need to gather the null mask appropriately.

However, if we are not nullifying out of bounds indices and no column has nulls, why do we care whether that column has a null mask (which is what cudf::has_nested_nullable_columns checks)? The gathered columns will also have no nulls, so it seems like we can get away with not even creating null masks for them.

IOW, why is this entire logic not:

auto const needs_new_bitmask = bounds_policy == NULLIFY || cudf::has_nested_nulls(source_table);
if (needs_new_bitmask) {
    gather_bitmask(...);
}
// no need to set valid null masks at all, by construction the result cannot have any nulls.
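
To make the comparison concrete, the PR's logic and the simplification suggested above can be modelled side by side. This is a hedged sketch, not libcudf code: the booleans are hypothetical stand-ins for the NULLIFY policy check, `cudf::has_nested_nullable_columns`, and `cudf::has_nested_nulls`, and it assumes that a column containing nulls always has a null mask (so "has nulls but not nullable" is skipped).

```cpp
#include <cassert>

// Illustrative only: what happens to the destination null masks.
enum class mask_action { none, gather_bitmask, set_all_valid };

// Model of the logic in this PR.
constexpr mask_action pr_action(bool nullify, bool nullable, bool has_nulls)
{
  if (nullify || nullable) {
    return (nullify || has_nulls) ? mask_action::gather_bitmask
                                  : mask_action::set_all_valid;
  }
  return mask_action::none;
}

// Model of the simplification proposed in the review comment:
// gather the bitmask when needed, otherwise create no masks at all.
constexpr mask_action proposed_action(bool nullify, bool has_nulls)
{
  return (nullify || has_nulls) ? mask_action::gather_bitmask : mask_action::none;
}

// Counts the input combinations on which the two models disagree.
inline int count_differences()
{
  int differences = 0;
  for (int bits = 0; bits < 8; ++bits) {
    bool const nullify   = (bits & 1) != 0;
    bool const nullable  = (bits & 2) != 0;
    bool const has_nulls = (bits & 4) != 0;
    // Assumption: a column with nulls must carry a null mask.
    if (has_nulls && !nullable) { continue; }
    if (pr_action(nullify, nullable, has_nulls) != proposed_action(nullify, has_nulls)) {
      // The only disagreement should be: nullable, no nulls, not NULLIFY.
      assert(!nullify && nullable && !has_nulls);
      ++differences;
    }
  }
  return differences;
}
```

Under these assumptions the two models disagree on exactly one input: nullable columns without nulls and without NULLIFY, where the PR allocates all-valid masks and the proposal allocates none. Whether callers rely on the output staying nullable in that case is the open question in this thread; the sketch does not answer it.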
