Fix unreachable else branch in gather bitmask logic#21946
Fix unreachable else branch in gather bitmask logic#21946eternallyproud wants to merge 2 commits intorapidsai:mainfrom
Conversation
5c939b7 to
6a59736
Compare
|
Would it be possible to add a test for this? |
39c804f to
8a12095
Compare
|
Added a test that exercises the set_all_valid_null_masks path (nullable columns with no actual nulls + DONT_CHECK policy). Note that this bug doesn't affect correctness — gather_bitmask produces the same result in this case — but the fix avoids unnecessary bitmask computation by taking the lighter set_all_valid_null_masks path when possible. |
wence-
left a comment
There was a problem hiding this comment.
I think the change to the logic looks good, but I have a question about whether this whole function is doing more work than it strictly needs to.
| auto needs_new_bitmask = bounds_policy == out_of_bounds_policy::NULLIFY || | ||
| cudf::has_nested_nullable_columns(source_table); | ||
| auto const needs_new_bitmask = bounds_policy == out_of_bounds_policy::NULLIFY || | ||
| cudf::has_nested_nullable_columns(source_table); | ||
| if (needs_new_bitmask) { | ||
| needs_new_bitmask = needs_new_bitmask || cudf::has_nested_nulls(source_table); | ||
| if (needs_new_bitmask) { | ||
| auto const has_possible_nulls = | ||
| bounds_policy == out_of_bounds_policy::NULLIFY || cudf::has_nested_nulls(source_table); | ||
| if (has_possible_nulls) { | ||
| auto const op = bounds_policy == out_of_bounds_policy::NULLIFY | ||
| ? gather_bitmask_op::NULLIFY | ||
| : gather_bitmask_op::DONT_CHECK; | ||
| gather_bitmask(source_table, gather_map_begin, destination_columns, op, stream, mr); | ||
| } else { | ||
| for (size_type i = 0; i < source_table.num_columns(); ++i) { | ||
| set_all_valid_null_masks(source_table.column(i), *destination_columns[i], stream, mr); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
I think this change to the logic makes sense, but I am somewhat confused by it (and partially the original logic).
If we are to nullify out of bounds map indices, I can see that we need a new bitmask for each column.
Similarly, if a column has nulls (i.e. its null count is greater than zero) then we need to gather the null mask appropriately.
However, if we are not nullifying out of bounds indices and no column has nulls, why do we care if that column has a null mask (which is what cudf::has_nested_nullable_columns is checking). The gathered columns will also have no nulls, so it seems like we can get away with not even creating null masks for them.
IOW, why is this entire logic not:
auto const needs_new_bitmask = bounds_policy == NULLIFY || cudf::has_nested_nulls(source_table);
if (needs_new_bitmask) {
gather_bitmask(...);
}
// no need to set valid null masks at all, by construction the result cannot have any nulls.
Description
Fix unreachable
elsebranch ingather()bitmask logic.In the original code,
needs_new_bitmaskis alreadytruewhen enteringthe outer
ifblock, so the reassignmentneeds_new_bitmask = needs_new_bitmask || cudf::has_nested_nulls(source_table)is always
true(short-circuit), making theelsebranch that callsset_all_valid_null_masksunreachable.This PR introduces a separate variable
has_possible_nullsto correctlydistinguish between:
gather_bitmaskset_all_valid_null_masksChecklist