
[FEA] Examine performance impact of nullmask computation in JIT transforms #20680

@lamarrr

Is your feature request related to a problem? Please describe.

#20206 introduced null-aware UDFs that can conditionally produce null values.
To prevent excessive synchronization across threads, I opted to record the nullness of each row in an intermediate boolean null mask rather than setting the bits with atomics, which would cause contention (see: #20206 (comment)).

In a kernel fusion experiment, I observed that using a fused valid_if kernel along with a transform to set the nullmask was 20% slower than using bitmask_and to compute the nullmask. valid_if kernels are also not work-efficient because of synchronization and repeated work.
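For context on why the bitmask_and path is cheap, here is a minimal host-side sketch of null propagation over packed validity words. It assumes the usual layout where bit (i % 32) of word (i / 32) is 1 when row i is valid. The function name and_masks is hypothetical, and this is a CPU model for illustration, not libcudf's bitmask_and implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: host-side model of null propagation.
// An output row is valid only if it is valid in both inputs.
// Assumed layout: bit (i % 32) of word (i / 32) is 1 when row i is valid.
std::vector<std::uint32_t> and_masks(std::vector<std::uint32_t> const& a,
                                     std::vector<std::uint32_t> const& b)
{
  assert(a.size() == b.size());  // both masks must cover the same rows
  std::vector<std::uint32_t> out(a.size());
  for (std::size_t w = 0; w < a.size(); ++w) {
    out[w] = a[w] & b[w];  // one AND resolves the validity of 32 rows
  }
  return out;
}
```

Each word is computed independently from the input words alone, so no per-row intermediate and no cross-thread coordination is needed. That is not possible when the UDF itself decides nullness per row.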

Describe the solution you'd like

  • Benchmark and evaluate the approaches to setting the null masks from the transform kernel
  • Explore other ways to set the null masks efficiently, e.g., by batching the bits
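One way to read "batching the bits" is to collect 32 per-row validity flags and store them as a single 32-bit word, instead of writing one value per row. The sketch below models that on the host. The names pack_validity and row_is_valid are hypothetical, and the bit layout (bit i % 32 of word i / 32, with 1 meaning valid) is an assumption. On a GPU, one possible analogue is a warp of 32 threads producing a word with the CUDA __ballot_sync intrinsic, but that mapping is a suggestion here, not something the issue prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: pack per-row validity flags into 32-bit words.
// Assumed layout: bit (i % 32) of word (i / 32) is 1 when row i is valid.
std::vector<std::uint32_t> pack_validity(std::vector<bool> const& is_valid)
{
  std::size_t const num_words = (is_valid.size() + 31) / 32;
  std::vector<std::uint32_t> words(num_words, 0u);
  for (std::size_t w = 0; w < num_words; ++w) {
    std::uint32_t word = 0;
    for (std::size_t lane = 0; lane < 32; ++lane) {
      std::size_t const row = w * 32 + lane;
      // Rows past the end stay 0 (treated as padding).
      if (row < is_valid.size() && is_valid[row]) { word |= (std::uint32_t{1} << lane); }
    }
    words[w] = word;  // a single store covers 32 rows
  }
  return words;
}

// Hypothetical helper: read one row's validity back from the packed words.
bool row_is_valid(std::vector<std::uint32_t> const& words, std::size_t row)
{
  assert(row / 32 < words.size());
  return ((words[row / 32] >> (row % 32)) & 1u) != 0u;
}
```

Because each word is owned by exactly one group of 32 rows, no two groups write to the same word, which is the property that would let a batched approach avoid atomics.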

Describe alternatives you've considered

  • Require that the UDF output is either always valid or null-propagated (bitmask-and)

Additional context
#20206 (comment)

Labels

  • Performance: Performance related issue
  • feature request: New feature or request
  • libcudf: Affects libcudf (C++/CUDA) code.