[BUG] Use size_t as extent type in joins instead of int32_t #20584

@matal-nvidia

Description

cuDF currently uses cudf::size_type (int32_t) as the extent type for filtered-join and distinct-key-join.
A cudf::table can store up to about 2.1 billion rows (the maximum value of int32_t). With the default load factor of 0.5, the hash table needs roughly twice as many slots as there are rows, so its size can exceed the maximum value representable by int32_t. We should use size_t as the extent type instead.
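
To make the arithmetic concrete, here is a minimal sketch. The helpers `required_slots` and `fits_in_int32` are hypothetical and not part of cuDF or cuCollections, and the real hash table may round its capacity up further, so treat the numbers as a lower bound.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical helper (not a cuDF API): minimum number of hash-table slots
// needed to hold `num_rows` keys at the given load factor.
constexpr std::size_t required_slots(std::size_t num_rows, double load_factor)
{
  return static_cast<std::size_t>(static_cast<double>(num_rows) / load_factor);
}

// Hypothetical helper: true when the slot count is representable as int32_t.
constexpr bool fits_in_int32(std::size_t slots)
{
  return slots <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max());
}

// Largest row count a cudf::table can hold, since cudf::size_type is int32_t.
constexpr std::size_t max_rows = std::numeric_limits<std::int32_t>::max();

// At load factor 0.5 the table needs about twice as many slots as rows.
static_assert(required_slots(max_rows, 0.5) == 4294967294ULL, "2 * INT32_MAX slots");
static_assert(!fits_in_int32(required_slots(max_rows, 0.5)), "overflows int32_t");

// The overflow already starts just above roughly 1.07 billion rows.
static_assert(fits_in_int32(required_slots(1073741823, 0.5)), "still fits");
static_assert(!fits_in_int32(required_slots(1073741824, 0.5)), "first overflow");
```

Under these assumptions, any build table with more than about 1.07 billion rows would need an extent larger than int32_t can represent, while size_t handles the full range.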

Moreover, before 25.10, left-semi and left-anti joins used size_t as the extent type; this changed when cuDF moved to the new filtered-join.
Hash-join also uses size_t as the extent type.

Labels: bug, libcudf
