Improve at least non-row-group wise parquet readers #128

@seberg

We should be able to improve the GPU version very easily as soon as we update to a newer libcudf (25.08+), because the chunked reader then supports row offsets and we can use those directly.
(We may also be able to mix row groups and row offsets, but I am not sure. The only reason to do that is that we currently need the row groups for the CPU version.)

The CPU version is worse, because it wasn't clear to us whether there is a nice way to efficiently limit the reads via Arrow. There is probably some unnecessary decompression going on here of data we don't need.
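
For illustration only, here is a minimal sketch of how a row range could be mapped onto row groups on the CPU side. The helper name `row_groups_for_range` and its parameters are hypothetical and not part of this project or of Arrow. It only narrows the read to the row groups that overlap the requested range, so the first and last row group would still be decompressed in full.

```python
from bisect import bisect_right
from itertools import accumulate


def row_groups_for_range(row_group_sizes, skip_rows, num_rows):
    """Map the row range [skip_rows, skip_rows + num_rows) to row groups.

    Hypothetical helper: returns (indices, local_offset), where `indices`
    are the row groups overlapping the range and `local_offset` is the
    position of the first requested row inside the first selected group.
    """
    if skip_rows < 0 or num_rows < 0:
        raise ValueError("skip_rows and num_rows must be non-negative")

    # ends[i] is the global index one past the last row of row group i.
    ends = list(accumulate(row_group_sizes))
    total = ends[-1] if ends else 0

    # Clamp the requested range to the rows that actually exist.
    start = min(skip_rows, total)
    stop = min(skip_rows + num_rows, total)
    if start >= stop:
        return [], 0

    # First group whose end lies past `start`, and the group holding the
    # last requested row (stop - 1). Empty row groups are skipped naturally.
    first = bisect_right(ends, start)
    last = bisect_right(ends, stop - 1)

    first_group_start = ends[first] - row_group_sizes[first]
    return list(range(first, last + 1)), start - first_group_start
```

With pyarrow, the sizes could presumably be taken from `ParquetFile.metadata.row_group(i).num_rows`, and the result passed to `ParquetFile.read_row_groups(indices)` followed by `Table.slice(local_offset, num_rows)`. Whether that avoids enough of the unnecessary decompression is an open question.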
