Add API to count number of deleted rows across deletion vector(s)#21963
mhaseeb123 wants to merge 15 commits into rapidsai:main
namespace cudf::io::parquet::experimental {
// Type alias for the cuco 64-bit roaring bitmap
All of this is moved as is to deletion_vectors_helpers.hpp/cu to declutter this file a bit.
namespace cudf::io::parquet::experimental {
void prepend_index_column_to_table_metadata(table_metadata& metadata)
Code moved as is from the anonymous namespace of deletion_vectors.cu.
cudf::host_span<size_type const> rows_per_deletion_vector,
OutputIterator output,
rmm::cuda_stream_view stream)
{
Refactored into a common function used from both compute_row_mask_column (old) and compute_deleted_row_count (new)
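For readers skimming the refactor, here is a minimal host-side sketch of how one helper templated on an output iterator can serve both a row-mask caller and a count caller. It is not the code in this PR: `write_row_mask`, `deleted_row_counter`, `sketch_row_mask` and `sketch_deleted_row_count` are hypothetical names, and a `std::unordered_set` stands in for the cuco roaring bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <unordered_set>
#include <vector>

// Hypothetical shared helper: writes one flag per row, true when the row is kept.
template <typename OutputIterator>
void write_row_mask(std::vector<std::uint64_t> const& row_index,
                    std::unordered_set<std::uint64_t> const& deleted,
                    OutputIterator output)
{
  for (auto const idx : row_index) {
    *output++ = (deleted.count(idx) == 0);
  }
}

// Hypothetical output iterator that counts `false` flags instead of storing them.
struct deleted_row_counter {
  std::size_t* num_deleted;
  deleted_row_counter& operator*() { return *this; }
  deleted_row_counter& operator++() { return *this; }
  deleted_row_counter operator++(int) { return *this; }
  deleted_row_counter& operator=(bool keep)
  {
    if (!keep) { ++(*num_deleted); }
    return *this;
  }
};

// Caller 1: materialize the mask.
std::vector<bool> sketch_row_mask(std::vector<std::uint64_t> const& row_index,
                                  std::unordered_set<std::uint64_t> const& deleted)
{
  std::vector<bool> mask;
  mask.reserve(row_index.size());
  write_row_mask(row_index, deleted, std::back_inserter(mask));
  return mask;
}

// Caller 2: only count the deleted rows, reusing the same helper.
std::size_t sketch_deleted_row_count(std::vector<std::uint64_t> const& row_index,
                                     std::unordered_set<std::uint64_t> const& deleted)
{
  std::size_t num_deleted = 0;
  write_row_mask(row_index, deleted, deleted_row_counter{&num_deleted});
  return num_deleted;
}
```

The count caller never allocates a mask; only the iterator passed to the shared helper changes.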
std::queue<chunked_parquet_reader::roaring_bitmap_impl>& deletion_vectors,
std::queue<size_type>& deletion_vector_row_counts,
rmm::cuda_stream_view stream)
{
Similarly refactored into a common function called from compute_partial_row_mask_column (old function) and compute_partial_deleted_row_count (new function)
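A rough host-side sketch of what consuming the two queues per chunk could look like, under the assumption that the front deletion vector applies to the next `deletion_vector_row_counts.front()` rows and is popped once those rows are used up. This is a guess at the semantics, not the PR's code: `count_deleted_in_chunk` and `deletion_set` are hypothetical, with a `std::unordered_set` standing in for the roaring bitmap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a deserialized 64-bit roaring bitmap.
using deletion_set = std::unordered_set<std::uint64_t>;

// Counts deleted rows in one chunk and consumes the queues as rows are covered.
std::size_t count_deleted_in_chunk(std::vector<std::uint64_t> const& chunk_index,
                                   std::queue<deletion_set>& deletion_vectors,
                                   std::queue<std::size_t>& row_counts)
{
  std::size_t num_deleted = 0;
  std::size_t pos         = 0;
  while (pos < chunk_index.size()) {
    assert(!deletion_vectors.empty() && !row_counts.empty());
    // Rows of this chunk still covered by the front deletion vector.
    auto const n = std::min(row_counts.front(), chunk_index.size() - pos);
    for (std::size_t i = 0; i < n; ++i) {
      if (deletion_vectors.front().count(chunk_index[pos + i]) != 0) { ++num_deleted; }
    }
    pos += n;
    row_counts.front() -= n;
    // Drop the front deletion vector once all of its rows have been consumed.
    if (row_counts.front() == 0) {
      deletion_vectors.pop();
      row_counts.pop();
    }
  }
  return num_deleted;
}
```

A chunk that straddles two deletion vectors leaves the second one at the front of the queue with a reduced row count, so the next chunk picks up where this one stopped.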
/**
 * @brief Opaque wrapper class for cuco's 64-bit roaring bitmap
 */
struct chunked_parquet_reader::roaring_bitmap_impl {
/**
 * @copydoc cudf::io::parquet::experimental::compute_num_deleted_rows
 */
[[nodiscard]] size_t compute_num_deleted_rows(deletion_vector_info const& deletion_vector_info,
This is a new function that you should review.
Co-authored-by: David Wendt <45795991+davidwendt@users.noreply.github.com>
Description
Author's note: This PR is really tiny (most of it is tests). The line count shown at the top right comes mostly from a refactor (I split a large file into two, adding a new header file) and from deduplicating some code. Please see my inline comments to skip over code that was moved as is. Thanks!
Closes #21937
This PR adds a new API to count the number of deleted rows across input deletion vectors using the specified index column information.
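To make the intended semantics concrete, here is a small host-side sketch of the count. It assumes the index column is the concatenation of `[offset, offset + num_rows)` ranges and that deletion vector `v` applies to the next `rows_per_deletion_vector[v]` rows of that column. The names `build_index_column`, `count_deleted_rows` and `deletion_set` are hypothetical, a `std::unordered_set` stands in for the roaring bitmap, and the real API in this PR may take different inputs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a deserialized 64-bit roaring bitmap.
using deletion_set = std::unordered_set<std::uint64_t>;

// Builds the index column from per-row-group offsets and row counts (assumed layout).
std::vector<std::uint64_t> build_index_column(
  std::vector<std::uint64_t> const& row_group_offsets,
  std::vector<std::size_t> const& row_group_num_rows)
{
  assert(row_group_offsets.size() == row_group_num_rows.size());
  std::vector<std::uint64_t> index;
  for (std::size_t g = 0; g < row_group_offsets.size(); ++g) {
    for (std::size_t r = 0; r < row_group_num_rows[g]; ++r) {
      index.push_back(row_group_offsets[g] + r);
    }
  }
  return index;
}

// Counts rows whose index value is present in the deletion vector covering that row.
std::size_t count_deleted_rows(std::vector<deletion_set> const& deletion_vectors,
                               std::vector<std::size_t> const& rows_per_deletion_vector,
                               std::vector<std::uint64_t> const& index)
{
  assert(deletion_vectors.size() == rows_per_deletion_vector.size());
  std::size_t row         = 0;
  std::size_t num_deleted = 0;
  for (std::size_t v = 0; v < deletion_vectors.size(); ++v) {
    for (std::size_t r = 0; r < rows_per_deletion_vector[v]; ++r, ++row) {
      assert(row < index.size());
      if (deletion_vectors[v].count(index[row]) != 0) { ++num_deleted; }
    }
  }
  return num_deleted;
}
```

Under these assumptions, entries in a deletion vector that match no row in its span (or that only match rows covered by a different deletion vector) do not contribute to the count.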