Search before asking
What would you like to be improved?
Currently, the planning phase consumes excessive memory on tables that contain a very large number of small delete files. The root cause is the inefficient storage of file relationships: a single DeleteFile is often associated with many DataFiles.
In the current implementation, these associations are likely stored as explicit lists or object references. When a large number of data files reference the same delete files, the memory needed to hold these references grows with the number of DataFile-to-DeleteFile pairs. This redundancy makes the planning index use far more heap memory than necessary, which can lead to Out-Of-Memory (OOM) errors and slower query planning.
How should we improve?
I propose optimizing the memory layout of the planning index by introducing RoaringBitmap to compress the association between DeleteFile and DataFile. Instead of storing explicit lists of file IDs or object references, we can use RoaringBitmaps to represent the set of DataFile IDs associated with each DeleteFile. RoaringBitmap provides highly efficient compression for integer sets (file IDs), significantly reducing the memory footprint required to store these many-to-many relationships.
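To make the proposal concrete, here is a minimal sketch of the idea, not the project's actual planning index. It assumes each DataFile can be given a dense integer ID. The class name `DeleteFileIndex` and its methods are hypothetical, and files are identified by path strings purely for illustration. So that it runs on the JDK alone, it uses `java.util.BitSet` as a stand-in. A real implementation would use `org.roaringbitmap.RoaringBitmap`, which also compresses sparse ID ranges.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical index: maps each delete file to the set of data file IDs it applies to.
final class DeleteFileIndex {
    // Assumption: every data file gets a dense integer ID on first sight.
    private final Map<String, Integer> dataFileIds = new HashMap<>();

    // One bitmap per delete file instead of a list of object references.
    // With RoaringBitmap this would be Map<String, RoaringBitmap>.
    private final Map<String, BitSet> dataFilesByDeleteFile = new HashMap<>();

    private int idOf(String dataFilePath) {
        Integer existing = dataFileIds.get(dataFilePath);
        if (existing != null) {
            return existing;
        }
        int id = dataFileIds.size(); // next unused dense ID
        dataFileIds.put(dataFilePath, id);
        return id;
    }

    // Record that deleteFilePath must be applied when reading dataFilePath.
    void associate(String deleteFilePath, String dataFilePath) {
        int id = idOf(dataFilePath);
        // RoaringBitmap equivalent of set(id): add(id)
        dataFilesByDeleteFile.computeIfAbsent(deleteFilePath, k -> new BitSet()).set(id);
    }

    // True if the delete file was associated with the data file.
    boolean applies(String deleteFilePath, String dataFilePath) {
        Integer id = dataFileIds.get(dataFilePath);
        BitSet ids = dataFilesByDeleteFile.get(deleteFilePath);
        // RoaringBitmap equivalent of get(id): contains(id)
        return id != null && ids != null && ids.get(id);
    }

    // Number of distinct data files associated with the delete file.
    int dataFileCount(String deleteFilePath) {
        BitSet ids = dataFilesByDeleteFile.get(deleteFilePath);
        // RoaringBitmap equivalent of cardinality(): getCardinality()
        return ids == null ? 0 : ids.cardinality();
    }
}
```

Calling `associate` repeatedly for the same pair sets the same bit, so duplicate associations add no extra memory. The trade-off is the extra path-to-ID map, which is shared across all delete files rather than repeated per association.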
Are you willing to submit PR?
Subtasks
No response
Code of Conduct