Productizing Partition Evolution in OLake #221
rkhameshra started this conversation in Ideas
Hey folks 👋
We're exploring a new feature for OLake called "Partition strategy" and we’d love to get early feedback from the community before we start implementation.
🧠 What is "Partition strategy"?
"Partition strategy" lets users define custom partitioning rules that vary over time or by column value. Aligning partitions with how the data is actually queried lets query engines skip partitions a query does not need, which can improve performance significantly.
🛠️ Example Use Cases
⏱️ Time-based Evolution:
"Partition all data older than 2024 by year, the data in 2024 by month, and data after Jan 1 by week."
This supports use cases where older data is queried less frequently (e.g., aggregate reports) while recent data is accessed with high granularity (e.g., logs, events).
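To make the time-based example concrete, here is a rough sketch of how a record's date could map to a partition label. Everything in it is hypothetical: the function name, the label format, and the reading of "after Jan 1" as "from Jan 1, 2025 onward" are assumptions for illustration, not OLake's actual design.

```python
from datetime import date


def time_partition_key(d: date) -> str:
    """Hypothetical mapping from a record date to a partition label."""
    if d.year < 2024:
        # Older data: one partition per year.
        return f"year={d.year}"
    if d.year == 2024:
        # 2024 data: one partition per month.
        return f"month={d.year}-{d.month:02d}"
    # Assumption: everything from Jan 1, 2025 onward is weekly (ISO weeks).
    iso_year, iso_week, _ = d.isocalendar()
    return f"week={iso_year}-W{iso_week:02d}"


print(time_partition_key(date(2023, 5, 17)))
print(time_partition_key(date(2024, 3, 9)))
print(time_partition_key(date(2025, 1, 1)))
```

One thing this sketch surfaces: ISO weeks can straddle a year boundary, so the exact week convention is something the real spec would need to pin down.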
🌍 Value-based Evolution:
"Partition by country_id such that there is one partition for US, and another for all other countries."
This lets users isolate high-volume or high-query-frequency keys, while grouping the rest more compactly.
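The value-based case can be sketched the same way. The function name, the `ISOLATED` set, and the `OTHER` bucket label below are made up for illustration; the only thing taken from the example above is that US gets its own partition and everything else shares one.

```python
# Hypothetical: keys that get a dedicated partition.
ISOLATED = {"US"}


def country_partition_key(country_id: str) -> str:
    """Give isolated keys their own partition; group the rest together."""
    if country_id in ISOLATED:
        return f"country={country_id}"
    return "country=OTHER"


print(country_partition_key("US"))
print(country_partition_key("IN"))
```

Keeping the isolated keys in a set means adding a second high-volume country later would only change the rule definition, not the lookup logic.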
🧪 V0: Static Rules
In the first version, users will be able to define static partitioning rules when a table is created or in the ingestion config. For example:
partition_strategy:
or
partition_strategy:
These rules will be compiled into static partition specs during ingestion.
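As a thought experiment on what "compiled into static partition specs" might look like, the sketch below sorts a set of declarative rules once and then resolves each date with a binary search. The `Rule` class, `compile_rules`, and `granularity_for` are hypothetical names, and the 2025 boundary is an assumption carried over from the time-based example; none of this reflects a committed OLake format.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Rule:
    start: date        # inclusive lower bound where this rule begins to apply
    granularity: str   # "year", "month", or "week"


def compile_rules(rules):
    """Sort rules by start date once; return parallel lists for fast lookup."""
    ordered = sorted(rules, key=lambda r: r.start)
    return [r.start for r in ordered], [r.granularity for r in ordered]


def granularity_for(compiled, d):
    """Find the last rule whose start is on or before the given date."""
    starts, grains = compiled
    idx = bisect_right(starts, d) - 1
    if idx < 0:
        raise ValueError(f"no rule covers {d}")
    return grains[idx]


# Rules may be declared in any order; compilation normalizes them.
compiled = compile_rules([
    Rule(date(2025, 1, 1), "week"),
    Rule(date.min, "year"),
    Rule(date(2024, 1, 1), "month"),
])

print(granularity_for(compiled, date(2023, 12, 31)))
print(granularity_for(compiled, date(2024, 1, 1)))
print(granularity_for(compiled, date(2025, 6, 1)))
```

Because the rules are resolved up front, the per-record work during ingestion stays a cheap lookup, which seems to fit the "static" framing of V0.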
🚀 Future Roadmap (Ideas for V1+)