Here is the current Project Antalya roadmap for 2025. This year the principal focus is on adapting ClickHouse to use Iceberg as shared object storage and on separating storage and compute. All features are open source; there are no hold-backs.
Please suggest additional features and ideas in the comments to this issue. We also welcome contributions.
Performance:
- Parquet metadata cache
  - Parquet file metadata caching implementation (#586)
  - Use the Parquet metadata cache for the ParquetMetadata format as well (#636)
- Parquet native reader, v1
  - Add Parquet bloom filter support (ClickHouse/ClickHouse#62966)
  - Support Parquet page V2 in the native reader (ClickHouse/ClickHouse#70807)
  - Boolean support in the native reader (ClickHouse/ClickHouse#71055)
  - Merge Parquet bloom filter and min/max evaluation (ClickHouse/ClickHouse#71383)
  - Support Parquet integer logical types in the native reader (ClickHouse/ClickHouse#72105)
- Parquet native reader, v3 (upstream): a new Parquet reader with filter push-down that improves total ClickBench time by 50+% compared to the Arrow-based Parquet reader (ClickHouse/ClickHouse#70611); see the sketch after this list
- ListObjectsV2 cache: cache object storage list operations using a TTL + prefix-matching cache (#743)
- Iceberg table pruning in cluster requests: prune tables in icebergCluster functions (#770)
- Iceberg files metadata cache (upstream) (ClickHouse/ClickHouse#77156)
- Iceberg partition pruning (upstream): partition pruning for time-related partition transforms (ClickHouse/ClickHouse#72044)
- Iceberg min/max pruning (upstream) (ClickHouse/ClickHouse#78242)
- RowGroup adaptive size
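As a flavor of how these reader optimizations surface to users, here is a minimal sketch of reading Parquet from S3 with push-down enabled. The bucket path is hypothetical, and the setting names marked as assumed are tied to the issues above and may differ by release; only `input_format_parquet_filter_push_down` is a long-standing upstream setting.

```sql
-- Minimal sketch: scan Parquet files on S3 with push-down optimizations.
-- The bucket/path is hypothetical; settings marked (assumed) may vary by release.
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
WHERE user_id = 42
SETTINGS
    input_format_parquet_filter_push_down = 1,        -- prune row groups via min/max statistics
    input_format_parquet_bloom_filter_push_down = 1,  -- (assumed) bloom filters, per ClickHouse/ClickHouse#62966
    input_format_parquet_use_native_reader = 1;       -- (assumed) opt into the native Parquet reader
```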
Swarms:
- Auto-discovery of swarm cluster nodes: cluster autodiscovery (#629)
- Consistent hashing for object distribution to improve cache locality: rendezvous hashing for the filesystem cache (#709)
- Distributed object storage table engines: distributed requests to tables with object storage engines (#615)
- Swarm query syntax: convert functions with the object_storage_cluster setting to cluster functions (#712); see the sketch after this list
- Swarm reliability/retries
  - Enable swarm queries to continue processing when a node is shut down (#759)
  - Restart cluster tasks on lost connections (#780)
- Swarm for writes
- Swarm for merges/optimize
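For orientation, here is a minimal sketch of how a swarm query can be expressed. The cluster name `swarm` and the bucket path are assumptions for illustration; `object_storage_cluster` is the setting referenced in #712, and s3Cluster is the existing upstream cluster function.

```sql
-- Minimal sketch: fan an object storage scan out to a swarm cluster.
-- The 'swarm' cluster name and the bucket path are assumptions.

-- Setting-based form: route a plain s3() scan to the swarm (see #712).
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
SETTINGS object_storage_cluster = 'swarm';

-- Equivalent explicit cluster-function form.
SELECT count()
FROM s3Cluster('swarm', 'https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet');
```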
Catalogs:
- Open source catalog for Kubernetes: https://github.com/Altinity/ice
- AWS S3 Tables support
- Unity Catalog support: Unity catalog integration (ClickHouse/ClickHouse#76988)
- Glue catalog support: add Glue catalog integration (ClickHouse/ClickHouse#77257); see the sketch after this list
- Cloudflare R2 Data Catalog support
- Public datasets in Iceberg
- Use IAM roles to access the s3 table function: role-based S3 access (#688)
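As a reference point, a catalog is typically attached as a database so that its Iceberg tables can be queried directly. Below is a minimal sketch using the DataLakeCatalog database engine mentioned under Iceberg Writes; the endpoint, credentials, argument order, and setting names are assumptions and differ between catalog types and releases.

```sql
-- Minimal sketch: attach an external catalog as a ClickHouse database.
-- Endpoint, credentials, and setting names are assumptions; they vary per catalog type and release.
CREATE DATABASE datalake
ENGINE = DataLakeCatalog('https://glue.us-east-1.amazonaws.com', 'ACCESS_KEY', 'SECRET_KEY')
SETTINGS catalog_type = 'glue', region = 'us-east-1';  -- (assumed) setting names

-- Tables registered in the catalog then become queryable directly.
SHOW TABLES FROM datalake;
SELECT count() FROM datalake.`analytics.events`;       -- (assumed) namespace.table naming
```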
Iceberg Writes:
- Toolkit for loading files into Iceberg: https://github.com/Altinity/ice
- Support partitioning
- Support ordering (see https://www.tabular.io/apache-iceberg-cookbook/data-engineering-table-write-order/)
- CREATE TABLE for the Iceberg/DataLakeCatalog database engine
- INSERT INTO Iceberg tables (see the sketch after this list)
- Use a MergeTree buffer for frequent inserts into Iceberg (similar to async inserts, but with a much larger buffer on disk)
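Since CREATE TABLE and INSERT INTO for Iceberg are still roadmap items, the following is only a hypothetical illustration of what the write path could look like through a catalog database; none of this syntax is final, and the database and table names are invented for the sketch.

```sql
-- Hypothetical illustration of the planned Iceberg write path; syntax is not final
-- and the datalake database/table names are invented.
CREATE TABLE datalake.events
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
PARTITION BY event_date;      -- "Support partitioning" roadmap item

INSERT INTO datalake.events
SELECT today(), number, toString(number)
FROM numbers(1000);
```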
Tiered Storage:
- Wildcard support for object storage: add {_snowflake_id} wildcard support (#789)
- Add support for Hive partition style reads and writes (ClickHouse/ClickHouse#76802); see the sketch after this list
- Write MergeTree parts to Parquet
- Tiered table engine (#815)
  - Antalya 25.3: write to Merge storage (#683)
- TTL moves to another table
- Merge tables with a watermark
- Backup/restore for tiered tables (extension to Altinity Backup for ClickHouse, aka clickhouse-backup)
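For context on the Hive partition item, here is a minimal sketch of Hive-style partitioned reads, assuming the upstream use_hive_partitioning setting; the bucket layout is hypothetical, and the write side (partitioned writes, tiered TTL moves) is still roadmap work.

```sql
-- Minimal sketch: read Hive-style partitioned Parquet data from object storage.
-- The bucket layout is hypothetical; with Hive partitioning enabled, path segments
-- such as event_date=2025-01-01 become queryable virtual columns.
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/event_date=*/*.parquet', 'Parquet')
WHERE event_date = '2025-01-01'
SETTINGS use_hive_partitioning = 1;
```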