Here is the current Project Antalya roadmap for 2025. This year the principal focus is on adapting ClickHouse to use Iceberg as shared object storage and on separating storage and compute. All features are open source; there are no hold-backs.
Please suggest additional features and ideas in the comments to this issue. We also welcome contributions.
Performance:
- Parquet metadata cache
  - Parquet file metadata caching implementation (#586)
  - Use the Parquet metadata cache for the ParquetMetadata format as well (#636)
- Parquet native reader, v1
  - Add Parquet bloom filter support (ClickHouse/ClickHouse#62966)
  - Support Parquet page V2 in the native reader (ClickHouse/ClickHouse#70807)
  - Boolean support in the native reader (ClickHouse/ClickHouse#71055)
  - Merge Parquet bloom filter and min/max evaluation (ClickHouse/ClickHouse#71383)
  - Support Parquet integer logical types in the native reader (ClickHouse/ClickHouse#72105)
- Parquet native reader, v3 (upstream): a new Parquet reader with filter push-down that improves total ClickBench time by 50+% compared to the Arrow-based Parquet reader (ClickHouse/ClickHouse#70611); see the sketch after this list
- ListObjectsV2 cache: cache object storage list operations using a TTL + prefix-matching cache (#743)
- Iceberg table pruning in cluster requests: prune tables in icebergCluster functions (#770)
- Iceberg files metadata cache (upstream) (ClickHouse/ClickHouse#77156)
- Iceberg partition pruning (upstream): partition pruning for time-related partition transforms (ClickHouse/ClickHouse#72044)
- Iceberg min/max pruning (upstream) (ClickHouse/ClickHouse#78242)
- RowGroup adaptive size
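As a flavor of how these reader optimizations surface to users, here is a minimal sketch of reading Parquet from S3 with push-down enabled. The bucket path is hypothetical, and the setting names marked as assumed are tied to the issues above and may differ by release; only `input_format_parquet_filter_push_down` is a long-standing upstream setting.

```sql
-- Minimal sketch: scan Parquet files on S3 with push-down optimizations.
-- The bucket/path is hypothetical; settings marked (assumed) may vary by release.
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
WHERE user_id = 42
SETTINGS
    input_format_parquet_filter_push_down = 1,        -- prune row groups via min/max statistics
    input_format_parquet_bloom_filter_push_down = 1,  -- (assumed) bloom filters, per ClickHouse/ClickHouse#62966
    input_format_parquet_use_native_reader = 1;       -- (assumed) opt into the native Parquet reader
```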
Swarms:
- Auto-discovery of swarm cluster nodes: cluster autodiscovery (#629)
- Consistent hashing for object distribution to improve cache locality: rendezvous hashing for the filesystem cache (#709)
- Distributed object storage table engines: distributed requests to tables with object storage engines (#615)
- Swarm query syntax: convert functions with the object_storage_cluster setting to cluster functions (#712); see the sketch after this list
- Swarm reliability/retries
  - Enable swarm queries to continue processing when a node is shut down (#759)
  - Restart cluster tasks on lost connections (#780)
- Swarm for writes
- Swarm for merges/optimize
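For orientation, here is a minimal sketch of how a swarm query can be expressed. The cluster name `swarm` and the bucket path are assumptions for illustration; `object_storage_cluster` is the setting referenced in #712, and s3Cluster is the existing upstream cluster function.

```sql
-- Minimal sketch: fan an object storage scan out to a swarm cluster.
-- The 'swarm' cluster name and the bucket path are assumptions.

-- Setting-based form: route a plain s3() scan to the swarm (see #712).
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet')
SETTINGS object_storage_cluster = 'swarm';

-- Equivalent explicit cluster-function form.
SELECT count()
FROM s3Cluster('swarm', 'https://my-bucket.s3.amazonaws.com/events/*.parquet', 'Parquet');
```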
Catalogs:
- Open source catalog for Kubernetes: https://github.com/Altinity/ice
- AWS S3 Tables support
- Unity Catalog support: Unity catalog integration (ClickHouse/ClickHouse#76988)
- Glue catalog support: add Glue catalog integration (ClickHouse/ClickHouse#77257); see the sketch after this list
- Cloudflare R2 Data Catalog support
- Public datasets in Iceberg
- Use IAM roles to access the s3 table function: role-based S3 access (#688)
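As a reference point, a catalog is typically attached as a database so that its Iceberg tables can be queried directly. Below is a minimal sketch using the DataLakeCatalog database engine mentioned under Iceberg Writes; the endpoint, credentials, argument order, and setting names are assumptions and differ between catalog types and releases.

```sql
-- Minimal sketch: attach an external catalog as a ClickHouse database.
-- Endpoint, credentials, and setting names are assumptions; they vary per catalog type and release.
CREATE DATABASE datalake
ENGINE = DataLakeCatalog('https://glue.us-east-1.amazonaws.com', 'ACCESS_KEY', 'SECRET_KEY')
SETTINGS catalog_type = 'glue', region = 'us-east-1';  -- (assumed) setting names

-- Tables registered in the catalog then become queryable directly.
SHOW TABLES FROM datalake;
SELECT count() FROM datalake.`analytics.events`;       -- (assumed) namespace.table naming
```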
Iceberg Writes:
- Toolkit for loading files into Iceberg: https://github.com/Altinity/ice
- Support partitioning
- Support ordering (see https://www.tabular.io/apache-iceberg-cookbook/data-engineering-table-write-order/)
- CREATE TABLE for the Iceberg/DataLakeCatalog database engine
- INSERT INTO Iceberg tables (see the sketch after this list)
- Use a MergeTree buffer for frequent inserts into Iceberg (similar to async inserts, but with a much larger buffer on disk)
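Since CREATE TABLE and INSERT INTO for Iceberg are still roadmap items, the following is only a hypothetical illustration of what the write path could look like through a catalog database; none of this syntax is final, and the database and table names are invented for the sketch.

```sql
-- Hypothetical illustration of the planned Iceberg write path; syntax is not final
-- and the datalake database/table names are invented.
CREATE TABLE datalake.events
(
    event_date Date,
    user_id    UInt64,
    payload    String
)
PARTITION BY event_date;      -- "Support partitioning" roadmap item

INSERT INTO datalake.events
SELECT today(), number, toString(number)
FROM numbers(1000);
```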
Tiered Storage:
- Wildcard support for object storage: add {_snowflake_id} wildcard support (#789)
- Add support for Hive partition style reads and writes (ClickHouse/ClickHouse#76802); see the sketch after this list
- Write MergeTree parts to Parquet
- Tiered table engine (#815)
  - Antalya 25.3: write to Merge storage (#683)
- TTL moves to another table
- Merge tables with a watermark
- Backup/restore for tiered tables (extension to Altinity Backup for ClickHouse, aka clickhouse-backup)
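For context on the Hive partition item, here is a minimal sketch of Hive-style partitioned reads, assuming the upstream use_hive_partitioning setting; the bucket layout is hypothetical, and the write side (partitioned writes, tiered TTL moves) is still roadmap work.

```sql
-- Minimal sketch: read Hive-style partitioned Parquet data from object storage.
-- The bucket layout is hypothetical; with Hive partitioning enabled, path segments
-- such as event_date=2025-01-01 become queryable virtual columns.
SELECT count()
FROM s3('https://my-bucket.s3.amazonaws.com/events/event_date=*/*.parquet', 'Parquet')
WHERE event_date = '2025-01-01'
SETTINGS use_hive_partitioning = 1;
```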