Add Lakekeeper catalog support in docs #4177
Conversation
@somratdutta is attempting to deploy a commit to the ClickHouse Team on Vercel. A member of the Team first needs to authorize it.
## Testing Instructions

This PR depends on a recently merged fix that is not yet available as a Docker image. Below are comprehensive testing instructions to validate the changes locally.

### Prerequisites

Download the appropriate ClickHouse binary from the build artifacts based on your platform. For macOS on Apple Silicon, use the corresponding artifact.

### Environment Setup

### Data Ingestion via PyIceberg

Create the notebook:

```python
!pip install -q pyiceberg

from pyiceberg.catalog.rest import RestCatalog
import logging
import pandas as pd
import pyarrow as pa

# Uncomment for detailed logging
# logging.basicConfig(level=logging.DEBUG)

CATALOG_URL = "http://lakekeeper:8181/catalog"
DEMO_WAREHOUSE = "demo"

catalog = RestCatalog(
    name="my_catalog",
    warehouse=DEMO_WAREHOUSE,
    uri=CATALOG_URL,
    token="dummy",
)

# Initialize namespace
test_namespace = ("pyiceberg_namespace",)
if test_namespace not in catalog.list_namespaces():
    catalog.create_namespace(test_namespace)

# Prepare test dataset
test_table = ("pyiceberg_namespace", "my_table")
df = pd.DataFrame({
    "id": [1, 2, 3],
    "data": ["a", "b", "c"],
})
pa_df = pa.Table.from_pandas(df)

# Clean existing table if present
if test_table in catalog.list_tables(namespace=test_namespace):
    catalog.drop_table(test_table)

# Create and populate table
table = catalog.create_table(
    test_table,
    schema=pa_df.schema,
    properties={"write.metadata.compression-codec": "none"},
)
table.append(pa_df)

# Verify data ingestion
table = catalog.load_table(test_table)
print(table.scan().to_pandas())
```

### Integration Testing

After executing the notebook, connect to ClickHouse and validate the DataLakeCatalog integration:

```shell
./clickhouse client
```

Execute the following SQL commands to verify functionality:

```sql
-- Enable experimental Iceberg support
SET allow_experimental_database_iceberg = 1;

-- Configure DataLakeCatalog with REST catalog backend
CREATE DATABASE demo
ENGINE = DataLakeCatalog('http://localhost:8181/catalog', 'minio', 'ClickHouse_Minio_P@ssw0rd')
SETTINGS
    catalog_type = 'rest',
    storage_endpoint = 'http://localhost:9002/warehouse-rest',
    warehouse = 'demo';

-- Verify table discovery
SHOW TABLES FROM demo;

-- Validate data retrieval
SELECT * FROM demo.`pyiceberg_namespace.my_table`;
```