17 changes: 13 additions & 4 deletions demo/README.md
Original file line number Diff line number Diff line change
@@ -48,7 +48,7 @@ This Demo launches Bronze and Silver pipelines with following activities:
7. ```commandline
python demo/launch_dais_demo.py --uc_catalog_name=<<uc catalog name>> --profile=<<DEFAULT>>
```
- uc_catalog_name : Unity catalog name
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.
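
The auto-wrapping rule described above can be sketched in a few lines of Python. This is an illustrative re-implementation of the documented behavior (the function name `normalize_catalog_name` is hypothetical), not the repo's actual helper:

```python
import re

def normalize_catalog_name(name: str) -> str:
    """Return the name as-is if it is a valid non-delimited identifier,
    otherwise wrap it in backticks as a delimited identifier."""
    stripped = name.strip('`')  # tolerate input that is already delimited
    if re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', stripped):
        return stripped
    return f'`{stripped}`'

print(normalize_catalog_name("ravi_dlt_meta_uc"))  # ravi_dlt_meta_uc
print(normalize_catalog_name("my-catalog"))        # `my-catalog`
```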

![dais_demo.png](../docs/static/images/dais_demo.png)
@@ -86,7 +86,7 @@ This demo will launch auto generated tables(100s) inside single bronze and silve
7. ```commandline
python demo/launch_techsummit_demo.py --uc_catalog_name=<<uc catalog name>> --profile=<<DEFAULT>>
```
- uc_catalog_name : Unity catalog name
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.

![tech_summit_demo.png](../docs/static/images/tech_summit_demo.png)
@@ -128,7 +128,7 @@ This demo will perform following tasks:
7. ```commandline
python demo/launch_af_cloudfiles_demo.py --uc_catalog_name=<<uc catalog name>> --source=cloudfiles --profile=<<DEFAULT>>
```
- uc_catalog_name : Unity Catalog name
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.

![af_am_demo.png](../docs/static/images/af_am_demo.png)
@@ -178,7 +178,7 @@ This demo will perform following tasks:
- Create Databricks secrets to store producer and consumer keys using the scope created in step 2

- The following arguments are mandatory for running the Eventhubs demo
- uc_catalog_name : unity catalog name e.g. ravi_dlt_meta_uc
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers. Example: ravi_dlt_meta_uc
- eventhub_namespace: Eventhub namespace, e.g. dltmeta
- eventhub_name : Primary Eventhub name, e.g. dltmeta_demo
- eventhub_name_append_flow: Secondary Eventhub name for the append flow feed, e.g. dltmeta_demo_af
@@ -232,6 +232,7 @@ This demo will perform following tasks:
python demo/launch_silver_fanout_demo.py --source=cloudfiles --uc_catalog_name=<<uc catalog name>> --profile=<<DEFAULT>>
```

- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.

a. Databricks Workspace URL:
@@ -296,6 +297,9 @@ This demo will perform following tasks:
```commandline
python demo/launch_acfs_demo.py --uc_catalog_name=<<uc catalog name>> --profile=<<DEFAULT>>
```
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.

![acfs.png](../docs/static/images/acfs.png)

# Lakeflow Declarative Pipelines Sink Demo
@@ -350,6 +354,9 @@ This demo will perform following tasks:
```commandline
python demo/launch_dlt_sink_demo.py --uc_catalog_name=<<uc_catalog_name>> --source=kafka --kafka_source_topic=<<kafka source topic name>> --kafka_sink_topic=<<kafka sink topic name>> --kafka_source_servers_secrets_scope_name=<<kafka source servers secret scope name>> --kafka_source_servers_secrets_scope_key=<<kafka source servers secret scope key name>> --kafka_sink_servers_secret_scope_name=<<kafka sink servers secret scope name>> --kafka_sink_servers_secret_scope_key=<<kafka sink servers secret scope key name>> --profile=<<DEFAULT>>
```
- uc_catalog_name : UC catalog name. Names that are valid non-delimited identifiers (ASCII letters, digits, underscores, not starting with a digit) are used as-is. Names containing other characters are automatically wrapped in backticks as delimited identifiers.
- you can provide `--profile=<databricks_profile_name>` if you already have the Databricks CLI configured; otherwise the command prompt will ask for host and token.

![dlt_demo_sink.png](../docs/static/images/dlt_demo_sink.png)
![dlt_delta_sink.png](../docs/static/images/dlt_delta_sink.png)
![dlt_kafka_sink.png](../docs/static/images/dlt_kafka_sink.png)
@@ -406,6 +413,8 @@ This demo will perform following steps:
```commandline
python demo/generate_dabs_resources.py --source=cloudfiles --uc_catalog_name=<your_catalog_name> --profile=<your_profile>
```
> Note: If uc_catalog_name contains characters not valid for a non-delimited identifier, it is automatically wrapped in backticks as a delimited identifier.

> Note: If you don't specify `--profile`, you'll be prompted for your Databricks workspace URL and access token.

7. Deploy and run the DAB bundle:
3 changes: 2 additions & 1 deletion demo/generate_dabs_resources.py
@@ -48,10 +48,11 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
The initialized runner configuration.
"""
run_id = uuid.uuid4().hex
uc_catalog_name = self.validate_uc_catalog_name(self.args.get("uc_catalog_name"))
runner_conf = DLTMetaRunnerConf(
run_id=run_id,
username=self.wsi._my_username,
uc_catalog_name=self.args["uc_catalog_name"],
uc_catalog_name=uc_catalog_name,
int_tests_dir="demo/dabs",
dlt_meta_schema=f"dlt_meta_dataflowspecs_demo_{run_id}",
bronze_schema=f"dlt_meta_bronze_demo_{run_id}",
2 changes: 1 addition & 1 deletion demo/launch_acfs_demo.py
@@ -54,7 +54,7 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
onboarding_file_path="demo/conf/onboarding.json",
env="demo"
)
runner_conf.uc_catalog_name = self.args['uc_catalog_name']
runner_conf.uc_catalog_name = self.validate_uc_catalog_name(self.args.get('uc_catalog_name'))
return runner_conf

def launch_workflow(self, runner_conf: DLTMetaRunnerConf):
3 changes: 2 additions & 1 deletion demo/launch_af_cloudfiles_demo.py
@@ -45,10 +45,11 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
The initialized runner configuration.
"""
run_id = uuid.uuid4().hex
uc_catalog_name = self.validate_uc_catalog_name(self.args.get("uc_catalog_name"))
runner_conf = DLTMetaRunnerConf(
run_id=run_id,
username=self.wsi._my_username,
uc_catalog_name=self.args["uc_catalog_name"],
uc_catalog_name=uc_catalog_name,
int_tests_dir="demo",
dlt_meta_schema=f"dlt_meta_dataflowspecs_demo_{run_id}",
bronze_schema=f"dlt_meta_bronze_demo_{run_id}",
2 changes: 1 addition & 1 deletion demo/launch_af_eventhub_demo.py
@@ -70,7 +70,7 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
eventhub_namespace=self.args["eventhub_namespace"],
eventhub_port=self.args["eventhub_port"]
)
runner_conf.uc_catalog_name = self.args['uc_catalog_name']
runner_conf.uc_catalog_name = self.validate_uc_catalog_name(self.args.get('uc_catalog_name'))
runner_conf.runners_full_local_path = 'demo/notebooks/afam_eventhub_runners'
return runner_conf

5 changes: 3 additions & 2 deletions demo/launch_dais_demo.py
@@ -50,8 +50,9 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
# runners_full_local_path='./demo/dbc/dais_dlt_meta_runners.dbc',
onboarding_file_path='demo/conf/onboarding.json'
)
if self.args['uc_catalog_name']:
runner_conf.uc_catalog_name = self.args['uc_catalog_name']
uc_catalog_name = self.args.get('uc_catalog_name')
if uc_catalog_name:
runner_conf.uc_catalog_name = self.validate_uc_catalog_name(uc_catalog_name)
runner_conf.uc_volume_name = f"{runner_conf.uc_catalog_name}_dais_demo_{run_id}"

return runner_conf
3 changes: 2 additions & 1 deletion demo/launch_dlt_sink_demo.py
@@ -45,10 +45,11 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
The initialized runner configuration.
"""
run_id = uuid.uuid4().hex
uc_catalog_name = self.validate_uc_catalog_name(self.args.get("uc_catalog_name"))
runner_conf = DLTMetaRunnerConf(
run_id=run_id,
username=self.wsi._my_username,
uc_catalog_name=self.args["uc_catalog_name"],
uc_catalog_name=uc_catalog_name,
int_tests_dir="demo",
dlt_meta_schema=f"dlt_meta_dataflowspecs_demo_{run_id}",
bronze_schema=f"dlt_meta_bronze_demo_{run_id}",
2 changes: 1 addition & 1 deletion demo/launch_silver_fanout_demo.py
@@ -82,7 +82,7 @@ def init_runner_conf(self) -> DLTMetaRunnerConf:
onboarding_fanout_file_path="demo/conf/onboarding_fanout_cars.json",
env="demo"
)
runner_conf.uc_catalog_name = self.args['uc_catalog_name']
runner_conf.uc_catalog_name = self.validate_uc_catalog_name(self.args.get('uc_catalog_name'))
runner_conf.uc_volume_name = f"{runner_conf.uc_catalog_name}_dlt_meta_fout_demo_{run_id}"
return runner_conf

7 changes: 4 additions & 3 deletions demo/launch_techsummit_demo.py
@@ -100,9 +100,10 @@ def init_runner_conf(self) -> TechsummitRunnerConf:
and self.args.__dict__['table_data_rows_count']
else "10"),
)
if self.args['uc_catalog_name']:
runner_conf.uc_catalog_name = self.args['uc_catalog_name']
runner_conf.uc_volume_name = f"{self.args['uc_catalog_name']}_volume_{run_id}"
uc_catalog_name = self.args.get('uc_catalog_name')
if uc_catalog_name:
runner_conf.uc_catalog_name = self.validate_uc_catalog_name(uc_catalog_name)
runner_conf.uc_volume_name = f"{runner_conf.uc_catalog_name}_volume_{run_id}"
return runner_conf

def create_bronze_silver_dlt(self, runner_conf: DLTMetaRunnerConf):
32 changes: 32 additions & 0 deletions integration_tests/run_integration_tests.py
@@ -4,6 +4,7 @@
import argparse
import json
import os
import re
import sys
import traceback
import uuid
@@ -168,6 +169,37 @@ def __init__(self, args: dict[str:str], ws, base_dir):
self.wsi = WorkspaceInstaller(ws)
self.base_dir = base_dir

@staticmethod
def validate_uc_catalog_name(name):
"""Validate and normalize a Unity Catalog name.

Non-delimited identifiers can only contain ASCII letters, digits, and
underscores and must not start with a digit. Delimited identifiers
(wrapped in backticks) can use any unicode character.

If the name contains characters not valid for a non-delimited identifier,
it is automatically wrapped in backticks to form a delimited identifier.

Args:
name: The catalog name to validate.

Returns:
The validated and possibly backtick-wrapped catalog name.

Raises:
ValueError: If the name is None or empty.
"""
if name is None:
raise ValueError("'uc_catalog_name' is required but was not provided.")
if not name.strip():
raise ValueError("'uc_catalog_name' must not be empty.")
stripped = name.strip('`')
if not stripped:
raise ValueError("'uc_catalog_name' must not be empty.")
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', stripped):
return f'`{stripped}`'
return stripped

def init_runner_conf(self) -> DLTMetaRunnerConf:
"""Initialize the runner configuration for running integration tests."""
run_id = uuid.uuid4().hex
36 changes: 36 additions & 0 deletions src/cli.py
@@ -3,6 +3,7 @@
import logging
import json
import os
import re
import sys
import uuid
import webbrowser
@@ -18,6 +19,37 @@
logger = logging.getLogger('databricks.labs.dltmeta')


def validate_uc_catalog_name(name):
"""Validate and normalize a Unity Catalog name.

Non-delimited identifiers can only contain ASCII letters, digits, and
underscores and must not start with a digit. Delimited identifiers
(wrapped in backticks) can use any unicode character.

If the name contains characters not valid for a non-delimited identifier,
it is automatically wrapped in backticks to form a delimited identifier.

Args:
name: The catalog name to validate.

Returns:
The validated and possibly backtick-wrapped catalog name.

Raises:
ValueError: If the name is None or empty.
"""
if name is None:
raise ValueError("'uc_catalog_name' is required but was not provided.")
if not name.strip():
raise ValueError("'uc_catalog_name' must not be empty.")
stripped = name.strip('`')
if not stripped:
raise ValueError("'uc_catalog_name' must not be empty.")
if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', stripped):
return f'`{stripped}`'
return stripped
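
Run standalone, the helper's edge-case behavior can be illustrated with a self-contained sketch — the function body mirrors the diff above, while the example inputs are hypothetical:

```python
import re

def validate_uc_catalog_name(name):
    # Raise on missing/empty input; otherwise return the name,
    # backtick-wrapped when it is not a valid non-delimited identifier.
    if name is None:
        raise ValueError("'uc_catalog_name' is required but was not provided.")
    if not name.strip():
        raise ValueError("'uc_catalog_name' must not be empty.")
    stripped = name.strip('`')
    if not stripped:
        raise ValueError("'uc_catalog_name' must not be empty.")
    if not re.match(r'^[a-zA-Z_][a-zA-Z0-9_]*$', stripped):
        return f'`{stripped}`'
    return stripped

print(validate_uc_catalog_name("`already_quoted`"))  # already_quoted (backticks dropped, name is valid)
print(validate_uc_catalog_name("`has space`"))       # `has space` (re-wrapped as a delimited identifier)
```

Note the design choice visible here: surrounding backticks are stripped before validation, so a caller may pass either a plain or an already-delimited name and get a consistently normalized result.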


DLT_META_RUNNER_NOTEBOOK = """
# Databricks notebook source
# MAGIC %pip install dlt-meta=={version}
@@ -67,6 +99,8 @@ def __post_init__(self):
raise ValueError("onboard_layer is required")
if self.onboard_layer.lower() not in ["bronze", "silver", "bronze_silver"]:
raise ValueError("onboard_layer must be one of bronze, silver, bronze_silver")
if self.uc_enabled and self.uc_catalog_name:
self.uc_catalog_name = validate_uc_catalog_name(self.uc_catalog_name)
# if self.uc_enabled == "":
# raise ValueError("uc_enabled is required, please set to True or False")
if not self.uc_enabled and not self.dbfs_path:
@@ -125,6 +159,8 @@ class DeployCommand:
def __post_init__(self):
if self.uc_enabled and not self.uc_catalog_name:
raise ValueError("uc_catalog_name is required")
if self.uc_enabled and self.uc_catalog_name:
self.uc_catalog_name = validate_uc_catalog_name(self.uc_catalog_name)
if not self.serverless and not self.num_workers:
raise ValueError("num_workers is required")
if not self.layer:
100 changes: 99 additions & 1 deletion tests/test_cli.py
@@ -4,7 +4,10 @@
import json
from databricks.sdk.service.catalog import VolumeType
from src.__about__ import __version__
from src.cli import DLT_META_RUNNER_NOTEBOOK, DeployCommand, DLTMeta, OnboardCommand, main
from src.cli import (
DLT_META_RUNNER_NOTEBOOK, DeployCommand, DLTMeta, OnboardCommand,
main, validate_uc_catalog_name
)


class CliTests(unittest.TestCase):
@@ -1303,6 +1306,101 @@ def test_onboard_command_version_validation(self):
)
self.assertIn("version is required", str(context.exception))

def test_onboard_command_uc_catalog_name_auto_delimit(self):
"""Test OnboardCommand auto-wraps non-standard uc_catalog_name with backticks."""
delimit_cases = {
"my-catalog": "`my-catalog`",
"1starts_with_digit": "`1starts_with_digit`",
"has space": "`has space`",
"dot.name": "`dot.name`",
"sp@cial": "`sp@cial`",
}
for name, expected in delimit_cases.items():
cmd = OnboardCommand(
onboarding_file_path="tests/resources/onboarding.json",
onboarding_files_dir_path="tests/resources/",
onboard_layer="bronze",
env="dev",
import_author="John Doe",
version="1.0",
dlt_meta_schema="dlt_meta",
uc_enabled=True,
uc_catalog_name=name,
serverless=True,
bronze_dataflowspec_table="bronze_dataflowspec",
overwrite=True,
)
self.assertEqual(cmd.uc_catalog_name, expected)

# Valid non-delimited names should stay unchanged
valid_names = ["my_catalog", "Catalog1", "_private", "ABC_123"]
for name in valid_names:
cmd = OnboardCommand(
onboarding_file_path="tests/resources/onboarding.json",
onboarding_files_dir_path="tests/resources/",
onboard_layer="bronze",
env="dev",
import_author="John Doe",
version="1.0",
dlt_meta_schema="dlt_meta",
uc_enabled=True,
uc_catalog_name=name,
serverless=True,
bronze_dataflowspec_table="bronze_dataflowspec",
overwrite=True,
)
self.assertEqual(cmd.uc_catalog_name, name)

def test_deploy_command_uc_catalog_name_auto_delimit(self):
"""Test DeployCommand auto-wraps non-standard uc_catalog_name with backticks."""
delimit_cases = {
"my-catalog": "`my-catalog`",
"1starts_with_digit": "`1starts_with_digit`",
"has space": "`has space`",
"dot.name": "`dot.name`",
"sp@cial": "`sp@cial`",
}
for name, expected in delimit_cases.items():
cmd = DeployCommand(
layer="bronze",
onboard_bronze_group="A1",
dlt_meta_bronze_schema="dlt_meta",
dataflowspec_bronze_table="dataflowspec_table",
pipeline_name="test_pipeline",
dlt_target_schema="target_schema",
uc_enabled=True,
uc_catalog_name=name,
serverless=True,
)
self.assertEqual(cmd.uc_catalog_name, expected)

# Valid non-delimited names should stay unchanged
valid_names = ["my_catalog", "Catalog1", "_private", "ABC_123"]
for name in valid_names:
cmd = DeployCommand(
layer="bronze",
onboard_bronze_group="A1",
dlt_meta_bronze_schema="dlt_meta",
dataflowspec_bronze_table="dataflowspec_table",
pipeline_name="test_pipeline",
dlt_target_schema="target_schema",
uc_enabled=True,
uc_catalog_name=name,
serverless=True,
)
self.assertEqual(cmd.uc_catalog_name, name)

def test_validate_uc_catalog_name_empty_and_none(self):
"""Test validate_uc_catalog_name raises on None and empty strings."""
with self.assertRaises(ValueError):
validate_uc_catalog_name(None)
with self.assertRaises(ValueError):
validate_uc_catalog_name("")
with self.assertRaises(ValueError):
validate_uc_catalog_name(" ")
with self.assertRaises(ValueError):
validate_uc_catalog_name("``")

def test_deploy_command_validation_cases(self):
"""Test DeployCommand validation cases for missing coverage."""
# Test bronze layer without dataflowspec_bronze_table when uc_enabled=True (line 136)