Databricks Telecom Bundle is a Databricks Asset Bundle (DAB) that deploys all the assets needed to showcase a data warehouse workload built on synthetically generated telecom billing data.
This bundle provides a complete end-to-end demonstration of a modern data platform, including:
- Synthetic data generation for telecom billing scenarios
- ETL pipelines for data ingestion and transformation (Bronze, Silver, Gold layers)
- Data warehouse tables with proper relationships for joins and aggregations
- Interactive dashboards for data visualization
- Serverless workflow orchestration for automated processing
The tables have well-defined relationships, which makes it easy to perform joins and aggregations and to use advanced features such as Genie Spaces for natural-language querying.
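To make the join-and-aggregate pattern concrete, here is a minimal plain-Python sketch. The Customer and Invoice shapes, their columns, and the revenue_by_plan helper are hypothetical stand-ins, not the bundle's actual table schemas. In the workspace the same question would be a SQL JOIN on the key column followed by a GROUP BY.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


# Hypothetical, simplified table shapes; the bundle's real tables and columns may differ.
@dataclass(frozen=True)
class Customer:
    customer_id: int
    plan: str


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    customer_id: int  # foreign key referencing Customer.customer_id
    amount: float


def revenue_by_plan(customers: list[Customer], invoices: list[Invoice]) -> dict[str, float]:
    """Inner-join invoices to customers on customer_id, then sum invoice amounts per plan."""
    plan_of = {c.customer_id: c.plan for c in customers}
    totals: dict[str, float] = defaultdict(float)
    for invoice in invoices:
        plan = plan_of.get(invoice.customer_id)
        if plan is None:
            continue  # orphan invoice with no matching customer: dropped, as in an inner join
        totals[plan] += invoice.amount
    return dict(totals)


customers = [Customer(1, "prepaid"), Customer(2, "postpaid"), Customer(3, "postpaid")]
invoices = [
    Invoice(10, 1, 20.0),
    Invoice(11, 2, 55.5),
    Invoice(12, 3, 44.5),
    Invoice(13, 99, 5.0),  # references a customer that does not exist
]
print(revenue_by_plan(customers, invoices))  # {'prepaid': 20.0, 'postpaid': 100.0}
```

Because every invoice row carries a key that points at exactly one customer, the aggregation is unambiguous; the orphan row shows why enforcing that relationship matters.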
To deploy from the Databricks workspace:
- Clone this repository directly in your Databricks workspace
- Follow the workspace tutorial: Databricks Asset Bundle Workspace Tutorial
- Deploy the bundle using the workspace interface
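To deploy from your local machine with the Databricks CLI:
- Install the Databricks CLI, version 0.218.0 or above. Bundle commands exist only in this newer CLI; the legacy databricks-cli package installed with pip does not support them. For example, with Homebrew:
    brew tap databricks/tap
    brew install databricks
- Authenticate to your Databricks workspace (the command prompts for your workspace URL and a personal access token):
    databricks configure
- Clone this repository locally:
    git clone <repository-url>
    cd databricks_telecom_bundle
- Deploy the bundle:
    # For the development environment
    databricks bundle deploy --target dev
    # For the production environment
    databricks bundle deploy --target prod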
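Because bundle commands need the newer CLI, a quick version check before deploying can save a confusing error. The sketch below parses the output of databricks --version, assuming it looks like "Databricks CLI v0.218.0", and compares it with 0.218.0, the minimum version we assume for bundle support (confirm against the CLI documentation). The constant and function names are our own, not part of any Databricks tool.

```python
from __future__ import annotations

import re
import shutil
import subprocess

# Assumption: minimum CLI version for bundle commands; confirm in the CLI documentation.
MIN_BUNDLE_CLI = (0, 218, 0)


def parse_cli_version(text: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'Databricks CLI v0.218.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_bundles(version_text: str) -> bool:
    """True if the reported version is at least MIN_BUNDLE_CLI (tuples compare element-wise)."""
    return parse_cli_version(version_text) >= MIN_BUNDLE_CLI


def installed_cli_supports_bundles() -> bool:
    """Run `databricks --version`; returns False if no databricks executable is on PATH."""
    if shutil.which("databricks") is None:
        return False
    result = subprocess.run(
        ["databricks", "--version"], capture_output=True, text=True, check=True
    )
    return supports_bundles(result.stdout)
```

Calling installed_cli_supports_bundles() before databricks bundle deploy turns a legacy-CLI installation into an explicit False instead of an "unknown command" failure partway through a script.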
You can customize the deployment by overriding variables defined in databricks.yml. The main variable to focus on is:
- catalog_name: the catalog where all schemas and tables will be created (default: databricks_telecom_bundle)
Other configurable variables include schema names for different layers:
- schema_ingestion: raw data ingestion schema
- schema_customer_bronze/silver/gold: customer data schemas by layer
- schema_billing_bronze/silver/gold: billing data schemas by layer
- schema_resource_bronze/silver: resource data schemas by layer
You can override variables in several ways:
- Command-line variables:
    databricks bundle deploy --target prod --var catalog_name=my_custom_catalog
- Environment variables:
    export BUNDLE_VAR_catalog_name=my_custom_catalog
    databricks bundle deploy --target prod
- Variable overrides file: create a .databricks/bundle/<target>/variable-overrides.json file:
    {
      "catalog_name": "my_custom_catalog",
      "schema_ingestion": "my_ingestion_schema",
      "schema_customer_gold": "my_customer_gold_schema"
    }
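If you script deployments for several targets, generating the overrides file from code keeps the JSON well-formed. This sketch assumes the file location described above and that the bronze/silver/gold shorthand expands to one variable per layer (for example schema_customer_silver). The helper names are hypothetical, and databricks.yml may define variables this README does not list, so unrecognized names are reported for review rather than rejected.

```python
from __future__ import annotations

import json
from pathlib import Path

# Variable names mentioned in this README. Assumption: the "schema_<domain>_bronze/silver/gold"
# shorthand expands to one variable per layer. databricks.yml may define additional variables.
_LAYERS = {
    "customer": ("bronze", "silver", "gold"),
    "billing": ("bronze", "silver", "gold"),
    "resource": ("bronze", "silver"),
}
DOCUMENTED_VARIABLES = {"catalog_name", "schema_ingestion"} | {
    f"schema_{domain}_{layer}" for domain, layers in _LAYERS.items() for layer in layers
}


def overrides_path(bundle_root: str | Path, target: str) -> Path:
    """Per-target overrides file location, relative to the bundle root."""
    return Path(bundle_root) / ".databricks" / "bundle" / target / "variable-overrides.json"


def unknown_variables(overrides: dict[str, str]) -> list[str]:
    """Names not mentioned in this README: possible typos worth double-checking."""
    return sorted(set(overrides) - DOCUMENTED_VARIABLES)


def write_variable_overrides(bundle_root: str | Path, target: str, overrides: dict[str, str]) -> Path:
    """Write the overrides as pretty-printed JSON, creating parent directories as needed."""
    path = overrides_path(bundle_root, target)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(overrides, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, write_variable_overrides(".", "prod", {"catalog_name": "my_custom_catalog"}) run from the repository root would produce the prod overrides file used by the next databricks bundle deploy --target prod.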
For more details on variable override methods and precedence, see the Databricks Asset Bundle variables documentation.
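The precedence rules amount to a lookup chain that stops at the first source defining the variable. The sketch below models the order as we understand it from the bundle variables documentation (--var, then BUNDLE_VAR_ environment variables, then variable-overrides.json, then the target's values in databricks.yml, then the default). It is an illustration with a hypothetical resolve_variable function, not the CLI's implementation; check the documentation for your CLI version.

```python
from __future__ import annotations

ENV_PREFIX = "BUNDLE_VAR_"


def resolve_variable(
    name: str,
    cli_vars: dict[str, str],
    env: dict[str, str],
    overrides_file: dict[str, str],
    target_values: dict[str, str],
    defaults: dict[str, str],
) -> tuple[str, str]:
    """Return (value, source) for one bundle variable from the first source that defines it.

    Modeled order (verify against the documentation for your CLI version):
    1. --var on the command line
    2. BUNDLE_VAR_<name> environment variables
    3. .databricks/bundle/<target>/variable-overrides.json
    4. the target's variables mapping in databricks.yml
    5. the variable's default in the top-level variables mapping
    """
    # Strip the prefix so BUNDLE_VAR_catalog_name is looked up as catalog_name.
    env_vars = {k[len(ENV_PREFIX):]: v for k, v in env.items() if k.startswith(ENV_PREFIX)}
    chain = [
        ("--var", cli_vars),
        ("environment", env_vars),
        ("variable-overrides.json", overrides_file),
        ("target", target_values),
        ("default", defaults),
    ]
    for source, values in chain:
        if name in values:
            return values[name], source
    raise KeyError(f"no value found for bundle variable {name!r}")
```

With only the default defined, resolve_variable("catalog_name", {}, {}, {}, {}, {"catalog_name": "databricks_telecom_bundle"}) returns ("databricks_telecom_bundle", "default"); exporting BUNDLE_VAR_catalog_name would win over both the overrides file and the default.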
⏱️ Important: The bundle runs on serverless workflows and takes approximately 30 minutes to complete the full deployment and data generation process.
- After cloning and deploying the bundle, navigate to your Databricks workspace
- Go to the Workflows section
- Start the workflow named databricks_telecom_bundle
- Monitor the progress through the workflow UI
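Check these prerequisites before running the workflow:
- The catalog specified in your configuration must already exist in your workspace; the bundle creates schemas and tables inside it but does not create the catalog itself
- You need permissions to create schemas, tables, and workflows (in Unity Catalog this includes USE CATALOG and CREATE SCHEMA on the target catalog)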
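The run steps can also be scripted with the Databricks SDK for Python (databricks-sdk). This sketch assumes the deployed job keeps the name databricks_telecom_bundle; if your dev target uses development mode, bundle resources get a "[dev <username>]" prefix and the name will differ. The helper functions are hypothetical, and the client is passed in so the logic can be exercised without a workspace.

```python
from __future__ import annotations

from datetime import timedelta
from typing import Any


def catalog_exists(client: Any, catalog_name: str) -> bool:
    """True if the catalog appears in client.catalogs.list(), i.e. it is visible to the caller."""
    return any(catalog.name == catalog_name for catalog in client.catalogs.list())


def run_job_by_name(
    client: Any, job_name: str, timeout: timedelta = timedelta(minutes=60)
) -> Any:
    """Start the single job with this exact name and wait for the run to finish.

    `client` is expected to behave like databricks.sdk.WorkspaceClient. The default timeout
    leaves headroom over the roughly 30-minute run time mentioned in this README.
    Returns the run's result_state.
    """
    jobs = list(client.jobs.list(name=job_name))
    if len(jobs) != 1:
        raise LookupError(f"expected exactly one job named {job_name!r}, found {len(jobs)}")
    waiter = client.jobs.run_now(job_id=jobs[0].job_id)
    run = waiter.result(timeout=timeout)  # blocks, polling until the run finishes or times out
    return run.state.result_state
```

In a real session you would pass databricks.sdk.WorkspaceClient(), which picks up the credentials saved by databricks configure, and call catalog_exists first to confirm the prerequisite above. From the CLI, databricks bundle run with the job's resource key from the bundle configuration starts the same job.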
📊 Dashboard Limitation: The dashboard will only work if you deploy in production mode (--target prod). This is because Databricks Asset Bundles don't yet support variable usage in dashboards, so the dashboard is configured with production-specific references.
To access the dashboard:
- Deploy using --target prod
- Run the workflow to completion
- Navigate to Dashboards in your workspace
- Find the dashboard named according to your catalog configuration
The bundle is organized into several main components:
- src/geracao_dados/: synthetic data generation logic
- src/batch_ingestion/: data ingestion pipelines (Bronze and Silver layers)
- src/data_warehousing/: data warehouse implementation (Gold layer, dimensions, facts)
- src/bundle_orchestrator/: workflow orchestration configuration
- Medallion Architecture: Bronze, Silver, and Gold data layers
- Delta Live Tables: Declarative pipelines for batch and streaming data processing
- Serverless Compute: Cost-effective, auto-scaling compute resources
- Data Relationships: Properly modeled telecom billing data with foreign keys
- Advanced Analytics: Ready for complex queries and aggregations
- Genie Spaces: Natural language querying capabilities
- Interactive Dashboards: Business intelligence visualization
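The medallion layering can be illustrated without Spark. The plain-Python sketch below is neither the Delta Live Tables API nor the bundle's pipeline code, and the call-record fields (call_id, customer_id, minutes) are hypothetical. It only shows the division of work: bronze keeps raw payloads as ingested, silver parses, validates, and deduplicates them, and gold aggregates them for reporting.

```python
from __future__ import annotations

import json
from collections import defaultdict

REQUIRED_FIELDS = {"call_id", "customer_id", "minutes"}  # hypothetical call-record schema


def to_silver(bronze_rows: list[str]) -> list[dict]:
    """Bronze -> Silver: parse raw JSON, drop malformed or incomplete records, dedupe by call_id."""
    latest: dict[str, dict] = {}
    for raw in bronze_rows:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # unparseable payloads remain only in the raw bronze layer
        if not isinstance(record, dict) or not REQUIRED_FIELDS.issubset(record):
            continue
        # Cast to consistent types; a later copy of the same call_id replaces an earlier one.
        latest[record["call_id"]] = {
            "call_id": record["call_id"],
            "customer_id": int(record["customer_id"]),
            "minutes": float(record["minutes"]),
        }
    return list(latest.values())


def to_gold(silver_rows: list[dict]) -> dict[int, float]:
    """Silver -> Gold: total billable minutes per customer, ready for reporting."""
    totals: dict[int, float] = defaultdict(float)
    for row in silver_rows:
        totals[row["customer_id"]] += row["minutes"]
    return dict(totals)


bronze = [
    '{"call_id": "a", "customer_id": "1", "minutes": 3.5}',
    '{"call_id": "a", "customer_id": "1", "minutes": 3.5}',  # duplicate delivery
    "not json",
    '{"call_id": "b", "customer_id": 2, "minutes": 10}',
    '{"call_id": "c", "customer_id": 1, "minutes": 1.5}',
]
print(to_gold(to_silver(bronze)))  # {1: 5.0, 2: 10.0}
```

Keeping the raw rows untouched in bronze means silver and gold can be rebuilt if the cleaning rules change, which is the main reason for separating the layers.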
For more information about Databricks Asset Bundles, see the official documentation.