diff --git a/.test/README.md b/.test/README.md index f9ae7c9c..3329f2fc 100644 --- a/.test/README.md +++ b/.test/README.md @@ -149,7 +149,7 @@ uv run python .test/scripts/optimize.py databricks-metric-views --include-tools When `--tool-modules` is specified, both tool stats and the cross-skill dataset are filtered: - **Tool stats** report only the requested modules (e.g., `Tool modules: 1, tools: 5` for `--tool-modules sql`). -- **Cross-skill dataset** includes only skills whose `tool_modules` in `manifest.yaml` overlap with the requested modules. Skills that *don't declare* `tool_modules` are always included as a safe fallback (e.g., `databricks-config`, `databricks-docs`). This means the dataset won't shrink to *only* SQL skills โ€” general-purpose skills without the field are kept so the evaluator still has broad coverage. +- **Cross-skill dataset** includes only skills whose `tool_modules` in `manifest.yaml` overlap with the requested modules. Skills that *don't declare* `tool_modules` are always included as a safe fallback (e.g., `databricks-docs`, `dev-best-practices`). This means the dataset won't shrink to *only* SQL skills โ€” general-purpose skills without the field are kept so the evaluator still has broad coverage. To reduce the dataset further, add `tool_modules` to any remaining skills that should be excluded for certain module filters. Without `--tool-modules`, all skills are included regardless (no regression). 
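The overlap-and-fallback rule described above amounts to a simple set-intersection filter. A minimal Python sketch (the `skills` manifest shape and function name are illustrative, not the actual script's code; it assumes the filter is only invoked when `--tool-modules` is given):

```python
def filter_skills(skills: dict[str, dict], requested_modules: set[str]) -> list[str]:
    """Keep skills whose declared tool_modules overlap the requested set.

    Skills that declare no tool_modules are always kept (safe fallback),
    so general-purpose skills remain in the cross-skill dataset.
    """
    kept = []
    for name, manifest in skills.items():
        modules = manifest.get("tool_modules")
        # No declaration -> safe fallback; otherwise require an overlap
        if not modules or requested_modules & set(modules):
            kept.append(name)
    return kept
```

For example, with `--tool-modules sql`, a skill declaring `tool_modules: [jobs]` is dropped, while one with no `tool_modules` field survives the filter.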
diff --git a/databricks-builder-app/app.yaml b/databricks-builder-app/app.yaml index 1349cb8c..7c5460f2 100644 --- a/databricks-builder-app/app.yaml +++ b/databricks-builder-app/app.yaml @@ -30,7 +30,7 @@ env: # ============================================================================= # Comma-separated list of skills to enable - name: ENABLED_SKILLS - value: "databricks-asset-bundles,databricks-agent-bricks,databricks-aibi-dashboards,databricks-app-apx,databricks-app-python,databricks-config,databricks-docs,databricks-jobs,databricks-python-sdk,databricks-unity-catalog,mlflow-evaluation,spark-declarative-pipelines,synthetic-data-generation,unstructured-pdf-generation" + value: "databricks-asset-bundles,databricks-agent-bricks,databricks-aibi-dashboards,databricks-app-apx,databricks-app-python,databricks-docs,dev-best-practices,databricks-jobs,databricks-python-sdk,databricks-unity-catalog,mlflow-evaluation,spark-declarative-pipelines,synthetic-data-generation,unstructured-pdf-generation" - name: SKILLS_ONLY_MODE value: "false" diff --git a/databricks-skills/README.md b/databricks-skills/README.md index 29a79ae8..1226468e 100644 --- a/databricks-skills/README.md +++ b/databricks-skills/README.md @@ -64,8 +64,9 @@ cp -r ai-dev-kit/databricks-skills/databricks-agent-bricks .claude/skills/ - **databricks-asset-bundles** - DABs for multi-environment deployments - **databricks-app-apx** - Full-stack apps (FastAPI + React) - **databricks-app-python** - Python web apps (Dash, Streamlit, Flask) +- **databricks-config** - CLI authentication and profile setup (manually invoke for auth-only workflows) +- **dev-best-practices** - Databricks development best practices: Git workflow, code quality, architecture, CI/CD, and production handoff - **databricks-python-sdk** - Python SDK, Connect, CLI, REST API -- **databricks-config** - Profile authentication setup - **databricks-lakebase-provisioned** - Managed PostgreSQL for OLTP workloads ### ๐Ÿ“š Reference diff --git 
a/databricks-skills/databricks-asset-bundles/SKILL.md b/databricks-skills/databricks-asset-bundles/SKILL.md index 4253e8e9..c7981f09 100644 --- a/databricks-skills/databricks-asset-bundles/SKILL.md +++ b/databricks-skills/databricks-asset-bundles/SKILL.md @@ -312,7 +312,7 @@ databricks bundle destroy -t prod --auto-approve - **[databricks-spark-declarative-pipelines](../databricks-spark-declarative-pipelines/SKILL.md)** - pipeline definitions referenced by DABs - **[databricks-app-apx](../databricks-app-apx/SKILL.md)** - app deployment via DABs - **[databricks-app-python](../databricks-app-python/SKILL.md)** - Python app deployment via DABs -- **[databricks-config](../databricks-config/SKILL.md)** - profile and authentication setup for CLI/SDK + - **[databricks-jobs](../databricks-jobs/SKILL.md)** - job orchestration managed through bundles ## Resources diff --git a/databricks-skills/databricks-config/SKILL.md b/databricks-skills/databricks-config/SKILL.md index 88382c33..8beaaeb7 100644 --- a/databricks-skills/databricks-config/SKILL.md +++ b/databricks-skills/databricks-config/SKILL.md @@ -1,22 +1,67 @@ --- name: databricks-config -description: "Manage Databricks workspace connections: check which workspace you're connected to, switch workspaces, list available workspaces, or authenticate to a new workspace." +description: "Set up Databricks CLI authentication and profiles. Use when you need to authenticate to a Databricks workspace, manage named profiles, or verify your current connection." --- -Use the `manage_workspace` MCP tool for all workspace operations. Do NOT edit `~/.databrickscfg`, use Bash, or use the Databricks CLI. +# Databricks CLI Auth & Profile Setup -## Steps +Use this skill when you need to authenticate to a Databricks workspace or configure named profiles for multi-workspace workflows. For a full dev environment setup (IDE, SDK, project structure), see [dev-best-practices ยง2](../dev-best-practices/1-foundations-and-setup.md). -1. 
Call `ToolSearch` with query `select:mcp__databricks__manage_workspace` to load the tool. +## Authenticate to a workspace -2. Map user intent to action: - - status / which workspace / current โ†’ `action="status"` - - list / available workspaces โ†’ `action="list"` - - switch to X โ†’ call `list` first to find the profile name, then `action="switch", profile=""` (or `host=""` if a URL was given) - - login / connect / authenticate โ†’ `action="login", host=""` +For initial CLI setup and install instructions, see [dev-best-practices ยง2.5](../dev-best-practices/1-foundations-and-setup.md). -3. Call `mcp__databricks__manage_workspace` with the action and any parameters. +```bash +# Store under a named profile (for multi-workspace workflows) +databricks auth login --host https://.azuredatabricks.net --profile my-profile +``` -4. Present the result. For `status`/`switch`/`login`: show host, profile, username. For `list`: formatted table with the active profile marked. +## Multiple workspaces with named profiles -> **Note:** The switch is session-scoped โ€” it resets on MCP server restart. For permanent profile setup, use `databricks auth login -p ` and update `~/.databrickscfg` with `cluster_id` or `serverless_compute_id = auto`. 
+```bash +# Set up separate profiles per environment +databricks auth login --host https://dev.databricks.com --profile dev +databricks auth login --host https://prod.databricks.com --profile prod + +# Use a profile for CLI commands +databricks jobs list --profile prod +``` + +## Verify your connection + +```bash +# Check current user and workspace +databricks current-user me + +# Check a specific profile +databricks current-user me --profile dev +``` + +## View and edit profiles + +```bash +# Show all configured profiles +databricks auth profiles + +# Config file location +cat ~/.databrickscfg +``` + +## Set a default profile + +Add `DATABRICKS_CONFIG_PROFILE=` to your shell profile (`.zshrc`, `.bashrc`) or export it in your session: + +```bash +export DATABRICKS_CONFIG_PROFILE=dev +``` + +## Troubleshooting + +- **Token expired:** Re-run `databricks auth login` for the relevant profile. +- **Wrong workspace:** Check `databricks current-user me` โ€” confirm the host matches. +- **SDK not picking up profile:** Set `DATABRICKS_CONFIG_PROFILE` or pass `profile` explicitly in code. 
+ +```python +from databricks.sdk import WorkspaceClient +w = WorkspaceClient(profile="dev") +``` diff --git a/databricks-skills/databricks-python-sdk/SKILL.md b/databricks-skills/databricks-python-sdk/SKILL.md index 1365666a..3f2a2bc1 100644 --- a/databricks-skills/databricks-python-sdk/SKILL.md +++ b/databricks-skills/databricks-python-sdk/SKILL.md @@ -616,7 +616,7 @@ If I'm unsure about a method, I should: ## Related Skills -- **[databricks-config](../databricks-config/SKILL.md)** - profile and authentication setup +- **[dev-best-practices](../dev-best-practices/SKILL.md)** - Databricks development best practices (dev environment, Git workflow, architecture, CI/CD) - **[databricks-asset-bundles](../databricks-asset-bundles/SKILL.md)** - deploying resources via DABs - **[databricks-jobs](../databricks-jobs/SKILL.md)** - job orchestration patterns - **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)** - catalog governance diff --git a/databricks-skills/databricks-zerobus-ingest/SKILL.md b/databricks-skills/databricks-zerobus-ingest/SKILL.md index 2b8ce214..5ff37a19 100644 --- a/databricks-skills/databricks-zerobus-ingest/SKILL.md +++ b/databricks-skills/databricks-zerobus-ingest/SKILL.md @@ -224,7 +224,7 @@ The timestamp generation must use microseconds for Databricks. 
- **[databricks-spark-declarative-pipelines](../databricks-spark-declarative-pipelines/SKILL.md)** - Downstream pipeline processing of ingested data - **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)** - Managing catalogs, schemas, and tables that Zerobus writes to - **[databricks-synthetic-data-gen](../databricks-synthetic-data-gen/SKILL.md)** - Generate test data to feed into Zerobus producers -- **[databricks-config](../databricks-config/SKILL.md)** - Profile and authentication setup +- **[dev-best-practices](../dev-best-practices/SKILL.md)** - CLI and authentication setup (ยง2.5), development best practices ## Resources diff --git a/databricks-skills/dev-best-practices/1-foundations-and-setup.md b/databricks-skills/dev-best-practices/1-foundations-and-setup.md new file mode 100644 index 00000000..1ab6bb7f --- /dev/null +++ b/databricks-skills/dev-best-practices/1-foundations-and-setup.md @@ -0,0 +1,411 @@ +# Part I: Foundations & Setup + Part II: Development Workflow + +## 2. Development Environment Setup + +### 2.2 Python Environment (uv) + +Use [uv](https://github.com/astral-sh/uv) for Python dependency management โ€” fast, Rust-based, uses standard `pyproject.toml`. + +```bash +# install uv +curl -LsSf https://astral.sh/uv/install.sh | sh + +# install Python matching Databricks Runtime +pyenv install 3.12.3 +pyenv local 3.12.3 + +# install project dependencies +uv sync --all-groups + +# add new dependency +uv add mlflow + +# add dev dependency +uv add --dev pytest +``` + +**Example `pyproject.toml`:** + +```toml +[project] +name = "customer-project" +version = "0.1.0" +requires-python = ">=3.12,<3.13" +dependencies = [ + "databricks-sdk>=0.52.0", + "mlflow[databricks]>=3.1", + "pydantic>=2.10.0", +] + +[dependency-groups] +dev = [ + "pytest>=7.4.0", + "ruff>=0.11.9", + "pre-commit>=3.7.1", +] +``` + +**Generating `requirements.txt` for Databricks notebooks:** + +Only include direct dependencies โ€” NOT transitive ones. 
The Databricks Runtime already includes pyspark, pandas, numpy, etc. + +``` +databricks-sdk[openai]>=0.69.0 +databricks-vectorsearch>=0.63 +mlflow==3.7.0 +beautifulsoup4>=4.12.0 +``` + +> **Note:** `uv export` outputs all transitive dependencies โ€” don't use it for `requirements.txt`. Extract only `[project.dependencies]` from `pyproject.toml`. + +### 2.3 Code Quality Tools (Ruff) + +Use [Ruff](https://github.com/astral-sh/ruff) for linting and formatting โ€” replaces Black, flake8, isort. + +**`.pre-commit-config.yaml`:** + +```yaml +repos: + - repo: https://github.com/astral-sh/ruff-pre-commit + rev: v0.14.11 + hooks: + - id: ruff + args: [--fix] + - id: ruff-format + + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v6.0.0 + hooks: + - id: trailing-whitespace + - id: end-of-file-fixer + - id: check-yaml + - id: check-added-large-files + - id: detect-private-key +``` + +```bash +# install hooks +uv run pre-commit install + +# run manually +uv run ruff check . --fix +uv run ruff format . +``` + +### 2.4 IDE Setup (VSCode / Cursor) + +See [Databricks documentation](https://docs.databricks.com/) for IDE setup and extension guidance. + +### 2.5 Databricks CLI Setup + +The Databricks CLI is a required dependency โ€” install it for your platform via the [official instructions](https://docs.databricks.com/en/dev-tools/cli/install.html). + +```bash +# authenticate (OAuth) +databricks auth login --host https://workspace.cloud.databricks.com + +# verify +databricks current-user me + +# multiple workspaces +databricks auth login --host https://dev.databricks.com --profile dev +databricks auth login --host https://prod.databricks.com --profile prod +``` + +--- + +## 3. Git & Collaboration + +Adopt the project's existing Git workflows when they exist. When building from scratch, use the following practices. 
+ +### 3.1 Git Fundamentals + +- Main branch is protected (requires PR to merge) +- Main is always deployable +- Use `.gitignore` to exclude generated files, secrets, and IDE artifacts + +**Never Commit Secrets:** +- No API keys, tokens, or passwords in code +- Use Databricks Secrets for credentials +- Add `detect-private-key` to pre-commit hooks +- If you accidentally commit a secret, **rotate it immediately** (git history preserves it) + +### 3.2 Branch Strategy + +**Branch Naming: `{username}/{description}`** + +```bash +# Good - clear ownership +{username}/add-billing-retry-logic +{username}/fix-vector-search-timeout +{username}/update-agent-config + +# Bad - who owns this? +feature/billing +fix/timeout-issue +``` + +**Branch Lifecycle:** + +```bash +# 1. Start fresh from main +git checkout main && git pull origin main + +# 2. Create your branch +git checkout -b {username}/add-billing-tool + +# 3. Stay synced (daily) +git fetch origin && git rebase origin/main + +# 4. Push and create PR +git push origin {username}/add-billing-tool +gh pr create --title "[feature] Add billing tool" + +# 5. After merge, delete the branch +git checkout main && git pull origin main +git branch -d {username}/add-billing-tool +git push origin --delete {username}/add-billing-tool +``` + +**Key Principles:** Short-lived branches (days, not weeks), rebase daily, delete after merge, never push directly to main. + +### 3.3 Commit Best Practices + +**Atomic commits** โ€” one logical change per commit: + +```bash +git commit -m "add retry logic to billing API client" +git commit -m "add unit tests for billing retry logic" +``` + +**Imperative mood** in messages: + +``` +# GOOD +add customer lookup tool +fix timeout in vector search + +# BAD +added customer lookup tool +fixing timeout +``` + +**Message structure:** + +``` +Short summary (50 chars or less) + +Longer description if needed. Explain WHY, not just WHAT. 
+ +- Bullet points for multiple related points +- Reference ticket numbers if applicable +``` + +Commit at every logical stopping point: feature complete, tests passing, about to try something risky. + +### 3.4 Pull Requests + +**Size guidelines:** + +| PR Size | Lines Changed | Recommendation | +|---------|--------------|----------------| +| Small | <200 | Ideal | +| Medium | 200โ€“400 | Good | +| Large | 400โ€“800 | Split if possible | +| Too Large | >800 | Must split | + +**PR Titles โ€” use prefixes:** + +``` +[feature] Add billing tool +[fix] Resolve vector search timeout +[refactor] Simplify retry logic +[docs] Update onboarding guide +[chore] Update dependencies +``` + +**PR Description template:** + +```markdown +## What +Quick summary of the change(s) + +## Why +Why the change is required + +## Key Changes +- What changed +- What else changed + +## Testing +- How you tested it (e.g., runs successfully in notebook, dev model deployed) +- Link to issue tracker ticket (e.g. Jira, GitHub Issues) if applicable +``` + +**Self-review before requesting review:** +1. Read through your own diff (use AI tools to help) +2. Check for debug code, TODOs, commented code +3. Ensure tests pass +4. Verify description is clear +5. Link related tickets + +### 3.5 Draft PRs vs Ready PRs + +Use **Draft PR** for: early feedback, WIP with team visibility, complex changes needing incremental review. + +```bash +gh pr create --draft --title "WIP: Add billing retry logic" +gh pr ready # convert to ready +``` + +Use **Ready PR** when: code is complete and tested, all checklist items done, self-reviewed. + +### 3.6 Code Review Culture + +**Reviewer:** Review within 4 hours. Distinguish blockers from suggestions. + +**Author:** Respond to every comment. Push back respectfully if you disagree. 
+ +**Comment prefixes:** + +``` +nit: Consider renaming this variable for clarity (non-blocking) +suggestion: Could use a list comprehension here (non-blocking) +question: Why did we choose this approach over X? (non-blocking) +blocker: This will cause a null pointer exception (must fix) +``` + +### 3.7 Git Hygiene + +- Squash commits before merge (use "Squash and merge" in GitHub) +- Delete merged branches immediately +- Never rewrite shared history (`git push --force origin main` โ€” DANGEROUS) +- Tag releases with semantic versioning: `git tag -a v1.2.0 -m "Release 1.2.0"` + +**Typical release flow:** + +Development: +1. PR โ†’ automated CI (unit tests, lint, secrets check) +2. Code review and merge to main +3. Merge triggers CI/CD โ†’ deploys to staging workspace + +Production: +1. Create semantic version release tag (e.g., `v0.1.0`) on `main` via GitHub UI +2. Tag creation triggers CI/CD โ†’ validates and deploys to PROD + +```bash +# Clean up merged branches +git branch --merged main | grep -v main | xargs git branch -d +git fetch --prune +``` + +--- + +## 5. Daily Development Cycle + +### 5.1 Start of Day + +```bash +git checkout main && git pull origin main +git checkout -b {username}/my-feature # or checkout existing + rebase +git rebase origin/main +``` + +Rebase daily โ€” small frequent rebases are easier than resolving large conflicts. + +### 5.2 Development Loop + +```bash +# run tests frequently +uv run pytest tests/unit/ -x + +# commit at logical stopping points +git add . && git commit -m "add customer lookup tool" + +# push regularly for backup and visibility +git push origin {username}/my-feature +``` + +### 5.3 Testing in Databricks + +**Option 1: DAB Deployment** (for jobs and pipelines) + +```bash +databricks bundle validate -t dev +databricks bundle deploy -t dev +databricks bundle run job_name -t dev +``` + +DAB dev mode auto-prefixes resources with `[dev your.name]` to prevent conflicts. 
Note: written artifacts (Delta tables, MLflow models) are shared โ€” use dev-prefixed catalogs. + +**Option 2: Git-Connected Notebooks** (for interactive testing) + +1. Push your branch: `git push origin {username}/my-feature` +2. In Databricks, open a Git-connected notebook +3. Switch branches and pull via the Git dialog +4. Import and test interactively + +Preferred for AI/agent development โ€” enables MLflow Trace inspection. + +**Option 3: VS Code Extension with Databricks Connect** + +Run notebooks/Python files from VS Code against a remote cluster. Limitation: can't inspect MLflow Traces โ€” use Option 2 for AI use cases. + +### 5.4 Opening and Merging PRs + +```bash +gh pr create --title "[feature] Add customer lookup tool" + +# After merge, clean up immediately +git checkout main && git pull origin main +git branch -d {username}/my-feature +databricks bundle destroy -t dev # clean up dev resources +``` + +### 5.5 End of Day + +- **Terminate clusters** โ€” don't leave them running overnight +- **Push your work** โ€” even WIP; protects against laptop loss +- **Update the team** โ€” via your preferred channel (Slack, Teams, etc.): "finished X, starting Y tomorrow, blocked on Z" + +### 5.7 Cost Consciousness + +Databricks compute costs accrue whenever clusters are running โ€” treat compute as a shared resource. 
+ +**Clusters:** +- Terminate clusters when not actively using them โ€” never leave them running overnight +- Configure auto-termination: 30โ€“60 minutes for interactive clusters, shorter for job clusters +- Right-size your cluster: start with the smallest instance that meets your needs, scale up only when needed +- Use single-node clusters for development and testing when distributed compute isn't required +- Use Databricks Connect from VS Code/Cursor to avoid launching a full cluster for lightweight tasks + +**Tables & Storage:** +- Delete test/scratch tables when done with a task +- Use `dev_{username}_*` naming for all dev tables (e.g., `dev_{username}_customer_data`) +- Never write to prod catalogs during development +- Clean up intermediate checkpoint files from streaming jobs + +**MLflow:** +- Delete failed or abandoned experiment runs +- Clean up unused model versions in the registry +- Don't log unnecessary artifacts (large DataFrames, full datasets) + +**DABs & Resources:** +- Destroy dev bundle resources when a feature branch is complete +- Don't leave deployed dev jobs running on schedules + +```sql +-- find tables you created in dev +SHOW TABLES IN dev_catalog.dev_schema LIKE '*{username}*'; + +-- drop dev tables when done +DROP TABLE IF EXISTS dev_catalog.dev_schema.dev_{username}_scratch; +``` + +```bash +# destroy DAB dev resources when done with a feature +databricks bundle destroy -t dev + +# check for running clusters you own +databricks clusters list --output json | jq '.[] | select(.state == "RUNNING") | {id: .cluster_id, name: .cluster_name}' +``` diff --git a/databricks-skills/dev-best-practices/2-code-quality.md b/databricks-skills/dev-best-practices/2-code-quality.md new file mode 100644 index 00000000..ead4a5aa --- /dev/null +++ b/databricks-skills/dev-best-practices/2-code-quality.md @@ -0,0 +1,466 @@ +# Part II: Code Quality Standards + +Code quality isn't about perfection โ€” it's about writing code that's easy to understand, maintain, 
and hand off. The team that inherits the code inherits every shortcut as technical debt. + +If the project has existing style guides, adopt them. Consistency with the existing codebase matters more than personal preferences. + +## 4.1 Python Standards + +### PEP 8 Compliance (enforced via Ruff) + +- Maximum line length: 120 characters +- 4 spaces for indentation (no tabs) +- Two blank lines between top-level definitions + +**`pyproject.toml` Ruff config:** + +```toml +[tool.ruff] +line-length = 120 +indent-width = 4 + +[tool.ruff.lint] +select = [ + "E", # pycodestyle errors + "W", # pycodestyle warnings + "F", # pyflakes + "I", # isort + "UP", # pyupgrade +] + +[tool.ruff.format] +quote-style = "double" +indent-style = "space" +``` + +### Import Organization + +```python +# 1. Standard library +import json +from pathlib import Path + +# 2. Third-party packages +import mlflow +from pydantic import BaseModel + +# 3. Local modules +from project.config import settings +from project.utils import helpers +``` + +Ruff handles this automatically with `ruff check --fix`. + +### Type Hints + +Use type hints for function signatures โ€” they serve as documentation and catch bugs early. + +```python +# Good +def process_customer( + customer_id: str, + options: dict[str, Any] | None = None +) -> CustomerResult: + ... + +def get_customers(ids: list[str]) -> dict[str, Customer]: + ... +``` + +**When to use:** +- Always: function parameters and return types +- Usually: class attributes +- Optional: local variables (only if it aids clarity) + +Don't over-annotate obvious code. + +## 4.2 Documentation + +### Docstrings + +Use [Google-style docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) for public functions and classes. + +```python +def search_documents(query: str, limit: int = 10) -> list[Document]: + """Search documents matching the query. 
+ + Args: + query: Natural language search query + limit: Maximum results to return + + Returns: + List of matching documents ordered by relevance + + Raises: + ValueError: If query is empty + SearchError: If search backend unavailable + """ +``` + +**When to write docstrings:** +- Public functions and classes: always +- Private helpers (`_function`): only if complex +- Obvious one-liners: skip it + +### Inline Comments + +Comments should explain *why*, not *what*. The code shows what's happening. + +```python +# Bad - describes what the code does (obvious) +# Loop through customers and check status +for customer in customers: + if customer.status == "active": + ... + +# Good - explains why +# Filter to active customers only โ€” churned customers +# have incomplete data that breaks downstream processing +for customer in customers: + if customer.status == "active": + ... +``` + +> **NOTE:** AI-generated code tends to be overly liberal with inline comments โ€” avoid this. + +## 4.3 Naming Conventions + +| Resource | Convention | Example | +|----------|-----------|---------| +| Functions | snake_case | `get_customer_data` | +| Classes | PascalCase | `CustomerAgent` | +| Constants | UPPER_SNAKE | `MAX_RETRIES` | +| Private | leading underscore | `_internal_helper` | +| UC resources | snake_case | `dev_sales.customers` | +| Model endpoints | kebab-case | `customer-support-agent` | +| DAB jobs | snake_case | `deploy_agent_model` | +| Git branches | username/description | `{username}/add-billing-tool` | + +**Tips:** +- Functions: use verbs (`get_`, `process_`, `validate_`) +- Booleans: use `is_`, `has_`, `can_` prefixes +- Collections: use plurals (`customers`, `order_items`) +- Avoid abbreviations unless universally understood (`id`, `url` are fine; `cust`, `proc` are not) + +## 4.4 Error Handling + +### Be Explicit โ€” catch specific exceptions + +```python +# BAD - catches everything, hides bugs +try: + result = process_data(data) +except: + return None + +# GOOD - 
specific exceptions +try: + result = process_data(data) +except ValidationError as e: + logger.warning(f"Invalid data: {e}") + return None +except ConnectionError: + raise # Let infrastructure errors bubble up +``` + +### Custom Exceptions for domain-specific errors + +```python +class DataPipelineError(Exception): + """Base for pipeline errors.""" + +class ValidationError(DataPipelineError): + """Data validation failed.""" + +class TransformationError(DataPipelineError): + """Data transformation failed.""" +``` + +### Helpful Error Messages โ€” include context + +```python +# BAD +raise ValueError("Invalid input") + +# GOOD +raise ValueError( + f"Customer ID '{customer_id}' not found in database. " + f"Verify the ID exists in the customers table." +) +``` + +**Rule:** Catch expected failures you can handle (validation, network timeouts). Let unexpected errors bubble up. + +## 4.5 Code Simplicity + +**Avoid over-engineering.** Write the simplest code that solves the problem. + +```python +# Over-engineered +class CustomerProcessorFactory: + def create_processor(self, type: str) -> BaseProcessor: ... + +# Simple +def process_customer(customer: Customer) -> Result: ... +``` + +**Signs you're over-engineering:** +- Building "frameworks" instead of solving the problem +- Adding configuration for things that won't change +- Creating abstractions with only one implementation +- Premature optimization + +> **Watch out for AI-generated code!** LLMs often over-engineer โ€” excessive try/except, unnecessary fallbacks, verbose abstractions. They may also use outdated Databricks patterns. Always review and simplify. + +**Refactor when:** you're touching the code anyway and it's hard to understand, the same bug keeps appearing, you need to add a feature and the current structure makes it hard. + +**Don't refactor:** code you're not actively working on, just because you'd write it differently, without tests to verify behavior is preserved. 
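One way to honor that last rule: pin current behavior with a quick characterization test before touching the code. A sketch with a hypothetical `legacy_score` function standing in for the code under refactor:

```python
def legacy_score(purchases: int, tenure_months: int) -> float:
    # Existing logic we intend to refactor (hypothetical example)
    if tenure_months <= 0:
        return 0.0
    return min(100.0, purchases * 5.0 + tenure_months * 2.0)


def test_score_behavior_is_pinned():
    # Outputs captured from the current implementation, not from a spec.
    # If the refactor changes any of these, that change was a behavior
    # change, not a refactor.
    assert legacy_score(10, 24) == 98.0
    assert legacy_score(0, 12) == 24.0
    assert legacy_score(100, 1) == 100.0
```

The pinned values document today's behavior, quirks included; delete or update the test once the refactor lands with an intentional behavior change.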
+ +## 4.6 Logging + +### Use structured logging + +```python +import logging + +logger = logging.getLogger(__name__) + +# Good - structured, searchable +logger.info( + "Processing customer batch", + extra={"batch_size": len(customers), "customer_ids": customer_ids[:5]} +) + +# Bad - unstructured, hard to parse +logger.info(f"Processing {len(customers)} customers: {customer_ids}") +``` + +### Log levels + +| Level | Use When | +|-------|---------| +| `DEBUG` | Detailed diagnostic info (disabled in prod) | +| `INFO` | Normal operations worth recording | +| `WARNING` | Something unexpected but handled | +| `ERROR` | Something failed, needs attention | + +**What to log:** Request/response boundaries, decisions made, external calls, errors with context. + +**What NOT to log:** Sensitive data (PII, credentials), high-frequency loops, redundant info already in stack traces. + +### Databricks logging setup + +```python +import logging + +def setup_logging(): + """Set up logging and suppress noisy py4j messages.""" + logging.basicConfig( + level=logging.INFO, + format="%(asctime)s - %(levelname)s - %(message)s" + ) + logging.getLogger("py4j").setLevel(logging.CRITICAL) + +def get_logger(name: str) -> logging.Logger: + """Get a logger with the specified name (typically __name__).""" + return logging.getLogger(name) +``` + +## 4.7 Testing + +### 4.7.1 Testing Philosophy + +#### Core Principles + +**1. Shift Left (Fast Feedback Loops)** + +- **Unit Tests:** Catch logic errors quickly in a lightweight environment (laptop, GitHub Actions) +- **Integration Tests:** Catch configuration, scale, and environment errors before production + +**2. 
How to test code, data, and models** + +- **Code:** Unit and integration tests that simulate production conditions +- **Data:** Monitoring/validation at runtime; test transformation logic and guard against invalid values +- **Models:** Focus on inference behavior โ€” output structure, value constraints, accuracy metrics against static datasets +- **Agents:** Hybrid approach โ€” deterministic plumbing (unit testable) + stochastic LLM behavior (eval only) + +**3. Tests as Living Documentation** + +Update tests *in the same commit* as code changes. A PR is incomplete if it changes logic without updating the corresponding test. + +**4. Confidence over Coverage** + +Don't chase 100% coverage. Focus on complexity and risk. +- **Test:** Complex business rules, regex parsing, mathematical calculations +- **Skip:** Trivial getters/setters, standard Spark API calls, boilerplate + +#### "Do I Have Enough Tests?" Heuristics + +**The "Friday Deploy" Test** +*If you had to deploy at 4:00 PM on a Friday and leave for the weekend, would you trust your test suite?* If "No, I'd need to manually check X" โ€” you're missing a test for X. + +**The "New Hire" Test** +*If a new team member accidentally deletes a critical line of business logic, will a test fail immediately?* If "Maybe not" โ€” your tests are too shallow. 
+ +**The Critical Path Coverage Rule** +*Have I tested: Happy Path + 1 Edge Case + 1 Error State?* +- Happy Path: Logic works with perfect data +- Edge Case: Empty lists, null values, zero divisors +- Error State: Correct error message on invalid input + +### 4.7.2 Unit Testing + +**What to test:** +- Business transformation logic (expected, edge, error states) +- Model wrapper logic (pre/post-processing) โ€” mock the ML framework +- API contracts and interfaces + +**What NOT to test:** +- Production data distribution (that's data quality, not unit tests) +- Trivial Spark operations vendor-verified to work +- Loading heavy model artifacts + +**Best practices:** +- Write tests against pure Python functions โ€” avoid `SparkSession` dependency +- Use `unittest.mock` to simulate APIs and ML frameworks +- Use fixtures to share in-memory objects across tests + +**Testing AI Agents โ€” distinguish deterministic plumbing from probabilistic brain:** + +| Component | Test Type | Strategy | +|-----------|-----------|---------| +| **Tools** (Functions) | Unit Test | Test as standard Python functions | +| **Orchestration** (Logic) | Unit Test | Mock the LLM; verify agent executes correct tool | +| **Prompt Engineering** | Unit Test | Test string formatting and template injection | +| **LLM Response** | Eval (Not Test) | Model quality โ€” use LLM eval tools, not pytest | + +```python +import pytest +from project.transformations import calculate_customer_score + +def test_returns_score_for_valid_customer(): + customer = {"id": "123", "purchase_count": 10, "tenure_months": 24} + score = calculate_customer_score(customer) + assert 0 <= score <= 100 + +def test_raises_for_missing_required_fields(): + customer = {"id": "123"} # missing fields + with pytest.raises(ValueError, match="Missing required field"): + calculate_customer_score(customer) + +def test_handles_zero_purchases(): + customer = {"id": "123", "purchase_count": 0, "tenure_months": 12} + assert 
calculate_customer_score(customer) == 0 +``` + +```python +# Mocking external dependencies +from unittest.mock import Mock, patch + +def test_api_call_retries_on_failure(): + mock_response = Mock() + mock_response.json.return_value = {"status": "success"} + + with patch("project.api.requests.post", return_value=mock_response) as mock_post: + result = call_external_api("https://api.example.com", {"data": "test"}) + + assert result["status"] == "success" + mock_post.assert_called_once() + +# Agent orchestration test โ€” mock the LLM, test the wiring +def test_agent_executes_refund_tool_when_instructed(): + mock_llm_response = Mock() + mock_llm_response.tool_calls = [{"name": "refund_tool", "args": {"id": "123"}}] + + with patch("project.agent.llm_client.chat", return_value=mock_llm_response): + agent = Agent() + result = agent.run("I want a refund") + + assert result["status"] == "refund_processed" +``` + +### 4.7.3 Integration Testing + +**What to test:** +- Pipeline execution end-to-end (exit code 0) +- Component handoffs: data written by Step A is correctly readable by Step B +- Environment interaction: access to Catalogs, Schemas, Secrets, Volumes +- Side effects: tables created/updated, files written to expected locations + +**What NOT to test:** +- Exhaustive logic permutations (belongs in unit tests) +- Full data volume โ€” use small representative samples +- Spark engine correctness + +**Best practices:** +- **Isolation:** Use `integration_tests` catalog/schema โ€” never write to production during tests +- **Idempotency:** Tests must be re-runnable with setup and teardown +- **Synthetic data:** Create small, static DataFrames โ€” don't read from "live" tables +- **Service Principals:** Never run automated integration tests as a user + +**File structure:** + +``` +notebooks/ +โ””โ”€โ”€ integration_tests/ + โ”œโ”€โ”€ test_customer_pipeline.py + โ””โ”€โ”€ test_agent_responses.py +``` + +**Example integration test notebook:** + +```python +# Databricks notebook 
source +# COMMAND ---------- +# Setup +test_catalog = "dev_catalog" +test_schema = "integration_tests" + +test_data = spark.createDataFrame([ + {"customer_id": "1", "name": "Alice", "status": "active"}, + {"customer_id": "2", "name": "Bob", "status": "inactive"}, +]) +test_data.write.mode("overwrite").saveAsTable(f"{test_catalog}.{test_schema}.test_customers") + +# COMMAND ---------- +from project.pipelines import customer_pipeline +customer_pipeline.run(spark, test_catalog, test_schema) + +# COMMAND ---------- +# Verify outputs +result = spark.table(f"{test_catalog}.{test_schema}.customer_summary") +assert result.count() == 2, f"Expected 2 rows, got {result.count()}" +assert result.filter("status = 'active'").count() == 1 + +# COMMAND ---------- +# Teardown +spark.sql(f"DROP TABLE IF EXISTS {test_catalog}.{test_schema}.test_customers") +spark.sql(f"DROP TABLE IF EXISTS {test_catalog}.{test_schema}.customer_summary") +print("All integration tests passed") +``` + +**Running tests:** + +```bash +# unit tests (local, fast) +uv run pytest tests/unit/ -v + +# integration tests (via DAB in staging) +databricks bundle deploy -t staging +databricks bundle run integration_tests -t staging +databricks bundle destroy -t staging # clean up test resources +``` + +**Define integration test job in DAB:** + +```yaml +# resources/tests/integration_tests.yml +resources: + jobs: + integration_tests: + name: "Integration Tests" + tasks: + - task_key: test_customer_pipeline + notebook_task: + notebook_path: ./notebooks/integration_tests/test_customer_pipeline.py +``` diff --git a/databricks-skills/dev-best-practices/3-architecture.md b/databricks-skills/dev-best-practices/3-architecture.md new file mode 100644 index 00000000..475a6a22 --- /dev/null +++ b/databricks-skills/dev-best-practices/3-architecture.md @@ -0,0 +1,540 @@ +# Part III: Software Architecture + +Good architecture makes code easier to understand, test, and hand off.
The team that inherits the system inherits every architectural choice as their long-term maintenance burden. + +## 6.1 Design Principles + +### Prefer Simple Over Clever + +```python +# Clever - hard to understand at a glance +result = {k: v for d in [a, b, c] for k, v in d.items() if v and k not in exclude} + +# Simple - obviously correct +result = {} +for d in [a, b, c]: + for k, v in d.items(): + if v and k not in exclude: + result[k] = v +``` + +Write the boring version first. Optimize later if needed. + +### Composition Over Inheritance + +```python +# Inheritance - rigid +class BaseProcessor: + def process(self): ... +class CustomerProcessor(BaseProcessor): ... +class PremiumCustomerProcessor(CustomerProcessor): ... # Getting deep... + +# Composition - flexible +def process_customer(customer, validators, transformers): + for validator in validators: + validator(customer) + for transformer in transformers: + customer = transformer(customer) + return customer +``` + +### Fail Fast, Fail Loud + +```python +# BAD - silent failure +def get_config(path): + try: + return load_yaml(path) + except FileNotFoundError: + return {} # will cause confusing errors later + +# GOOD +def get_config(path): + if not path.exists(): + raise ConfigError(f"Config file not found: {path}") + return load_yaml(path) +``` + +### Design for Change + +Group things that change together by *domain*, not *technical layer*: + +``` +# BAD - customer logic scattered by layer +project/ +โ”œโ”€โ”€ validators/customer_validator.py +โ”œโ”€โ”€ loaders/customer_loader.py +โ””โ”€โ”€ transformers/customer_transformer.py + +# GOOD - related logic grouped by domain +project/ +โ””โ”€โ”€ customers/ + โ””โ”€โ”€ processor.py # validate, load, transform together +``` + +Isolate external dependencies behind interfaces; use configuration for values that might change. + +### Single Source of Truth + +Every piece of configuration should have one authoritative location. Duplication leads to inconsistency. 
+ +**Where configuration lives:** + +| Value Type | Where to Define | Example | +|------------|----------------|---------| +| Environment-specific (catalog, schema) | DAB variables in `databricks.yml` | `catalog: ${var.catalog_name}` | +| Secrets (API keys, tokens) | Databricks Secrets | `dbutils.secrets.get("scope", "key")` | +| Application constants | Python module (`constants.py`) | `MAX_RETRIES = 3` | +| Complex/structured config | YAML files loaded via Pydantic | `configs/prod.yaml` | +| Runtime overrides | Notebook widgets with defaults from above | `dbutils.widgets.text(...)` | + +**Recommended pattern:** DAB variables โ†’ injected as job parameters โ†’ widgets provide defaults for interactive use. + +```yaml +# databricks.yml - source of truth +variables: + catalog_name: + default: dev_catalog +targets: + prod: + variables: + catalog_name: prod_catalog +``` + +```python +# In notebook - widget with fallback +dbutils.widgets.text("catalog", "dev_catalog") +catalog = dbutils.widgets.get("catalog") # job param overrides when run as job +``` + +**What NOT to do:** Hardcode environment-specific values, duplicate the same value in DAB variables AND config files, put secrets in config files, define defaults in multiple places. + +### Build for Handoff + +Prefer explicit over implicit. Code should be obvious to someone reading it for the first time. + +```python +# IMPLICIT +def process(df): + df = _clean(df) # what does _clean do? + df = _transform(df) # transforms how? + return df + +# EXPLICIT +def process_customer_orders(df: DataFrame) -> DataFrame: + df = remove_duplicate_orders(df) + df = convert_timestamps_to_utc(df) + return df +``` + +## 6.2 Project Structure + + +``` +project-root/ +โ”‚ +โ”œโ”€โ”€ pyproject.toml # Project metadata & dependencies +โ”œโ”€โ”€ uv.lock # Locked dependencies +โ”œโ”€โ”€ requirements.txt # For Databricks notebook imports +โ”œโ”€โ”€ Makefile # Task runner (install, test, lint, etc.) 
+โ”œโ”€โ”€ .pre-commit-config.yaml # Pre-commit hooks +โ”œโ”€โ”€ .gitignore +โ”œโ”€โ”€ README.md +โ”‚ +โ”œโ”€โ”€ {project_name}/ # Python package (core logic) +โ”‚ โ”œโ”€โ”€ __init__.py +โ”‚ โ”œโ”€โ”€ config/ +โ”‚ โ”œโ”€โ”€ pipelines/ +โ”‚ โ”œโ”€โ”€ models/ # ML models (if applicable) +โ”‚ โ”œโ”€โ”€ agents/ # Agents (if applicable) +โ”‚ โ”œโ”€โ”€ tools/ # Agent tools (if applicable) +โ”‚ โ””โ”€โ”€ utils/ +โ”‚ +โ”œโ”€โ”€ tests/ # Tests mirror source structure +โ”‚ โ”œโ”€โ”€ conftest.py +โ”‚ โ”œโ”€โ”€ unit/ +โ”‚ โ””โ”€โ”€ integration/ +โ”‚ +โ”œโ”€โ”€ configs/ # Environment config files +โ”‚ โ”œโ”€โ”€ dev.yaml +โ”‚ โ”œโ”€โ”€ staging.yaml +โ”‚ โ””โ”€โ”€ prod.yaml +โ”‚ +โ”œโ”€โ”€ scripts/ # Utility scripts +โ”‚ โ””โ”€โ”€ generate_requirements.py +โ”‚ +โ”œโ”€โ”€ notebooks/ # Databricks notebooks (entrypoints only) +โ”‚ โ”œโ”€โ”€ 00_setup/ +โ”‚ โ”œโ”€โ”€ 01_data/ +โ”‚ โ”œโ”€โ”€ 02_development/ +โ”‚ โ””โ”€โ”€ 03_deployment/ +โ”‚ +โ”œโ”€โ”€ databricks.yml # DAB configuration +โ”œโ”€โ”€ resources/ # DAB resource definitions +โ”‚ โ”œโ”€โ”€ jobs/ +โ”‚ โ”œโ”€โ”€ pipelines/ +โ”‚ โ””โ”€โ”€ apps/ +โ”‚ +โ””โ”€โ”€ .github/workflows/ # CI/CD pipelines +``` + +### Key Principle: Notebooks Are Entrypoints, Not Implementation + +```python +# GOOD - notebook imports and calls +from project.pipelines import customer_pipeline +result = customer_pipeline.run(spark, config) + +# BAD - notebook contains the implementation +def complex_transformation(df): # 200 lines of logic in a notebook cell + ... +``` + +| Concern | Notebooks | Python Modules | +|---------|-----------|---------------| +| **Testing** | Hard to unit test | Easy with pytest | +| **Code review** | Noisy diffs (cell metadata) | Clean, readable diffs | +| **Reuse** | Copy-paste | Import from anywhere | +| **IDE support** | Limited | Full support | + +### Committing Notebooks to Git + +Commit notebooks as Python source files (`.py`) rather than `.ipynb`. 
Databricks supports `.py` notebooks natively with `# COMMAND ----------` cell separators. + +``` +# GOOD - clean diffs +notebooks/run_pipeline.py + +# AVOID - outputs bloat repo +notebooks/run_pipeline.ipynb +``` + +If you must use `.ipynb`, clear outputs before committing. + +## 6.3 Code Organization + +### Separation of Concerns + +Organize by domain/feature for larger projects; by technical concern for smaller ones: + +```python +# Larger projects - by domain +project/ +โ”œโ”€โ”€ customers/ # loader.py, transformer.py, validator.py +โ”œโ”€โ”€ billing/ # api.py, calculator.py +โ”œโ”€โ”€ agents/ # support_agent.py, tools/ +โ””โ”€โ”€ common/ # config.py, logging.py + +# Smaller projects - by technical concern +project/ +โ”œโ”€โ”€ data/ # Data access and transformation +โ”œโ”€โ”€ models/ # ML model logic +โ”œโ”€โ”€ api/ # External API integrations +โ””โ”€โ”€ utils/ # Shared utilities +``` + +### Functions vs Classes + +Use **functions** for stateless transformations (data in โ†’ data out). +Use **classes** when you need state, shared context, or dependency injection for testing. + +```python +# Good function - stateless +def clean_customer_data(df: DataFrame) -> DataFrame: + return df.dropDuplicates(["customer_id"]).filter(col("status") == "active") + +# Good class - stateful, injectable +class CustomerDataLoader: + def __init__(self, spark: SparkSession, catalog: str): + self.spark = spark + self.catalog = catalog + + def load_customers(self, date: str) -> DataFrame: + return self.spark.table(f"{self.catalog}.customers").filter(col("date") == date) +``` + +### Constants + +```python +# constants.py +MAX_RETRIES = 3 +BATCH_SIZE = 1000 +STATUS_ACTIVE = "active" +STATUS_INACTIVE = "inactive" +VALID_STATUSES = {STATUS_ACTIVE, STATUS_INACTIVE} +``` + +Use for: magic numbers, repeated string literals, values that might change but shouldn't be scattered. +Don't use for: environment-specific values (use config), values used exactly once. 
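A brief usage sketch (the `normalize_status` helper is illustrative): keeping the allowed values in one shared set means validation logic and any new status only ever change in one place.

```python
# Illustrative use of shared constants (would live in constants.py)
STATUS_ACTIVE = "active"
STATUS_INACTIVE = "inactive"
VALID_STATUSES = {STATUS_ACTIVE, STATUS_INACTIVE}

def normalize_status(raw: str) -> str:
    """Validate a raw status value against the single source of truth."""
    status = raw.strip().lower()
    if status not in VALID_STATUSES:
        raise ValueError(f"Unknown status: {raw!r}")
    return status
```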
+ +### Dependency Injection + +Pass dependencies in rather than creating them internally โ€” makes code testable and flexible. + +```python +# GOOD - dependencies injected +def process_batch(spark: SparkSession, config: Config, logger: Logger): + data = spark.table(config.source_table) + ... + +# BAD - hidden dependencies +def process_batch(): + spark = SparkSession.builder.getOrCreate() # Hidden, hard to test + config = load_config() # Where does this come from? +``` + +### Rule of Three + +Don't abstract too early. Wait until you have three similar implementations before creating a shared abstraction. + +## 6.4 Config Management + +Externalize configuration to make code environment-agnostic. Never hardcode catalog names, endpoints, or environment-specific values. + +**Loading configuration with Pydantic:** + +```python +from pydantic_settings import BaseSettings +from pydantic import Field + +class AppConfig(BaseSettings): + catalog: str + schema_name: str = Field(alias="schema") + batch_size: int = 1000 + endpoint: str + + class Config: + env_prefix = "APP_" # Reads APP_CATALOG, APP_SCHEMA, etc. 
+ +config = AppConfig() +# Or load overrides from a dotenv file (_env_file expects KEY=value lines, not YAML) +config = AppConfig(_env_file="configs/dev.env") +``` + +**Key principles:** +- Never hardcode environment-specific values +- Use Databricks Secrets for credentials (not config files) +- Validate configuration at startup โ€” fail fast if something's wrong +- Make the default environment safe (dev, not prod) + +## 6.5 Resilience & Idempotency + +### Retry with Exponential Backoff + +```python +import backoff +import requests +from requests.exceptions import RequestException + +@backoff.on_exception( + backoff.expo, + (RequestException, TimeoutError), + max_tries=3, + max_time=60 +) +def call_external_api(endpoint: str, data: dict) -> dict: + response = requests.post(endpoint, json=data, timeout=10) + response.raise_for_status() + return response.json() +``` + +| Retry | Don't Retry | +|-------|------------| +| Network timeouts | Invalid input (400) | +| Rate limits (429) | Auth failures (401, 403) | +| Server errors (5xx) | Not found (404) | + +### Idempotency + +Data pipelines should produce the same result when run multiple times. + +```python +# BAD - append creates duplicates on re-run +df.write.mode("append").saveAsTable("catalog.schema.customers") + +# GOOD - merge ensures idempotency +from delta.tables import DeltaTable +target = DeltaTable.forName(spark, "catalog.schema.customers") +target.alias("t").merge( + source_df.alias("s"), + "t.customer_id = s.customer_id" +).whenMatchedUpdateAll() \ + .whenNotMatchedInsertAll() \ + .execute() +``` + +### Error Boundaries + +Catch errors at system boundaries โ€” not deep inside business logic.
+ +```python +# GOOD - let errors propagate, catch at the entry boundary +def handle_discount_request(customer_id): + try: + customer = get_customer(customer_id) + loyalty_score = get_loyalty_score(customer) + return calculate_discount(customer, loyalty_score) + except LoyaltyServiceError: + logger.warning(f"Loyalty service unavailable for {customer_id}") + return calculate_discount(customer, loyalty_score=1.0) +``` + +## 6.6 API & Interface Design + +```python +# Self-documenting interface +def search_documents( + query: str, + filters: dict[str, Any] | None = None, + limit: int = 10, + include_metadata: bool = False +) -> list[Document]: + """Search documents matching the query. + + Args: + query: Natural language search query + filters: Optional field filters (e.g., {"category": "billing"}) + limit: Maximum results to return (default 10, max 100) + + Returns: + List of matching documents, ordered by relevance + + Raises: + ValueError: If query is empty or limit > 100 + """ +``` + +**Principles:** Required params first, sensible defaults, consistent return types, fail explicitly (raise ValueError, not return None). + +### Data Classes for Complex Returns + +```python +from dataclasses import dataclass + +# BAD - what does result[0] mean? +def process_batch(df) -> tuple[DataFrame, int, list[str]]: ... + +# GOOD +@dataclass +class BatchResult: + processed_df: DataFrame + error_count: int + error_ids: list[str] + +def process_batch(df) -> BatchResult: ... 
+result = process_batch(df) +print(f"Errors: {result.error_count}") +``` + +## 6.7 Data Contracts & Schemas + +### Pydantic for Validation + +```python +from pydantic import BaseModel, Field, field_validator + +class CustomerRecord(BaseModel): + customer_id: str + email: str + status: str = Field(pattern="^(active|inactive|churned)$") + balance: float = Field(ge=0) + + @field_validator("email") + @classmethod + def validate_email(cls, v: str) -> str: + if "@" not in v: + raise ValueError("Invalid email format") + return v.lower() +``` + +### Delta Table Schemas + +```python +from pyspark.sql.types import StructType, StructField, StringType, DoubleType + +CUSTOMER_SCHEMA = StructType([ + StructField("customer_id", StringType(), nullable=False), + StructField("email", StringType(), nullable=False), + StructField("status", StringType(), nullable=False), + StructField("balance", DoubleType(), nullable=False), +]) + +df = spark.read.schema(CUSTOMER_SCHEMA).json(source_path) +df.write.option("mergeSchema", "false").saveAsTable("catalog.schema.customers") +``` + +## 6.8 Versioning & Rollback + +Use semantic versioning (MAJOR.MINOR.PATCH): + +```bash +git tag -a v1.2.0 -m "v1.2.0: Add billing features" +git push origin v1.2.0 +``` + +### Rolling Back Code and Infrastructure + +```bash +# 1. Check out previous release tag +git checkout v1.1.0 + +# 2. Redeploy to prod +databricks bundle deploy -t prod +``` + +**Recommended approach:** Configure CI/CD to deploy to prod on release tag. To roll back, create a new release tag pointing to the previous known-good commit. 
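That flow can be sketched as follows, under assumed version numbers (v1.2.0 is the bad release, v1.1.0 was the last known-good one): point a *new* release tag at the old commit so the tag-triggered prod workflow re-deploys the good code.

```shell
# Illustrative: create a NEW release tag that points at the known-good commit
git tag -a v1.2.1 -m "Rollback: redeploy v1.1.0" v1.1.0
git push origin v1.2.1   # pushing the tag triggers the prod deploy workflow
```

This keeps history moving forward (no force-pushes, no deleted tags) and leaves an audit trail of the rollback.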
+ +### Rolling Back Models and Data + +| Asset | How to Rollback | +|-------|----------------| +| **Registered models** | `client.set_registered_model_alias(name, "production", previous_version)` | +| **Delta tables** | `spark.read.option("versionAsOf", 5).table(...)` or `RESTORE TABLE t TO VERSION AS OF 5` | + +## 6.9 Documenting Architecture Decisions + +Document significant decisions so future maintainers understand why things are the way they are. + +**When to document:** Choosing between valid approaches, non-obvious trade-offs, decisions that would be expensive to reverse. + +| Scope | Template | When to Use | +|-------|----------|------------| +| Overall system | Design Doc | New systems, major features, significant architecture changes | +| Focused decision | Decision Doc | Choosing between options for a specific component | + +Keep documentation in your project's shared document store (e.g. Confluence, Google Drive, Notion). Link from the Reference Doc. + +## 6.10 System Design + +### When to Design Upfront + +| Task Type | Design Approach | +|-----------|----------------| +| Bug fix, small feature | Jump in, no formal design | +| New component or service | Sketch on a diagramming tool (e.g. LucidChart, Miro, draw.io), then build | +| New system, major feature | Write a Design Doc, get sign-off before building | +| Cross-team or external-facing | Design Doc required | + +### Start with Requirements + +- **Functional:** What should the system do? +- **Non-functional:** What qualities must it have? (latency, uptime, concurrency) +- **Constraints:** What are you working within? (timeline, existing infrastructure) + +Distinguish **MUST** (blocking), **SHOULD** (expected), **COULD** (stretch). 
+ +### Consider Failure Modes + +| Component | Failure Mode | Mitigation | +|-----------|-------------|-----------| +| External API | Timeout, rate limit | Retry with backoff, circuit breaker | +| Vector search | Index unavailable | Graceful degradation, cached results | +| LLM endpoint | High latency, errors | Timeout, fallback model | +| Data pipeline | Bad input data | Validation, quarantine bad records, alerts | + +Design for predicted failures. Prioritize operational simplicity โ€” complex failover mechanisms are hard to hand off. + +### Make Trade-offs Explicit + +| Trade-off | We Chose | Because | Implication | +|-----------|---------|---------|------------| +| Latency vs Cost | Higher latency | Serverless reduces ops burden | P95 ~800ms, not ~200ms | +| Flexibility vs Simplicity | Simpler design | Timeline constraint, prioritize maintainability | Adding new data sources requires significant refactor | diff --git a/databricks-skills/dev-best-practices/4-databricks-platform.md b/databricks-skills/dev-best-practices/4-databricks-platform.md new file mode 100644 index 00000000..773635c0 --- /dev/null +++ b/databricks-skills/dev-best-practices/4-databricks-platform.md @@ -0,0 +1,274 @@ +# Part IV: Databricks Platform + +Databricks-specific best practices for building handoff-ready solutions. For comprehensive platform guidance, see [Databricks docs](https://docs.databricks.com/). + +## 7.1 Notebooks + +For detailed guidance on notebooks as entrypoints, project structure, and committing notebooks to git, see [**3-architecture.md ยง6.2 (Project Structure)**](3-architecture.md). + +**Notebook Structure โ€” keep them simple:** + +```python +# Databricks notebook source +# MAGIC %md +# MAGIC # Pipeline Name +# MAGIC Brief description of what this notebook does. 
+ +# COMMAND ---------- +# MAGIC %pip install -r ../requirements.txt +# COMMAND ---------- +dbutils.library.restartPython() + +# COMMAND ---------- +# Config via widgets (job params override defaults) +dbutils.widgets.text("catalog", "dev_catalog") +dbutils.widgets.text("env", "dev") + +catalog = dbutils.widgets.get("catalog") +env = dbutils.widgets.get("env") + +# COMMAND ---------- +from project.pipelines import customer_pipeline +from project.config import load_config + +config = load_config(env) +result = customer_pipeline.run(spark, catalog, config) +``` + +## 7.2 Databricks Asset Bundles (DABs) + +DABs are the **required** approach for managing Databricks infrastructure. They enable infrastructure as code, version-controlled deployments, and consistent dev/prod environments. + +**Why DABs:** +- **Multi-environment:** Same code deploys to dev, staging, prod with variable substitution +- **Dev isolation:** `mode: development` prefixes resources with username, preventing conflicts +- **CI/CD integration:** Tag-based deployments to prod + +**`databricks.yml`:** + +```yaml +bundle: + name: project-name + +variables: + catalog_name: + description: "Unity Catalog name" + default: "dev_catalog" + +targets: + dev: + mode: development # Prefixes resources with [dev username] + default: true + workspace: + host: https://dev-workspace.cloud.databricks.com # your dev workspace URL + variables: + catalog_name: dev_catalog + + prod: + mode: production + workspace: + host: https://prod-workspace.cloud.databricks.com # your prod workspace URL + variables: + catalog_name: prod_catalog + run_as: + service_principal_name: "project-sp" + +include: + - resources/**/*.yml +``` + +**Common Commands:** + +```bash +databricks bundle validate -t dev # Validate config +databricks bundle deploy -t dev # Deploy resources +databricks bundle run job_name -t dev # Run a job +databricks bundle destroy -t dev # Clean up dev resources +``` + +For DABs YAML reference (jobs, pipelines, dashboards, volumes), see **[databricks-asset-bundles](../databricks-asset-bundles/SKILL.md)**.
+ +## 7.3 Unity Catalog + +**Naming Conventions:** + +| Resource | Convention | Example | +|----------|-----------|---------| +| Catalog | `{env}_{domain}` | `dev_sales`, `prod_sales` | +| Schema | `{purpose}` | `bronze`, `silver`, `gold` | +| Table | `{descriptive_name}` | `customer_transactions` | +| Volume | `{purpose}` | `landing_zone`, `checkpoints` | + +**Always use three-level namespace:** + +```python +# GOOD - explicit +spark.table("prod_sales.gold.customer_transactions") + +# BETTER - config-driven for environment flexibility +spark.table(f"{config.catalog}.{config.schema}.customers") + +# BAD - relies on default catalog/schema +spark.table("customers") +``` + +For Unity Catalog operations and system tables queries, see **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)**. + +## 7.4 Compute + +**Development:** +- Start with the smallest viable cluster (or serverless) +- Set auto-terminate to 30โ€“60 minutes +- Single-node clusters work for most development tasks +- Use your personal cluster, don't share during active development + +**Production:** +- Use job clusters (ephemeral) rather than all-purpose clusters +- Right-size based on actual workload โ€” profile before scaling up +- Enable autoscaling with sensible min/max bounds +- Prefer serverless for variable workloads to avoid idle costs + +**When to use serverless:** Variable duration/frequency jobs, development and testing, when minimizing ops overhead. + +**When to use dedicated clusters:** Predictable long-running workloads, specific library/config requirements, cost optimization for sustained high utilization. + +--- + +## 8. Data Engineering Patterns + +### 8.1 Idempotent Writes + +Pipelines should produce the same result when run multiple times โ€” critical for reliability and recovery. 
+ +```python +# BAD - append creates duplicates on re-run +df.write.mode("append").saveAsTable("catalog.schema.customers") + +# GOOD - merge ensures idempotency +from delta.tables import DeltaTable +target = DeltaTable.forName(spark, "catalog.schema.customers") +target.alias("t").merge( + source_df.alias("s"), + "t.customer_id = s.customer_id" +).whenMatchedUpdateAll() \ + .whenNotMatchedInsertAll() \ + .execute() + +# For full table refreshes - overwrite is also idempotent +df.write.mode("overwrite").saveAsTable("catalog.schema.daily_summary") +``` + +### 8.2 Medallion Architecture + +``` +Bronze (Raw) โ†’ Silver (Cleaned) โ†’ Gold (Business-Ready) +``` + +| Layer | Purpose | Typical Operations | +|-------|---------|-------------------| +| **Bronze** | Raw ingestion, preserve source fidelity | Append-only, minimal transformation | +| **Silver** | Cleaned, deduplicated, validated | Schema enforcement, deduplication, joins | +| **Gold** | Business-ready aggregations | Aggregations, business logic, feature tables | + +This isn't the only valid pattern โ€” use what fits the use case. Good default for data pipelines. + +### 8.3 Data Quality + +**For Lakeflow Declarative Pipelines:** Use [expectations](https://docs.databricks.com/en/delta-live-tables/expectations.html) for declarative data quality checks. + +**For Spark jobs:** Validate at boundaries (ingestion, before writes): + +```python +def validate_customers(df: DataFrame) -> DataFrame: + """Validate and filter bad records.""" + valid_df = df.filter( + col("customer_id").isNotNull() & + col("email").isNotNull() + ) + + invalid_count = df.count() - valid_df.count() + if invalid_count > 0: + logger.warning(f"Filtered {invalid_count} invalid records") + + return valid_df.dropDuplicates(["customer_id"]) +``` + +**Key principle:** Validate early, fail fast on critical issues, quarantine (don't drop) bad records when possible. 
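The quarantine principle can be sketched engine-agnostically in plain Python (names are illustrative); in Spark the same split would produce a main DataFrame plus an append to a quarantine table:

```python
# Illustrative: partition records into valid vs. quarantined instead of dropping bad rows
def split_valid_invalid(records: list[dict]) -> tuple[list[dict], list[dict]]:
    valid, quarantined = [], []
    for rec in records:
        if rec.get("customer_id") and rec.get("email"):
            valid.append(rec)
        else:
            quarantined.append(rec)  # keep for inspection and repair, don't discard
    return valid, quarantined
```

The quarantined records stay queryable, so bad upstream data can be diagnosed and replayed rather than silently lost.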
+ +### 8.4 Incremental Processing + +For large tables, process only changed data using Change Data Feed: + +```python +# Enable CDF on the source table (one-time) +spark.sql(""" + ALTER TABLE catalog.schema.source_table + SET TBLPROPERTIES (delta.enableChangeDataFeed = true) +""") + +# Read only changes since last run +changes = spark.read.format("delta") \ + .option("readChangeFeed", "true") \ + .option("startingVersion", last_processed_version) \ + .table("catalog.schema.source_table") +``` + +--- + +## 9. Resource Management + +### 9.1 Resource Cleanup + +**Weekly cleanup:** +- Delete test tables and volumes in dev catalogs +- Remove unused MLflow experiments and model versions +- Audit dev resources and delete anything not actively in use +- Run `databricks bundle destroy -t dev` for abandoned branches + +**Before leaving for extended periods:** +- Terminate all interactive clusters +- Ensure no scheduled jobs are running unnecessarily in dev + +### 9.2 Project Handoff Checklist + +- [ ] All personal dev resources deleted (`databricks bundle destroy`) +- [ ] Production resources documented in Reference Doc +- [ ] Ownership transferred to the team taking over +- [ ] Developer access removed or downgraded +- [ ] Service principal credentials rotated (if temporary dev access was granted) +- [ ] No orphaned resources (clusters, jobs, endpoints) +- [ ] Receiving team can deploy and operate independently + +--- + +## 10. Security & Data Governance + +### 10.1 Credentials Management + +**Never hardcode credentials.** Use [Databricks Secrets](https://docs.databricks.com/aws/en/security/secrets/): + +```bash +# Store a secret (CLI) +databricks secrets put-secret --scope project --key api_key +``` + +```python +# Access in code - value is redacted in logs +api_key = dbutils.secrets.get(scope="project", key="api_key") +``` + +Common credential types to manage: external API keys, service principal secrets, database connection strings, OAuth tokens. 
+ +### 10.2 Data Handling + +- **Work in Databricks** โ€” avoid downloading data to local machines +- **Follow customer data policies** โ€” they may have stricter requirements than Databricks defaults +- **Mask PII in dev/test** โ€” use tokenization, hashing, or synthetic data +- **Don't commit data** โ€” no CSVs, JSONs, or sample data in git repos + +### 10.3 Access Control + +- **Principle of least privilege** โ€” request only the access you need +- **Use service principals** for production jobs, not personal credentials +- **Document access grants** โ€” track what access was granted and why +- **Remove access when no longer needed** โ€” don't leave orphaned permissions diff --git a/databricks-skills/dev-best-practices/5-productionization.md b/databricks-skills/dev-best-practices/5-productionization.md new file mode 100644 index 00000000..0ce8f3b3 --- /dev/null +++ b/databricks-skills/dev-best-practices/5-productionization.md @@ -0,0 +1,304 @@ +# Part VI: Productionization + +Productionization is where development teams deliver lasting value. It's not enough to build something that works in a notebook โ€” we need to deliver systems that are tested, observable, deployable, and maintainable by the team that owns it long-term. + +## 11. Observability & Monitoring + +Production systems need to be observable โ€” you should be able to understand what's happening, detect problems, and debug issues without adding more code. + +For application-level logging setup, see [**2-code-quality.md ยง4.6 (Logging)**](2-code-quality.md). 
+ +### 11.1 Logging Best Practices + +**Log at boundaries:** +- Entry and exit of jobs/pipelines +- External API calls (request/response summary, not full payload) +- Errors and exceptions with context +- Business events (records processed, decisions made) + +**Include context:** + +```python +logger.info( + "Pipeline completed", + extra={ + "pipeline": "customer_etl", + "records_processed": 15000, + "duration_seconds": 45, + "run_id": run_id, + } +) +``` + +**Don't log:** Sensitive data (PII, credentials), high-volume debug logs in production, full request/response payloads. + +### 11.2 Key Metrics + +Track metrics that reflect system health, performance, and business outcomes. Start with the metrics you will actually use. + +**Common SLAs:** +- Latency (P50/P95/P99) โ€” consider latency for each subsystem, not just round-trip +- Requests per second/minute/hour +- Error rate (status code counts) +- Token usage and cost (for AI systems) + +**Metric tips:** +- Break metrics down by request type, region, date/time for useful insights +- Add tags consistently so metrics can be filtered and grouped +- Define SLOs for critical systems so teams know what "healthy" means +- Leverage [DBSQL alerts](https://docs.databricks.com/aws/en/sql/user/alerts/) when thresholds are exceeded + +For alert YAML configuration, see **[databricks-asset-bundles alerts_guidance.md](../databricks-asset-bundles/alerts_guidance.md)**. + +### 11.3 Dashboards + +**Essential views:** +- **Health overview:** Error rate, latency, throughput +- **Business metrics:** Records processed, API calls, user activity + +**Dashboard tips:** +- Keep dashboards focused โ€” one purpose per dashboard +- Use consistent time ranges across panels +- Include links to runbooks for when things go wrong +- Review and prune unused dashboards periodically + +### 11.4 Alerting + +Alerts should be actionable. If an alert fires and there's nothing to do, it shouldn't be an alert. 
**Good alert criteria:**
- Error rate > 5% for 5+ minutes
- P95 latency > SLA threshold for 10+ minutes
- Job failure (any production job)
- Resource exhaustion warnings

**Alert hygiene:**
- Every alert should have a runbook or clear next steps
- Watch for alert fatigue — too many alerts means ignored alerts
- Tune thresholds based on actual incidents

---

## 12. Deployment & CI/CD

CI/CD automation ensures consistent, repeatable deployments and catches issues before they reach production.

For the release flow overview, see [**1-foundations-and-setup.md §3.7 (Git Hygiene)**](1-foundations-and-setup.md).
For rollback procedures, see [**3-architecture.md §6.8 (Versioning & Rollback)**](3-architecture.md).

### 12.1 Environment Strategy

| Environment | Purpose | Deployed By | DAB Mode |
|-------------|---------|-------------|----------|
| **Dev** | Individual developer iteration | Manual (`databricks bundle deploy -t dev`) | `development` (auto-prefixes resources) |
| **Staging** | Integration testing, mirrors prod | Automatic on merge to main | `production` |
| **Prod** | Customer-facing | Manual release tag | `production` |

**Key insight:** Dev is for manual developer iteration, NOT for CI/CD. The CI/CD pipeline deploys to staging and prod only.

### 12.2 CI Pipeline (Pull Requests)

Run **fast, cheap checks** on every PR. Don't run integration tests or evals here — they're expensive and slow.

```yaml
# .github/workflows/ci.yml
name: CI
on:
  pull_request:
    branches: [main]

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: install uv
        uses: astral-sh/setup-uv@v4

      - name: install dependencies
        run: uv sync --all-groups

      - name: lint
        run: uv run ruff check .

      - name: unit tests
        run: uv run pytest tests/unit/ -v

      - name: validate bundle
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: databricks bundle validate -t staging
```

### 12.3 Staging Deployment (Merge to Main)

On merge to main, deploy to staging and run integration tests:

```yaml
# .github/workflows/deploy-staging.yml
name: Deploy to Staging
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: install Databricks CLI
        uses: databricks/setup-cli@main

      - name: deploy to staging
        env:
          DATABRICKS_HOST: ${{ secrets.STAGING_DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.STAGING_DATABRICKS_TOKEN }}
        run: databricks bundle deploy -t staging

      - name: run integration tests
        env:
          DATABRICKS_HOST: ${{ secrets.STAGING_DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.STAGING_DATABRICKS_TOKEN }}
        run: databricks bundle run integration_tests -t staging
```

If integration tests fail, fix the issue before creating a release tag.

### 12.4 Production Deployment (Release Tags)

Deploy to production when a release tag is created, and only after staging integration tests have passed.
```yaml
# .github/workflows/deploy-prod.yml
name: Deploy to Production
on:
  release:
    types: [published]

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production  # Requires approval if configured
    steps:
      - uses: actions/checkout@v4

      - name: install Databricks CLI
        uses: databricks/setup-cli@main

      - name: deploy to production
        env:
          DATABRICKS_HOST: ${{ secrets.PROD_DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.PROD_DATABRICKS_TOKEN }}
        run: databricks bundle deploy -t prod
```

### 12.5 Environment Promotion Flow

Changes flow: dev (manual iteration) → PR checks → merge to main (staging deploy + integration tests) → release tag (prod deploy).

**Key principles:**
- **PR checks are fast and cheap** — unit tests, lint, DAB validate only
- **Integration tests run post-merge in staging** — don't block PRs with expensive tests
- **Staging must pass before prod** — don't tag a release until integration tests are green
- **Prod requires explicit release tag** — intentional promotion, not automatic

---

## 13. Documentation & Handoff

Documentation is how knowledge survives a transition. The team that takes over will maintain this system after the original project team moves on.

### 13.1 Documentation Artifacts

| Document | Purpose |
|----------|---------|
| **Reference Doc** | Canonical system documentation — architecture, components, operations |
| **Design Doc** | Captures design decisions and rationale for major features |
| **Decision Docs** | Records specific technical decisions |
| **README** | Quick start for developers |

The **Reference Doc** is the most important handoff artifact — it's where someone new to the project starts.
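Since the Reference Doc carries the handoff, it helps to agree on its shape early. One possible outline (a suggested skeleton, not a mandated template; adapt sections to the system):

```markdown
# Reference Doc: <Project Name>

## 1. System Overview          <!-- what it does, who uses it, SLAs -->
## 2. Architecture             <!-- diagram, data flow, key components -->
## 3. Components               <!-- one subsection per job/pipeline/app -->
## 4. Configuration & Secrets  <!-- where config lives, how to change it safely -->
## 5. Operations               <!-- deploy, monitor, alert, runbooks -->
## 6. Design Decisions         <!-- links to Design Docs and Decision Docs -->
```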
### 13.2 README vs Reference Doc

| Document | Purpose | Audience | Length |
|----------|---------|----------|--------|
| **README** | Get developers productive quickly | Developers who need to run/modify code | 1–2 pages max |
| **Reference Doc** | Comprehensive system documentation | Anyone who needs to understand, operate, or extend the system | 10+ pages |

**README should include:** Brief overview (2–3 sentences), quick start steps, common commands, links to detailed documentation.

**README should NOT include:** Architecture details, component deep-dives, API specifications, operational procedures, design decisions — those belong in the Reference Doc / Design Docs.

**README template:**

```markdown
# Project Name

Brief description (2–3 sentences) of what this system does.

## Quick Start

1. Clone: `git clone `
2. Install: `uv sync`
3. Deploy to dev: `databricks bundle deploy -t dev`
4. Run tests: `make test`

## Common Commands

| Command | Description |
|---------|-------------|
| `make test` | Run unit tests |
| `make lint` | Run linting |
| `make deploy-dev` | Deploy to dev workspace |

## Documentation

- **[Reference Doc](link)** - Architecture, components, operations
- **[Design Doc](link)** - Design decisions and rationale
- **[Dashboards](link)** - Monitoring and observability
```

### 13.3 Knowledge Transfer

Before transitioning ownership of the system, conduct knowledge transfer sessions:

| Session | Duration | Audience | Content |
|---------|----------|----------|---------|
| Architecture overview | 1–2 hours | Technical team | System design, data flow, key decisions |
| Codebase walkthrough | 2–3 hours | Developers | Code structure, patterns, how to make changes |
| DevOps walkthrough | 1–2 hours | Ops/DevOps | Deployment, monitoring, alerting, runbooks |

**Success criteria:** Customer team can independently deploy, monitor, and debug the system.
### 13.4 Production Readiness Checklist

Before considering a system production-ready:

**Code & Infrastructure:**
- [ ] All code in version control
- [ ] CI/CD pipeline operational and tested
- [ ] DAB resources defined and deploying correctly
- [ ] Secrets managed via Databricks Secrets

**Quality:**
- [ ] Unit tests passing with reasonable coverage
- [ ] Integration tests for critical paths
- [ ] Error handling comprehensive
- [ ] Logging implemented at key boundaries

**Operations:**
- [ ] Monitoring dashboards created
- [ ] Alerts configured for critical failures
- [ ] Runbooks written for common issues
- [ ] Backup/recovery plan documented (if applicable)

**Handoff:**
- [ ] Reference Doc complete and reviewed
- [ ] Knowledge transfer sessions completed
- [ ] Receiving team trained
- [ ] Receiving team can deploy independently
- [ ] Access controls configured (principle of least privilege)

See also [**4-databricks-platform.md §9.2 (Project Handoff Checklist)**](4-databricks-platform.md) for resource cleanup tasks.

diff --git a/databricks-skills/dev-best-practices/SKILL.md b/databricks-skills/dev-best-practices/SKILL.md
new file mode 100644
index 00000000..aba35353
--- /dev/null
+++ b/databricks-skills/dev-best-practices/SKILL.md
@@ -0,0 +1,51 @@

---
name: dev-best-practices
description: "Databricks development best practices: Git workflow, code quality, architecture, platform usage, CI/CD, testing, and production handoff. Use when building production-ready Databricks solutions."
---

# Databricks Development Best Practices

A practical guide for writing code, collaborating, and shipping production-ready solutions on Databricks — from environment setup to production handoff.

## Core Philosophy

- **Platform-First:** Default to Databricks platform features before building custom solutions. Document all platform gaps for R&D feedback.
- **Build for Handoff:** Every line of code, architecture decision, and configuration should anticipate long-term ownership by the team that maintains it.
- **Leave It Better:** Actively improve what you touch — this compounds value over time.
- **System Design = Implementation Quality:** Thoughtful architecture matters as much as clean, tested code.

## Reference Files

- **[1-foundations-and-setup.md](1-foundations-and-setup.md)** — Dev environment (uv, Ruff, CLI, IDE), Git workflow, daily dev cycle
- **[2-code-quality.md](2-code-quality.md)** — Python standards, documentation, naming, error handling, code simplicity, logging, testing
- **[3-architecture.md](3-architecture.md)** — Design principles, project structure, code organization, config management, resilience, API design, versioning
- **[4-databricks-platform.md](4-databricks-platform.md)** — Notebooks, DABs, Unity Catalog, compute, data engineering patterns, resource management, security
- **[5-productionization.md](5-productionization.md)** — Observability, CI/CD, environment strategy, documentation, handoff checklist

## Quick Reference

| Topic | File | Section |
|-------|------|---------|
| uv / Ruff setup | [1-foundations-and-setup.md](1-foundations-and-setup.md) | §2 Dev Environment |
| Git branch naming | [1-foundations-and-setup.md](1-foundations-and-setup.md) | §3.2 Branch Strategy |
| PR workflow | [1-foundations-and-setup.md](1-foundations-and-setup.md) | §3.4–3.6 Pull Requests |
| Python type hints / docstrings | [2-code-quality.md](2-code-quality.md) | §4.1–4.2 |
| Naming conventions | [2-code-quality.md](2-code-quality.md) | §4.3 Naming |
| Error handling | [2-code-quality.md](2-code-quality.md) | §4.4 Error Handling |
| Code simplicity | [2-code-quality.md](2-code-quality.md) | §4.5 Code Simplicity |
| Logging standards | [2-code-quality.md](2-code-quality.md) | §4.6 Logging |
| Testing philosophy | [2-code-quality.md](2-code-quality.md) | §4.7 Testing |
| Project structure | [3-architecture.md](3-architecture.md) | §6.2 Project Structure |
| Config management pattern | [3-architecture.md](3-architecture.md) | §6.4 Config Management |
| Idempotent writes | [4-databricks-platform.md](4-databricks-platform.md) | §8.1 |
| Medallion architecture | [4-databricks-platform.md](4-databricks-platform.md) | §8.2 |
| CI/CD pipeline setup | [5-productionization.md](5-productionization.md) | §12 CI/CD |
| Production readiness | [5-productionization.md](5-productionization.md) | §13.4 Checklist |

## Related Skills

- **[databricks-asset-bundles](../databricks-asset-bundles/SKILL.md)** — DABs YAML reference (jobs, pipelines, dashboards)
- **[databricks-python-sdk](../databricks-python-sdk/SKILL.md)** — Python SDK implementation reference
- **[databricks-unity-catalog](../databricks-unity-catalog/SKILL.md)** — Unity Catalog operations and system tables
- **[databricks-jobs](../databricks-jobs/SKILL.md)** — Job orchestration patterns
- **[databricks-config](../databricks-config/SKILL.md)** — Profile and authentication setup for CLI/SDK