443 changes: 430 additions & 13 deletions databricks-mcp-server/databricks_mcp_server/tools/compute.py

Large diffs are not rendered by default.

166 changes: 166 additions & 0 deletions databricks-skills/databricks-execution-compute/SKILL.md
@@ -0,0 +1,166 @@
---
name: databricks-execution-compute
description: >-
Execute code on Databricks compute — serverless or classic clusters. Use this
skill when the user mentions: "run code", "execute", "run on databricks",
"serverless", "no cluster", "run python", "run scala", "run sql", "run R",
"run file", "push and run", "notebook run", "batch script", "model training",
"run script on cluster". Also use when the user wants to run local files on
Databricks or needs to choose between serverless and cluster compute.
---

# Databricks Execution Compute

Run code on Databricks — either on serverless compute (no cluster required) or on classic clusters (interactive, multi-language). Supports pushing local files to the Databricks workspace and executing them.

## Choosing the Right Tool

| Scenario | Tool | Why |
|----------|------|-----|
| **Run Python, no cluster available** | `run_code_on_serverless` | No cluster needed; serverless spins up automatically |
| **Run local file on a cluster** | `run_file_on_databricks` | Auto-detects language from extension; supports Python, Scala, SQL, R |
| **Interactive iteration (preserve variables)** | `execute_databricks_command` | Keeps execution context alive across calls |
| **SQL queries that need result rows** | `execute_sql` | Works with serverless SQL warehouses; returns data |
| **Batch/ETL Python, no interactivity needed** | `run_code_on_serverless` | Dedicated serverless resources, up to 30 min timeout |
| **Long-running production pipelines** | Databricks Jobs | Full scheduling, retries, monitoring |

## Ephemeral vs Persistent Mode

All execution tools support two modes:

**Ephemeral (default):** Code runs without saving any artifact in the workspace. Good for testing, exploration, and quick checks.

**Persistent:** Pass `workspace_path` to save the code as a notebook in the Databricks workspace. The notebook stays after execution — visible in the UI, re-runnable, and versionable. Good for:
- Model training scripts
- ETL/data pipeline notebooks
- Any project work the user wants to keep

When the user is working on a project, ask where they want files saved and suggest a path like:
`/Workspace/Users/{username}/{project-name}/`

## MCP Tools

### run_code_on_serverless

Execute code on serverless compute via Jobs API. No cluster required.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `code` | string | *(required)* | Python or SQL code to execute |
| `language` | string | `"python"` | `"python"` or `"sql"` |
| `timeout` | int | `1800` | Max wait time in seconds (30 min) |
| `run_name` | string | auto-generated | Optional human-readable run name |
| `workspace_path` | string | None | Workspace path to persist the notebook. If omitted, uses temp path and cleans up |

**Returns:** `success`, `output`, `error`, `run_id`, `run_url`, `duration_seconds`, `state`, `message`, `workspace_path` (persistent mode).

**Output capture:** Use `dbutils.notebook.exit(value)` to return structured output. `print()` output may not be reliably captured. SQL SELECT results are NOT captured — use `execute_sql()` instead.
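The structured-output pattern can be sketched as follows. The metrics payload is illustrative, the tool call is shown commented out because it only exists in the MCP environment, and `dbutils` is available only inside the notebook:

```python
import json

# Code submitted to serverless. Inside the notebook, `dbutils` is available.
# The metrics payload here is illustrative.
code = """
import json
metrics = {"rows_processed": 1000, "status": "ok"}
dbutils.notebook.exit(json.dumps(metrics))
"""

# result = run_code_on_serverless(code=code)  # exit value arrives in result["output"]
# Parsing the returned JSON string (example value shown):
output = '{"rows_processed": 1000, "status": "ok"}'
metrics = json.loads(output)
print(metrics["status"])
```

Returning a single JSON string through `dbutils.notebook.exit` sidesteps the unreliable `print()` capture and keeps the result machine-parseable.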

### run_file_on_databricks

Read a local file and execute it on a Databricks cluster. Auto-detects language from file extension.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `file_path` | string | *(required)* | Local path to the file (.py, .scala, .sql, .r) |
| `cluster_id` | string | auto-selected | Cluster to run on; auto-selects if omitted |
| `context_id` | string | None | Reuse an existing execution context |
| `language` | string | auto-detected | Override language detection |
| `timeout` | int | `600` | Max wait time in seconds |
| `destroy_context_on_completion` | bool | `false` | Destroy context after execution |
| `workspace_path` | string | None | Workspace path to also persist the file as a notebook |

**Returns:** `success`, `output`, `error`, `cluster_id`, `context_id`, `context_destroyed`, `message`.

### execute_databricks_command

Execute code interactively on a cluster. Best for iterative work — contexts persist variables and imports across calls.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `code` | string | *(required)* | Code to execute |
| `cluster_id` | string | auto-selected | Cluster to run on |
| `context_id` | string | None | Reuse existing context for speed + state |
| `language` | string | `"python"` | `"python"`, `"scala"`, `"sql"`, or `"r"` |
| `timeout` | int | `120` | Max wait time in seconds |
| `destroy_context_on_completion` | bool | `false` | Destroy context after execution |

**Returns:** `success`, `output`, `error`, `cluster_id`, `context_id`, `context_destroyed`, `message`.
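A SQL variant, shown as a sketch in the same call style (the `samples.nyctaxi` catalog is an assumption about what exists in the workspace):

```python
execute_databricks_command(
    code="SHOW TABLES IN samples.nyctaxi",
    language="sql",
)
```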

## Cluster Management Helpers

| Tool | Description |
|------|-------------|
| `list_clusters` | List all user-created clusters in the workspace |
| `get_best_cluster` | Auto-select the best running cluster (prefers "shared" > "demo") |
| `start_cluster` | Start a terminated cluster (**always ask user first**) |
| `get_cluster_status` | Poll cluster state after starting |

### When No Cluster Is Available

If `execute_databricks_command` or `run_file_on_databricks` finds no running cluster:
1. The error response includes `startable_clusters` and `suggestions`
2. Ask the user if they want to start a terminated cluster (3-8 min startup)
3. Or suggest `run_code_on_serverless` for Python (no cluster needed)
4. Or suggest `execute_sql` for SQL workloads (uses SQL warehouses)
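The fallback flow above can be sketched like this; the response dict is a hypothetical example of the error shape described, not a captured payload:

```python
# Hypothetical error response from execute_databricks_command with no running cluster
result = {
    "success": False,
    "error": "No running cluster available",
    "startable_clusters": [
        {"cluster_id": "1234-567890-abcdef", "cluster_name": "dev-cluster"}
    ],
    "suggestions": ["Start a terminated cluster", "Use run_code_on_serverless for Python"],
}

if not result["success"]:
    if result.get("startable_clusters"):
        # Ask the user before starting anything: startup takes roughly 3-8 minutes
        names = [c["cluster_name"] for c in result["startable_clusters"]]
        print(f"No running cluster. Start one of {names}? (3-8 min startup)")
    else:
        # Nothing to start: fall back to serverless (Python) or execute_sql (SQL)
        print("Consider run_code_on_serverless or execute_sql instead.")
```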

## Limitations

| Limitation | Applies To | Details |
|-----------|------------|---------|
| Cold start ~25-50s | Serverless | Serverless compute spin-up time |
| No interactive state | Serverless | Each invocation is fresh; no variables persist |
| Python and SQL only | Serverless | No R, Scala, or Java on serverless |
| SQL SELECT not captured | Serverless | Use `execute_sql()` for SELECT queries |
| Cluster must be running | Classic | Use `start_cluster` or switch to serverless |
| print() output unreliable | Serverless | Use `dbutils.notebook.exit()` instead |

## Quick Start Examples

### Run Python on serverless (ephemeral)

```python
run_code_on_serverless(
code="dbutils.notebook.exit('hello from serverless')"
)
```

### Run Python on serverless (persistent — save to project)

```python
run_code_on_serverless(
code=training_code,
workspace_path="/Workspace/Users/user@company.com/ml-project/train",
run_name="model-training-v1"
)
```

### Run a local file on a cluster

```python
run_file_on_databricks(file_path="/local/path/to/etl.py")
```

### Run a local file and persist it to workspace

```python
run_file_on_databricks(
file_path="/local/path/to/train.py",
workspace_path="/Workspace/Users/user@company.com/ml-project/train"
)
```

### Interactive iteration on a cluster

```python
# First call — creates context
result = execute_databricks_command(code="import pandas as pd\ndf = pd.DataFrame({'a': [1,2,3]})")
# Follow-up — reuses context (faster, state preserved)
execute_databricks_command(code="print(df.shape)", context_id=result["context_id"], cluster_id=result["cluster_id"])
```
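When the iteration is finished, the same call can tear down the context (continuing the example above; a sketch):

```python
# Final call: destroy the execution context to free cluster resources
execute_databricks_command(
    code="print('done')",
    context_id=result["context_id"],
    cluster_id=result["cluster_id"],
    destroy_context_on_completion=True,
)
```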

## Related Skills

- **[databricks-jobs](../databricks-jobs/SKILL.md)** — Production job orchestration with scheduling, retries, and multi-task DAGs
- **[databricks-dbsql](../databricks-dbsql/SKILL.md)** — SQL warehouse capabilities and AI functions
- **[databricks-python-sdk](../databricks-python-sdk/SKILL.md)** — Direct SDK usage for workspace automation
194 changes: 194 additions & 0 deletions databricks-skills/databricks-manage-compute/SKILL.md
@@ -0,0 +1,194 @@
---
name: databricks-manage-compute
description: >-
Create, modify, and delete Databricks compute resources (clusters and SQL
warehouses). Use this skill when the user mentions: "create cluster", "new
cluster", "resize cluster", "modify cluster", "delete cluster", "terminate
cluster", "create warehouse", "new warehouse", "resize warehouse", "delete
warehouse", "node types", "runtime versions", "DBR versions", "spin up
compute", "provision cluster".
---

# Databricks Manage Compute

Create, modify, and delete Databricks compute resources — classic clusters and SQL warehouses. Provides opinionated defaults so simple operations just work, with full override for power users.

## Decision Matrix

| User Intent | Tool | Notes |
|-------------|------|-------|
| **Create a new cluster** | `create_cluster` | Just needs name + num_workers; defaults handle the rest |
| **Resize or reconfigure a cluster** | `modify_cluster` | Change workers, DBR, node type, spark conf |
| **Stop a cluster (save costs)** | `terminate_cluster` | Reversible — can restart with `start_cluster` |
| **Permanently remove a cluster** | `delete_cluster` | DESTRUCTIVE — always confirm with user first |
| **Choose a node type** | `list_node_types` | Browse available VM types before creating |
| **Choose a DBR version** | `list_spark_versions` | Browse runtimes; filter for "LTS" |
| **Create a SQL warehouse** | `create_sql_warehouse` | Serverless Pro by default |
| **Resize a SQL warehouse** | `modify_sql_warehouse` | Change size, scaling, auto-stop |
| **Permanently remove a warehouse** | `delete_sql_warehouse` | DESTRUCTIVE — always confirm with user first |

## MCP Tools — Clusters

### create_cluster

Create a new Databricks cluster with sensible defaults.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `name` | string | *(required)* | Human-readable cluster name |
| `num_workers` | int | `1` | Fixed worker count (ignored if autoscale is set) |
| `spark_version` | string | latest LTS | DBR version key (e.g. "15.4.x-scala2.12") |
| `node_type_id` | string | auto-picked | Worker node type (e.g. "i3.xlarge") |
| `autotermination_minutes` | int | `120` | Minutes of inactivity before auto-stop |
| `data_security_mode` | string | `"SINGLE_USER"` | Security mode |
| `spark_conf` | string (JSON) | None | Spark config overrides as JSON |
| `autoscale_min_workers` | int | None | Min workers for autoscaling |
| `autoscale_max_workers` | int | None | Max workers for autoscaling |

**Returns:** `cluster_id`, `cluster_name`, `state`, `spark_version`, `node_type_id`, `message`.

### modify_cluster

Modify an existing cluster. Only specified parameters change; the rest stay as-is. Note that modifying a running cluster restarts it to apply the new configuration.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cluster_id` | string | *(required)* | Cluster to modify |
| `name` | string | unchanged | New cluster name |
| `num_workers` | int | unchanged | New worker count |
| `spark_version` | string | unchanged | New DBR version |
| `node_type_id` | string | unchanged | New node type |
| `autotermination_minutes` | int | unchanged | New auto-termination |
| `spark_conf` | string (JSON) | unchanged | New Spark config |
| `autoscale_min_workers` | int | unchanged | Enable/modify autoscaling |
| `autoscale_max_workers` | int | unchanged | Enable/modify autoscaling |

**Returns:** `cluster_id`, `cluster_name`, `state`, `message`.
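For example, switching a fixed-size cluster to autoscaling (a sketch; the cluster ID is illustrative):

```python
modify_cluster(
    cluster_id="1234-567890-abcdef",
    autoscale_min_workers=2,
    autoscale_max_workers=8,
)
```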

### terminate_cluster

Stop a running cluster (reversible). The cluster can be restarted later with `start_cluster`.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cluster_id` | string | *(required)* | Cluster to terminate |

**Returns:** `cluster_id`, `cluster_name`, `state`, `message`.

### delete_cluster

**DESTRUCTIVE** — Permanently delete a cluster. Always confirm with the user before calling.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `cluster_id` | string | *(required)* | Cluster to permanently delete |

**Returns:** `cluster_id`, `cluster_name`, `state`, `message` (includes warning).

### list_node_types

List available VM/node types for the workspace. Use this to help users choose a `node_type_id` for `create_cluster`.

**Returns:** List of `node_type_id`, `memory_mb`, `num_cores`, `num_gpus`, `description`, `is_deprecated`.

### list_spark_versions

List available Databricks Runtime versions. Filter for "LTS" in the name for long-term support versions.

**Returns:** List of `key`, `name`.
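Filtering for LTS runtimes client-side can be sketched like this; the response shape and version entries are illustrative assumptions:

```python
# Hypothetical response from list_spark_versions (shape assumed)
versions = [
    {"key": "15.4.x-scala2.12", "name": "15.4 LTS (includes Apache Spark 3.5.0, Scala 2.12)"},
    {"key": "16.1.x-scala2.12", "name": "16.1 (includes Apache Spark 3.5.2, Scala 2.12)"},
]

# Keep only long-term-support runtimes
lts_versions = [v for v in versions if "LTS" in v["name"]]
print(lts_versions[0]["key"])
```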

## MCP Tools — SQL Warehouses

### create_sql_warehouse

Create a new SQL warehouse. Defaults to serverless Pro with 120-minute auto-stop.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `name` | string | *(required)* | Warehouse name |
| `size` | string | `"Small"` | T-shirt size (2X-Small through 4X-Large) |
| `min_num_clusters` | int | `1` | Minimum clusters |
| `max_num_clusters` | int | `1` | Maximum clusters for scaling |
| `auto_stop_mins` | int | `120` | Auto-stop after inactivity |
| `warehouse_type` | string | `"PRO"` | PRO or CLASSIC |
| `enable_serverless` | bool | `true` | Enable serverless compute |

**Returns:** `warehouse_id`, `name`, `size`, `state`, `message`.

### modify_sql_warehouse

Modify an existing SQL warehouse. Only specified parameters change.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `warehouse_id` | string | *(required)* | Warehouse to modify |
| `name` | string | unchanged | New warehouse name |
| `size` | string | unchanged | New T-shirt size |
| `min_num_clusters` | int | unchanged | New min clusters |
| `max_num_clusters` | int | unchanged | New max clusters |
| `auto_stop_mins` | int | unchanged | New auto-stop timeout |

**Returns:** `warehouse_id`, `name`, `state`, `message`.

### delete_sql_warehouse

**DESTRUCTIVE** — Permanently delete a SQL warehouse. Always confirm with the user before calling.

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `warehouse_id` | string | *(required)* | Warehouse to permanently delete |

**Returns:** `warehouse_id`, `name`, `state`, `message` (includes warning).

## Destructive Actions

`delete_cluster` and `delete_sql_warehouse` are permanent and irreversible. Before calling either:

1. Tell the user the action is permanent
2. Ask for explicit confirmation
3. Only proceed if the user confirms

`terminate_cluster` is safe and reversible — the cluster can be restarted.
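A minimal confirmation guard can be sketched as follows; `delete_cluster_with_confirmation` is a hypothetical helper, not part of the MCP tool set, and the real `delete_cluster` call is left commented out:

```python
def delete_cluster_with_confirmation(cluster_id: str, confirmed: bool) -> str:
    # Guard: refuse to delete unless the user explicitly confirmed
    if not confirmed:
        return "Cancelled: delete_cluster is permanent and needs explicit confirmation."
    # delete_cluster(cluster_id=cluster_id)  # hypothetical MCP call
    return f"Cluster {cluster_id} permanently deleted."

print(delete_cluster_with_confirmation("1234-567890-abcdef", confirmed=False))
```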

## Quick Start Examples

### Create a simple cluster (all defaults)

```python
create_cluster(name="my-dev-cluster", num_workers=1)
```

### Create an autoscaling cluster

```python
create_cluster(
name="my-scaling-cluster",
autoscale_min_workers=1,
autoscale_max_workers=8,
autotermination_minutes=60
)
```

### Resize a cluster

```python
modify_cluster(cluster_id="1234-567890-abcdef", num_workers=4)
```

### Create a SQL warehouse

```python
create_sql_warehouse(name="analytics-warehouse", size="Medium")
```

### Stop a cluster to save costs

```python
terminate_cluster(cluster_id="1234-567890-abcdef")
```

## Related Skills

- **[databricks-execution-compute](../databricks-execution-compute/SKILL.md)** — Execute code on clusters and serverless compute
- **[databricks-dbsql](../databricks-dbsql/SKILL.md)** — SQL warehouse query capabilities
- **[databricks-python-sdk](../databricks-python-sdk/SKILL.md)** — Direct SDK usage for workspace automation
4 changes: 3 additions & 1 deletion databricks-skills/install_skills.sh
@@ -42,7 +42,7 @@ MLFLOW_REPO_RAW_URL="https://raw.githubusercontent.com/mlflow/skills"
 MLFLOW_REPO_REF="main"
 
 # Databricks skills (hosted in this repo)
-DATABRICKS_SKILLS="databricks-agent-bricks databricks-aibi-dashboards databricks-asset-bundles databricks-app-python databricks-config databricks-dbsql databricks-docs databricks-genie databricks-iceberg databricks-jobs databricks-lakebase-autoscale databricks-lakebase-provisioned databricks-metric-views databricks-mlflow-evaluation databricks-model-serving databricks-parsing databricks-python-sdk databricks-spark-declarative-pipelines databricks-spark-structured-streaming databricks-synthetic-data-gen databricks-unity-catalog databricks-unstructured-pdf-generation databricks-vector-search databricks-zerobus-ingest spark-python-data-source"
+DATABRICKS_SKILLS="databricks-agent-bricks databricks-aibi-dashboards databricks-asset-bundles databricks-app-python databricks-config databricks-dbsql databricks-docs databricks-genie databricks-iceberg databricks-jobs databricks-lakebase-autoscale databricks-lakebase-provisioned databricks-manage-compute databricks-metric-views databricks-mlflow-evaluation databricks-model-serving databricks-parsing databricks-python-sdk databricks-execution-compute databricks-spark-declarative-pipelines databricks-spark-structured-streaming databricks-synthetic-data-gen databricks-unity-catalog databricks-unstructured-pdf-generation databricks-vector-search databricks-zerobus-ingest spark-python-data-source"
 
 # MLflow skills (fetched from mlflow/skills repo)
 MLFLOW_SKILLS="agent-evaluation analyze-mlflow-chat-session analyze-mlflow-trace instrumenting-with-mlflow-tracing mlflow-onboarding querying-mlflow-metrics retrieving-mlflow-traces searching-mlflow-docs"
@@ -73,6 +73,8 @@ get_skill_description() {
     "databricks-iceberg") echo "Apache Iceberg - managed tables, UniForm, IRC, Snowflake interop, migration" ;;
     "databricks-jobs") echo "Databricks Lakeflow Jobs - workflow orchestration" ;;
     "databricks-python-sdk") echo "Databricks Python SDK, Connect, and REST API" ;;
+    "databricks-execution-compute") echo "Execute code on Databricks - serverless and classic cluster compute" ;;
+    "databricks-manage-compute") echo "Create, modify, and delete Databricks clusters and SQL warehouses" ;;
     "databricks-unity-catalog") echo "System tables for lineage, audit, billing" ;;
     "databricks-lakebase-autoscale") echo "Lakebase Autoscale - managed PostgreSQL with autoscaling" ;;
     "databricks-lakebase-provisioned") echo "Lakebase Provisioned - data connections and reverse ETL" ;;