diff --git a/databricks-skills/databricks-genie/2-conversation-api.md b/databricks-skills/databricks-genie/2-conversation-api.md
new file mode 100644
index 00000000..5edbadaf
--- /dev/null
+++ b/databricks-skills/databricks-genie/2-conversation-api.md
@@ -0,0 +1,286 @@
+# Genie Conversation API
+
+Use the Genie Conversation API to ask natural language questions to a curated Genie Space and receive SQL-generated answers.
+
+## Overview
+
+The Conversation API provides two MCP tools for interacting with Genie Spaces:
+
+| Tool | Purpose | Key Parameters |
+|------|---------|----------------|
+| `ask_genie` | Start a new conversation with a question | `space_id`, `question` |
+| `ask_genie_followup` | Ask a follow-up in an existing conversation | `space_id`, `conversation_id`, `question` |
+
+Both tools send a natural language question to a Genie Space, which generates SQL, executes it against the configured SQL warehouse, and returns structured results. The two-tool design enforces a clear separation: `ask_genie` always starts fresh, while `ask_genie_followup` requires an explicit `conversation_id` to maintain context.
+
+## Tool Reference
+
+### `ask_genie`
+
+Starts a new conversation and asks a question.
+
+**Parameters:**
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `space_id` | string | Yes | — | The Genie Space ID to query |
+| `question` | string | Yes | — | Natural language question (must be non-empty) |
+| `timeout_seconds` | integer | No | 120 | Maximum seconds to wait for a response |
+
+**Example:**
+```
+ask_genie(
+    space_id="01f116b25cb61b919a9efa192d5a96e4",
+    question="How many customers are there?"
+)
+```
+
+### `ask_genie_followup`
+
+Asks a follow-up question within an existing conversation. Genie uses the prior conversation context to resolve references like "that", "those", or "break it down".
+
+**Parameters:**
+
+| Parameter | Type | Required | Default | Description |
+|-----------|------|----------|---------|-------------|
+| `space_id` | string | Yes | — | The same Genie Space ID used in the original question |
+| `conversation_id` | string | Yes | — | The `conversation_id` returned by a previous `ask_genie` or `ask_genie_followup` call |
+| `question` | string | Yes | — | The follow-up question |
+| `timeout_seconds` | integer | No | 120 | Maximum seconds to wait for a response |
+
+**Example:**
+```
+ask_genie_followup(
+    space_id="01f116b25cb61b919a9efa192d5a96e4",
+    conversation_id="01f11a9721091636b4f756a3ef43c3f7",
+    question="Break that down by region"
+)
+```
+
+## Response Shape
+
+Both tools return the same response structure. Fields are present or absent depending on the outcome.
+
+### Successful Query Response
+
+When Genie generates and executes SQL successfully:
+
+```json
+{
+  "question": "How many customers are there?",
+  "conversation_id": "01f11a9721091636b4f756a3ef43c3f7",
+  "message_id": "01f11a97212b171cae53a9d478929dd1",
+  "status": "COMPLETED",
+  "sql": "SELECT COUNT(DISTINCT `customer_id`) AS num_customers FROM ...",
+  "description": "You want to know the total number of unique customers in the database.",
+  "row_count": 1,
+  "columns": ["num_customers"],
+  "data": [["3000"]],
+  "text_response": "There are **3,000 customers** in the dataset."
+}
+```
+
+### Text-Only Response (No SQL Generated)
+
+When Genie cannot or chooses not to generate SQL (e.g., the question is off-topic or needs clarification):
+
+```json
+{
+  "question": "What is the meaning of life?",
+  "conversation_id": "01f11a974255111a9c5fe447da964eae",
+  "message_id": "01f11a97426b147ebd038be04db2f3b5",
+  "status": "COMPLETED",
+  "text_response": "This question is unrelated to the database schema, so I cannot provide an answer based on the available data."
+}
+```
+
+Note: `status` is still `"COMPLETED"` — check for the presence of `sql` or `data` to distinguish from a successful query.
+
+### Error Response
+
+When the request itself fails (invalid IDs, permissions, empty question):
+
+```json
+{
+  "question": "How many customers?",
+  "status": "ERROR",
+  "error": "You need \"Can View\" permission to perform this action."
+}
+```
+
+### Timeout Response
+
+When the response exceeds `timeout_seconds`:
+
+```json
+{
+  "question": "Complex query...",
+  "status": "TIMEOUT",
+  "error": "Genie response timed out after 3s"
+}
+```
+
+### Field Reference
+
+| Field | Type | Present When | Description |
+|-------|------|-------------|-------------|
+| `question` | string | Always | The original question asked |
+| `conversation_id` | string | Success or text-only | ID for follow-up questions via `ask_genie_followup` |
+| `message_id` | string | Success or text-only | Unique identifier for this message |
+| `status` | string | Always | `COMPLETED`, `ERROR`, or `TIMEOUT` |
+| `sql` | string | Successful query | The SQL query Genie generated |
+| `description` | string | Successful query | Genie's interpretation of the question |
+| `columns` | list[string] | Successful query | Column names in the result set |
+| `data` | list[list[string]] | Successful query | Query results — each row is a list of string values |
+| `row_count` | integer | Successful query | Number of rows returned |
+| `text_response` | string | Success or text-only | Natural language summary or explanation |
+| `error` | string | ERROR or TIMEOUT | Error description |
+
+**Important notes about `data` values:**
+- All values are strings, even for numeric columns. A count of 3000 is returned as `"3000"`, not `3000`.
+- Large numbers may use scientific notation (e.g., `"3.336E8"` instead of `"333600000"`).
+- The `text_response` field typically contains human-friendly formatted numbers (e.g., "$333,607,274.99") which are easier to present to users.
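Because every cell arrives as a string, a caller that needs to sort, sum, or compare results has to convert values itself. The following is a minimal sketch in plain Python that assumes only the response shape documented above. `parse_genie_value` and `rows_as_dicts` are hypothetical helpers and are not part of the MCP tools. They turn numeric strings, including scientific notation, into `int` or `float` and leave everything else (names, dates, `NaN`) as text.

```python
from decimal import Decimal, InvalidOperation
from typing import Dict, List, Optional, Union

Value = Union[int, float, str, None]


def parse_genie_value(value: Optional[str]) -> Value:
    """Convert one string cell from a Genie `data` row (hypothetical helper)."""
    if value is None:
        return None
    try:
        # Decimal accepts "3000", "12.5", and scientific notation such as "3.336E8".
        number = Decimal(value)
    except InvalidOperation:
        return value  # Not numeric: keep the original text (names, dates, ...).
    if not number.is_finite():
        return value  # Keep "NaN" / "Infinity" as text rather than guessing.
    if number == number.to_integral_value():
        return int(number)  # Integral values become exact ints: "3.336E8" -> 333600000.
    return float(number)


def rows_as_dicts(result: dict) -> List[Dict[str, Value]]:
    """Pair `columns` with each row of `data`, parsing numeric strings.

    Text-only, ERROR, and TIMEOUT responses have no `data`, so they yield [].
    """
    columns = result.get("columns") or []
    return [
        {column: parse_genie_value(cell) for column, cell in zip(columns, row)}
        for row in result.get("data") or []
    ]
```

Parsing through `Decimal` rather than `float` keeps integral values such as `"3.336E8"` exact before a Python type is chosen. If exact currency arithmetic matters, keep the `Decimal` instead of converting to `float`.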
+
+## When to Use `ask_genie` vs `execute_sql`
+
+### Use `ask_genie` When:
+
+| Scenario | Why |
+|----------|-----|
+| Genie Space has curated business logic | Genie knows rules like "active customer = ordered in 90 days" |
+| User explicitly says "ask Genie" or "use my Genie Space" | User intent to use their curated space |
+| Complex business metrics with specific definitions | Genie has certified queries for official metrics |
+| Testing a Genie Space after creating it | Validate the space works correctly |
+| User wants conversational data exploration | Genie handles context for follow-up questions |
+
+### Use Direct SQL (`execute_sql`) Instead When:
+
+| Scenario | Why |
+|----------|-----|
+| Simple ad-hoc query | Direct SQL is faster, no curation needed |
+| You already have the exact SQL | No need for Genie to regenerate |
+| No Genie Space exists for this data | Can't use Genie without a space |
+| Need precise control over the query | Direct SQL gives exact control |
+
+## Multi-Turn Conversation Workflow
+
+### Starting a Conversation
+
+Every call to `ask_genie` starts a **new** conversation. The response includes a `conversation_id` that you pass to `ask_genie_followup` for subsequent turns.
+
+```
+# Turn 1: New conversation
+result = ask_genie(space_id, "How many customers are there?")
+# result["conversation_id"] = "01f11a9721091636b4f756a3ef43c3f7"
+
+# Turn 2: Follow-up (Genie knows "that" = customers)
+result2 = ask_genie_followup(space_id, result["conversation_id"],
+    "Break that down by region")
+
+# Turn 3: Another follow-up in the same conversation
+result3 = ask_genie_followup(space_id, result["conversation_id"],
+    "Which region has the highest average income?")
+```
+
+### When to Start a New Conversation
+
+Start a **new** conversation (use `ask_genie`) when the topic changes:
+
+```
+# Topic 1: Customer analysis
+r1 = ask_genie(space_id, "How many customers are there?")
+r2 = ask_genie_followup(space_id, r1["conversation_id"],
+    "Break that down by region")
+
+# Topic 2: Loan analysis — new conversation
+r3 = ask_genie(space_id, "What is the total loan portfolio value?")
+r4 = ask_genie_followup(space_id, r3["conversation_id"],
+    "Break that down by loan type")
+```
+
+Reusing a conversation across unrelated topics may confuse Genie's context resolution.
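The rule above can be captured in a small wrapper: start a new conversation for each topic and send follow-ups within a topic. This is a sketch under stated assumptions. `GenieSession` is a hypothetical helper. Its `ask` and `ask_followup` arguments are stand-ins for however your client invokes the `ask_genie` and `ask_genie_followup` tools; they are injected as callables so the routing logic can be tested without a workspace. The caller chooses the topic labels, because the wrapper does not detect topic changes on its own.

```python
from typing import Callable, Dict

# Stand-in type for a tool invocation: keyword arguments in, response dict out.
ToolCall = Callable[..., dict]


class GenieSession:
    """Route questions to ask_genie or ask_genie_followup, one conversation per topic.

    Hypothetical helper: `ask` and `ask_followup` are whatever callables your MCP
    client exposes for the ask_genie and ask_genie_followup tools (assumption).
    """

    def __init__(self, space_id: str, ask: ToolCall, ask_followup: ToolCall):
        self.space_id = space_id
        self._ask = ask
        self._ask_followup = ask_followup
        self._conversations: Dict[str, str] = {}  # topic label -> conversation_id

    def ask(self, topic: str, question: str, timeout_seconds: int = 120) -> dict:
        conversation_id = self._conversations.get(topic)
        if conversation_id is None:
            # New topic: always start a fresh conversation.
            result = self._ask(
                space_id=self.space_id,
                question=question,
                timeout_seconds=timeout_seconds,
            )
        else:
            # Same topic: continue the conversation so references like "that" resolve.
            result = self._ask_followup(
                space_id=self.space_id,
                conversation_id=conversation_id,
                question=question,
                timeout_seconds=timeout_seconds,
            )
        # Only COMPLETED responses carry a conversation_id; otherwise keep the old one.
        if result.get("conversation_id"):
            self._conversations[topic] = result["conversation_id"]
        return result

    def reset(self, topic: str) -> None:
        """Forget a topic so its next question opens a new conversation."""
        self._conversations.pop(topic, None)
```

For example, `session.ask("customers", "How many customers are there?")` followed by `session.ask("customers", "Break that down by region")` issues one `ask_genie` call and then one `ask_genie_followup` call. A question under a new `"loans"` topic starts a second conversation. `ERROR` and `TIMEOUT` responses carry no `conversation_id`, so if the first question on a topic fails, the next question on that topic starts fresh instead of following up on a conversation that was never created.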
+
+## Handling Responses
+
+### Branching on Response Type
+
+```python
+result = ask_genie(space_id, question)
+
+if result["status"] == "COMPLETED":
+    if result.get("sql"):
+        # SQL was generated and executed; guard against a missing or empty `data` list
+        print(f"SQL: {result['sql']}")
+        print(f"Rows: {result.get('row_count', 0)}")
+        for row in result.get("data") or []:
+            print(row)
+    elif result.get("text_response"):
+        # Text-only response (clarification or off-topic)
+        print(f"Genie says: {result['text_response']}")
+elif result["status"] == "TIMEOUT":
+    print("Query timed out — try increasing timeout_seconds or simplifying the question")
+elif result["status"] == "ERROR":
+    print(f"Error: {result['error']}")
+```
+
+### Timeout Guidance
+
+| Query Complexity | Suggested `timeout_seconds` |
+|-----------------|---------------------------|
+| Simple aggregation (COUNT, SUM) | 30–60 |
+| Multi-table joins | 60–120 |
+| Large data scans or complex analytics | 120–180 |
+
+## Error Handling
+
+### Error Status Values
+
+| Status | Meaning | Common Causes |
+|--------|---------|---------------|
+| `COMPLETED` | Request succeeded | Query ran, or Genie responded with text |
+| `ERROR` | Request failed | Invalid `space_id`, invalid `conversation_id`, empty question, permission denied |
+| `TIMEOUT` | Response exceeded `timeout_seconds` | Complex query, cold warehouse, low timeout value |
+
+### Verified Error Messages
+
+| Error | Cause | Fix |
+|-------|-------|-----|
+| `You need "Can View" permission to perform this action.` | Invalid or inaccessible `space_id` | Verify the space_id with `list_genie` or `get_genie`; check permissions |
+| `User does not own conversation .` | Invalid `conversation_id` in `ask_genie_followup` | Use the `conversation_id` from a previous response in the same session |
+| `Field 'content' is required, expected non-default value (not "")!` | Empty `question` string | Provide a non-empty question |
+| `Genie response timed out after s` | Response exceeded `timeout_seconds` | Increase timeout, simplify question, or check warehouse status |
+
+## Troubleshooting
+
+### "You need Can View permission"
+
+- Verify the `space_id` is correct — use `list_genie()` to see accessible spaces
+- Confirm you have at least "Can View" permission on the Genie Space
+- Check that the Genie Space hasn't been deleted
+
+### "User does not own conversation"
+
+- The `conversation_id` must come from a previous `ask_genie` or `ask_genie_followup` response in the same user session
+- Conversation IDs from other users won't work
+- If the conversation has expired, start a new one with `ask_genie`
+
+### Query Timed Out
+
+- Increase `timeout_seconds` (default is 120)
+- Simplify the question — fewer joins and aggregations
+- Check if the SQL warehouse is running (a cold start can add 30–60 seconds)
+- Try the question in the Genie UI to see if it's inherently slow
+
+### Genie Returns Text Instead of SQL
+
+- The question may be off-topic for the configured tables
+- Rephrase with more specific terms that match the table/column names
+- Check the Genie Space configuration — it may need more tables or instructions
+- The `text_response` often explains what Genie needs to answer the question
+
+### Unexpected or Wrong Results
+
+- Review the `sql` field in the response to see what Genie generated
+- Check the `description` field — it shows how Genie interpreted the question
+- Add SQL instructions or certified queries to the Genie Space via the Databricks UI
+- Add sample questions that demonstrate correct query patterns
diff --git a/databricks-skills/databricks-genie/SKILL.md b/databricks-skills/databricks-genie/SKILL.md
index e5b32b6e..7ebd5245 100644
--- a/databricks-skills/databricks-genie/SKILL.md
+++ b/databricks-skills/databricks-genie/SKILL.md
@@ -33,7 +33,8 @@ Use this skill when:
 
 | Tool | Purpose |
 |------|---------|
-| `ask_genie` | Ask a question or follow-up (`conversation_id` optional) |
+| `ask_genie` | Ask an initial question to a Genie Space |
+| `ask_genie_followup` | Ask a follow-up question in an existing conversation |
 
 ### Supporting Tools
 
@@ -95,7 +96,7 @@ ask_genie(
 ## Reference Files
 
 - [spaces.md](spaces.md) - Creating and managing Genie Spaces
-- [conversation.md](conversation.md) - Asking questions via the Conversation API
+- [2-conversation-api.md](2-conversation-api.md) - Asking questions via the Conversation API
 
 ## Prerequisites
 
diff --git a/databricks-skills/databricks-genie/conversation.md b/databricks-skills/databricks-genie/conversation.md
deleted file mode 100644
index d3a4676f..00000000
--- a/databricks-skills/databricks-genie/conversation.md
+++ /dev/null
@@ -1,239 +0,0 @@
-# Genie Conversations
-
-Use the Genie Conversation API to ask natural language questions to a curated Genie Space.
-
-## Overview
-
-The `ask_genie` tool allows you to programmatically send questions to a Genie Space and receive SQL-generated answers. Instead of writing SQL directly, you delegate the query generation to Genie, which has been curated with business logic, instructions, and certified queries.
-
-## When to Use `ask_genie`
-
-### Use `ask_genie` When:
-
-| Scenario | Why |
-|----------|-----|
-| Genie Space has curated business logic | Genie knows rules like "active customer = ordered in 90 days" |
-| User explicitly says "ask Genie" or "use my Genie Space" | User intent to use their curated space |
-| Complex business metrics with specific definitions | Genie has certified queries for official metrics |
-| Testing a Genie Space after creating it | Validate the space works correctly |
-| User wants conversational data exploration | Genie handles context for follow-up questions |
-
-### Use Direct SQL (`execute_sql`) Instead When:
-
-| Scenario | Why |
-|----------|-----|
-| Simple ad-hoc query | Direct SQL is faster, no curation needed |
-| You already have the exact SQL | No need for Genie to regenerate |
-| Genie Space doesn't exist for this data | Can't use Genie without a space |
-| Need precise control over the query | Direct SQL gives exact control |
-
-## MCP Tools
-
-| Tool | Purpose |
-|------|---------|
-| `ask_genie` | Ask a question or follow-up (`conversation_id` optional) |
-
-## Basic Usage
-
-### Ask a Question
-
-```python
-ask_genie(
-    space_id="01abc123...",
-    question="What were total sales last month?"
-)
-```
-
-**Response:**
-```python
-{
-    "question": "What were total sales last month?",
-    "conversation_id": "conv_xyz789",
-    "message_id": "msg_123",
-    "status": "COMPLETED",
-    "sql": "SELECT SUM(total_amount) AS total_sales FROM orders WHERE order_date >= DATE_TRUNC('month', CURRENT_DATE - INTERVAL 1 MONTH) AND order_date < DATE_TRUNC('month', CURRENT_DATE)",
-    "columns": ["total_sales"],
-    "data": [[125430.50]],
-    "row_count": 1
-}
-```
-
-### Ask Follow-up Questions
-
-Use the `conversation_id` from the first response to ask follow-up questions with context:
-
-```python
-# First question
-result = ask_genie(
-    space_id="01abc123...",
-    question="What were total sales last month?"
-)
-
-# Follow-up (uses context from first question)
-ask_genie(
-    space_id="01abc123...",
-    question="Break that down by region",
-    conversation_id=result["conversation_id"]
-)
-```
-
-Genie remembers the context, so "that" refers to "total sales last month".
-
-## Response Fields
-
-| Field | Description |
-|-------|-------------|
-| `question` | The original question asked |
-| `conversation_id` | ID for follow-up questions |
-| `message_id` | Unique message identifier |
-| `status` | `COMPLETED`, `FAILED`, `CANCELLED`, `TIMEOUT` |
-| `sql` | The SQL query Genie generated |
-| `columns` | List of column names in result |
-| `data` | Query results as list of rows |
-| `row_count` | Number of rows returned |
-| `text_response` | Text explanation (if Genie asks for clarification) |
-| `error` | Error message (if status is not COMPLETED) |
-
-## Handling Responses
-
-### Successful Response
-
-```python
-result = ask_genie(space_id, "Who are our top 10 customers?")
-
-if result["status"] == "COMPLETED":
-    print(f"SQL: {result['sql']}")
-    print(f"Rows: {result['row_count']}")
-    for row in result["data"]:
-        print(row)
-```
-
-### Failed Response
-
-```python
-result = ask_genie(space_id, "What is the meaning of life?")
-
-if result["status"] == "FAILED":
-    print(f"Error: {result['error']}")
-    # Genie couldn't answer - may need to rephrase or use direct SQL
-```
-
-### Timeout
-
-```python
-result = ask_genie(space_id, question, timeout_seconds=60)
-
-if result["status"] == "TIMEOUT":
-    print("Query took too long - try a simpler question or increase timeout")
-```
-
-## Example Workflows
-
-### Workflow 1: User Asks to Use Genie
-
-```
-User: "Ask my Sales Genie what the churn rate is"
-
-Claude:
-1. Identifies user wants to use Genie (explicit request)
-2. Calls ask_genie(space_id="sales_genie_id", question="What is the churn rate?")
-3. Returns: "Based on your Sales Genie, the churn rate is 4.2%.
-   Genie used this SQL: SELECT ..."
-```
-
-### Workflow 2: Testing a New Genie Space
-
-```
-User: "I just created a Genie Space for HR data. Can you test it?"
-
-Claude:
-1. Gets the space_id from the user or recent create_or_update_genie result
-2. Calls ask_genie with test questions:
-   - "How many employees do we have?"
-   - "What is the average salary by department?"
-3. Reports results: "Your HR Genie is working. It correctly answered..."
-```
-
-### Workflow 3: Data Exploration with Follow-ups
-
-```
-User: "Use my analytics Genie to explore sales trends"
-
-Claude:
-1. ask_genie(space_id, "What were total sales by month this year?")
-2. User: "Which month had the highest growth?"
-3. ask_genie(space_id, "Which month had the highest growth?", conversation_id=conv_id)
-4. User: "What products drove that growth?"
-5. ask_genie(space_id, "What products drove that growth?", conversation_id=conv_id)
-```
-
-## Best Practices
-
-### Start New Conversations for New Topics
-
-Don't reuse conversations across unrelated questions:
-
-```python
-# Good: New conversation for new topic
-result1 = ask_genie(space_id, "What were sales last month?")  # New conversation
-result2 = ask_genie(space_id, "How many employees do we have?")  # New conversation
-
-# Good: Follow-up for related question
-result1 = ask_genie(space_id, "What were sales last month?")
-result2 = ask_genie(space_id, "Break that down by product",
-                    conversation_id=result1["conversation_id"])  # Related follow-up
-```
-
-### Handle Clarification Requests
-
-Genie may ask for clarification instead of returning results:
-
-```python
-result = ask_genie(space_id, "Show me the data")
-
-if result.get("text_response"):
-    # Genie is asking for clarification
-    print(f"Genie asks: {result['text_response']}")
-    # Rephrase with more specifics
-```
-
-### Set Appropriate Timeouts
-
-- Simple aggregations: 30-60 seconds
-- Complex joins: 60-120 seconds
-- Large data scans: 120+ seconds
-
-```python
-# Quick question
-ask_genie(space_id, "How many orders today?", timeout_seconds=30)
-
-# Complex analysis
-ask_genie(space_id, "Calculate customer lifetime value for all customers",
-          timeout_seconds=180)
-```
-
-## Troubleshooting
-
-### "Genie Space not found"
-
-- Verify the `space_id` is correct
-- Check you have access to the space
-- Use `get_genie(space_id)` to verify it exists
-
-### "Query timed out"
-
-- Increase `timeout_seconds`
-- Simplify the question
-- Check if the SQL warehouse is running
-
-### "Failed to generate SQL"
-
-- Rephrase the question more clearly
-- Check if the question is answerable with the available tables
-- Add more instructions/curation to the Genie Space
-
-### Unexpected Results
-
-- Review the generated SQL in the response
-- Add SQL instructions to the Genie Space via the Databricks UI
-- Add sample questions that demonstrate correct patterns
diff --git a/databricks-skills/install_skills.sh b/databricks-skills/install_skills.sh
index 6c1808a3..f58bb46e 100755
--- a/databricks-skills/install_skills.sh
+++ b/databricks-skills/install_skills.sh
@@ -98,7 +98,7 @@ get_skill_extra_files() {
     case "$1" in
         "databricks-agent-bricks") echo "1-knowledge-assistants.md 2-supervisor-agents.md" ;;
         "databricks-aibi-dashboards") echo "widget-reference.md sql-patterns.md" ;;
-        "databricks-genie") echo "spaces.md conversation.md" ;;
+        "databricks-genie") echo "spaces.md 2-conversation-api.md" ;;
         "databricks-asset-bundles") echo "alerts_guidance.md SDP_guidance.md" ;;
         "databricks-iceberg") echo "1-managed-iceberg-tables.md 2-uniform-and-compatibility.md 3-iceberg-rest-catalog.md 4-snowflake-interop.md 5-external-engine-interop.md" ;;
         "databricks-app-apx") echo "backend-patterns.md best-practices.md frontend-patterns.md" ;;