Add markdown output format for execute_sql to reduce token usage by ~50% by ericseastrand-ln · Pull Request #297 · databricks-solutions/ai-dev-kit

ericseastrand-ln · 2026-03-12T06:20:17Z

Summary

Adds output_format parameter to execute_sql and execute_sql_multi MCP tools
Default is "markdown" — returns a markdown table string instead of JSON array-of-objects
output_format="json" preserves existing behavior for backwards compatibility
Includes 7 unit tests for the markdown formatter

Closes #296

Problem

execute_sql returns results as a JSON array of objects, repeating every column name on every row:

[
  {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC"},
  {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago"}
]

Since MCP tools are consumed by LLMs, this is extremely wasteful — ~42% of the payload is redundant column names. In a real session with 107 SQL queries, this caused 4 context compaction events and burned ~64M tokens.
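The overhead described above can be estimated directly. The following is a minimal sketch, not code from this PR: `key_overhead` is a hypothetical helper, and the share it reports depends on the data and on how the JSON is serialized, so it will not necessarily reproduce the ~42% figure quoted here.

```python
import json


def key_overhead(rows):
    """Return the fraction of the serialized JSON taken up by quoted column names.

    Hypothetical helper for illustration only; not part of the PR.
    """
    payload = json.dumps(rows)
    # Every key is serialized once per row as a quoted string, e.g. "event_id".
    key_chars = sum(len(json.dumps(key)) for row in rows for key in row)
    return key_chars / len(payload)


rows = [
    {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC"},
    {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago"},
]

print(f"{key_overhead(rows):.0%} of the payload is column names")
```

The count covers only the quoted key strings; the colons and separators that accompany each key add further overhead on top of it.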

Solution

Return a markdown table by default:

| event_id | event_name | venue | city |
| --- | --- | --- | --- |
| EVT-10001 | Concert A | Arena 1 | NYC |
| EVT-10002 | Concert B | Arena 2 | Chicago |

(2 rows)
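A formatter producing this shape could look roughly like the sketch below. It is an illustration under assumptions, not the PR's `_format_results_markdown()`: the function name `format_markdown_table`, the empty-result text, rendering `None` as an empty cell, and replacing newlines with spaces are all choices made for this example.

```python
def format_markdown_table(rows):
    """Render a list of dict rows as a markdown table string (illustrative sketch)."""
    if not rows:
        # Assumption: the PR's wording for empty results may differ.
        return "(0 rows)"

    # Column names come from the first row and are emitted once, in the header.
    columns = list(rows[0].keys())

    def cell(value):
        if value is None:
            # Assumption: SQL NULL is shown as an empty cell.
            return ""
        # Escape pipes so a value cannot be mistaken for a column boundary,
        # and flatten newlines so each row stays on one line.
        return str(value).replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(c) for c in columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row.get(c)) for c in columns) + " |")

    count = len(rows)
    lines.append("")
    lines.append(f"({count} row{'s' if count != 1 else ''})")
    return "\n".join(lines)


rows = [
    {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC"},
    {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago"},
]
print(format_markdown_table(rows))
```

Escaping `|` matters because an unescaped pipe inside a value would shift every following cell one column to the right.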

For a 100-row × 10-column result set:

| Format | Size | Savings |
| --- | --- | --- |
| JSON (current) | 27,380 chars | n/a |
| Markdown (new default) | 13,860 chars | 50% |

Changes

databricks-mcp-server/databricks_mcp_server/tools/sql.py — adds _format_results_markdown() helper and output_format parameter to both execute_sql and execute_sql_multi
databricks-mcp-server/tests/test_sql_output_format.py — 7 unit tests covering empty results, single/multiple rows, None handling, pipe escaping, column-name-once guarantee, and size comparison

No changes to databricks-tools-core — formatting is applied at the MCP server layer only.
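The layering described here can be sketched as a thin dispatch step that sits between the core query result and the tool response. This is a hypothetical illustration: `apply_output_format`, its signature, the stub formatter, and raising `ValueError` for unknown formats are assumptions, not the code in `sql.py`.

```python
from typing import Any, Callable, Union

Rows = list[dict[str, Any]]


def apply_output_format(
    rows: Rows,
    output_format: str,
    to_markdown: Callable[[Rows], str],
) -> Union[Rows, str]:
    """Format core-layer rows at the server layer (hypothetical sketch)."""
    if output_format == "json":
        # Existing behavior: pass the array of objects through untouched.
        return rows
    if output_format == "markdown":
        return to_markdown(rows)
    # Assumption: unknown formats are rejected; the PR may handle this differently.
    raise ValueError(f"Unsupported output_format: {output_format!r}")


def stub_table(rows: Rows) -> str:
    """Stand-in for a real markdown formatter, used only in this example."""
    return f"<table with {len(rows)} rows>"


core_rows = [{"city": "NYC"}, {"city": "Chicago"}]
print(apply_output_format(core_rows, "markdown", stub_table))
print(apply_output_format(core_rows, "json", stub_table))
```

Keeping the formatter outside the core library means callers of the core functions continue to receive structured rows, and only the MCP-facing tools change their output.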

Test plan

All 7 new tests pass (pytest tests/test_sql_output_format.py)
Existing passing tests unaffected
Manual test: run execute_sql with default format and verify markdown output
Manual test: run execute_sql with output_format="json" and verify JSON output

SQL results are consumed by LLMs via MCP, but the JSON array-of-objects format repeats every column name on every row — wasting ~42% of the payload on redundant keys. For a 100-row × 10-column result, JSON produces ~27K chars vs ~14K for a markdown table. This adds an `output_format` parameter (default: "markdown") to `execute_sql` and `execute_sql_multi`. Markdown tables state column names once in the header, which LLMs parse natively. Use `output_format="json"` for backwards compatibility. Closes databricks-solutions#296

calreynolds

Thank you! Great PR 👍

Co-authored-by: Isaac

calreynolds self-requested a review March 16, 2026 19:52

calreynolds approved these changes Mar 16, 2026

calreynolds merged commit 22f6907 into databricks-solutions:main Mar 16, 2026
1 of 2 checks passed

calreynolds added a commit that referenced this pull request Mar 16, 2026

style: format test_sql_output_format.py from merged PR #297

53ed7bd

Co-authored-by: Isaac

calreynolds merged 1 commit into databricks-solutions:main from ericseastrand-ln:feature/markdown-sql-output
