
Add markdown output format for execute_sql to reduce token usage by ~50% #297

Merged
calreynolds merged 1 commit into databricks-solutions:main from ericseastrand-ln:feature/markdown-sql-output on Mar 16, 2026

@ericseastrand-ln (Contributor) commented Mar 12, 2026

Summary

  • Adds output_format parameter to execute_sql and execute_sql_multi MCP tools
  • Default is "markdown" — returns a markdown table string instead of JSON array-of-objects
  • output_format="json" preserves existing behavior for backwards compatibility
  • Includes 7 unit tests for the markdown formatter
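The first three bullets describe one switch on the tools. Here is a minimal sketch of that dispatch, assuming results arrive as a list of dicts. `format_sql_result`, `_rows_to_markdown` and `OutputFormat` are hypothetical names, not the PR's code, and the markdown branch here skips escaping and the row-count footer for brevity.

```python
import json
from typing import Any, Literal

# Hypothetical type for the two supported values of output_format.
OutputFormat = Literal["markdown", "json"]


def _rows_to_markdown(rows: list[dict[str, Any]]) -> str:
    """Bare-bones markdown table: header once, then one line per row."""
    if not rows:
        return ""
    columns = list(rows[0])
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        # None becomes an empty cell in this simplified sketch.
        cells = ("" if row.get(c) is None else str(row.get(c)) for c in columns)
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


def format_sql_result(
    rows: list[dict[str, Any]], output_format: OutputFormat = "markdown"
) -> str:
    """Markdown by default; "json" keeps the array-of-objects shape."""
    if output_format == "json":
        # default=str keeps dates/decimals serializable (an assumption about the data).
        return json.dumps(rows, default=str)
    if output_format == "markdown":
        return _rows_to_markdown(rows)
    raise ValueError(f"unsupported output_format: {output_format!r}")
```

Rejecting unknown values with `ValueError` is a choice made for this sketch, so a typo such as `output_format="csv"` fails loudly instead of silently falling back to one of the formats.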

Closes #296

Problem

execute_sql returns results as a JSON array of objects, repeating every column name on every row:

[
  {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC"},
  {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago"}
]

Since MCP tools are consumed by LLMs, this is extremely wasteful — ~42% of the payload is redundant column names. In a real session with 107 SQL queries, this caused 4 context compaction events and burned ~64M tokens.
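The share of the payload taken by keys depends on how long column names are relative to the values. The following sketch checks that ratio on any result set. `key_overhead` is a hypothetical helper, and it measures compact JSON (no spaces after `,` or `:`), so pretty-printed output would give a somewhat different share.

```python
import json
from typing import Any


def key_overhead(rows: list[dict[str, Any]]) -> float:
    """Share of a compact JSON array-of-objects spent on repeated keys.

    Every key is counted as its quoted JSON form plus the ":" that follows it,
    once for each row it appears in.
    """
    payload = json.dumps(rows, separators=(",", ":"))
    key_chars = sum(len(json.dumps(key)) + 1 for row in rows for key in row)
    return key_chars / len(payload)


rows = [
    {"event_id": "EVT-10001", "event_name": "Concert A", "venue": "Arena 1", "city": "NYC"},
    {"event_id": "EVT-10002", "event_name": "Concert B", "venue": "Arena 2", "city": "Chicago"},
]
print(f"{key_overhead(rows):.0%} of the payload is column names")
```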

Solution

Return a markdown table by default:

| event_id | event_name | venue | city |
| --- | --- | --- | --- |
| EVT-10001 | Concert A | Arena 1 | NYC |
| EVT-10002 | Concert B | Arena 2 | Chicago |

(2 rows)
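The following sketch produces the table above under a few assumptions. Rows arrive as dicts that share the first row's keys. SQL `NULL` is shown as the literal `NULL`. Pipes and newlines inside values are neutralized. `format_results_markdown` is an illustrative name; it is not the `_format_results_markdown()` helper the PR adds.

```python
from typing import Any


def _cell(value: Any) -> str:
    """Stringify one value so it cannot break the table layout."""
    if value is None:
        return "NULL"  # assumption: make SQL NULL explicit rather than blank
    text = str(value)
    # A raw "|" would start a new column and a newline would end the row.
    text = text.replace("|", "\\|")
    return text.replace("\r\n", " ").replace("\n", " ").replace("\r", " ")


def format_results_markdown(rows: list[dict[str, Any]]) -> str:
    """Render rows as a markdown table followed by a row-count footer."""
    if not rows:
        return "(0 rows)"
    columns = list(rows[0])  # column names are emitted exactly once
    header = "| " + " | ".join(_cell(c) for c in columns) + " |"
    separator = "| " + " | ".join("---" for _ in columns) + " |"
    body = ["| " + " | ".join(_cell(row.get(c)) for c in columns) + " |" for row in rows]
    noun = "row" if len(rows) == 1 else "rows"
    return "\n".join([header, separator, *body, "", f"({len(rows)} {noun})"])
```

Applied to the two example rows from the Problem section, this reproduces the table above exactly, including the blank line and the `(2 rows)` footer.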

For a 100-row × 10-column result set:

| Format | Size | Savings |
| --- | --- | --- |
| JSON (current) | 27,380 chars | (baseline) |
| Markdown (new default) | 13,860 chars | 49.4% |
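The savings depend heavily on the data, so a reproducible check is more useful than fixed figures. The following sketch builds a synthetic 100 × 10 result set and measures both encodings. The column names and values are invented, `compare_sizes` is a hypothetical helper, and its output will not match the numbers above, which come from the PR's own data.

```python
import json


def _markdown_table(rows: list[dict[str, str]]) -> str:
    """Plain markdown table (no escaping needed for this synthetic data)."""
    cols = list(rows[0])
    lines = ["| " + " | ".join(cols) + " |", "| " + " | ".join("---" for _ in cols) + " |"]
    lines += ["| " + " | ".join(row[c] for c in cols) + " |" for row in rows]
    return "\n".join(lines)


def compare_sizes(n_rows: int = 100, n_cols: int = 10) -> tuple[int, int, float]:
    """Return (json_chars, markdown_chars, savings) for a synthetic result set."""
    # Assumption: short column names and short string values; real data will differ.
    rows = [{f"column_{c}": f"value_{r}_{c}" for c in range(n_cols)} for r in range(n_rows)]
    json_chars = len(json.dumps(rows))
    md_chars = len(_markdown_table(rows))
    return json_chars, md_chars, 1 - md_chars / json_chars


json_chars, md_chars, savings = compare_sizes()
print(f"JSON: {json_chars:,} chars, markdown: {md_chars:,} chars, savings: {savings:.1%}")
```

For very small results (a single one-column row, for example) the markdown header and separator can outweigh the savings. The gain comes from not repeating keys across many rows.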

Changes

  • databricks-mcp-server/databricks_mcp_server/tools/sql.py — adds _format_results_markdown() helper and output_format parameter to both execute_sql and execute_sql_multi
  • databricks-mcp-server/tests/test_sql_output_format.py — 7 unit tests covering empty results, single/multiple rows, None handling, pipe escaping, column-name-once guarantee, and size comparison
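The test cases listed above map onto small pytest-style functions. This sketch is self-contained. `render_markdown` is a compact hypothetical stand-in for the formatter, and rendering `None` as `NULL` is an assumption. These tests illustrate the listed cases; they do not reproduce `tests/test_sql_output_format.py`.

```python
import json
from typing import Any


def render_markdown(rows: list[dict[str, Any]]) -> str:
    """Hypothetical formatter under test: header once, escaped cells, row count."""
    if not rows:
        return "(0 rows)"
    cols = list(rows[0])

    def cell(value: Any) -> str:
        return "NULL" if value is None else str(value).replace("|", "\\|").replace("\n", " ")

    lines = ["| " + " | ".join(cols) + " |", "| " + " | ".join("---" for _ in cols) + " |"]
    lines += ["| " + " | ".join(cell(row.get(c)) for c in cols) + " |" for row in rows]
    noun = "row" if len(rows) == 1 else "rows"
    return "\n".join(lines + ["", f"({len(rows)} {noun})"])


def test_empty_results():
    assert render_markdown([]) == "(0 rows)"


def test_single_and_multiple_rows():
    assert render_markdown([{"a": 1}]).endswith("(1 row)")
    assert render_markdown([{"a": 1}, {"a": 2}]).endswith("(2 rows)")


def test_none_rendered_explicitly():
    assert render_markdown([{"a": None}]).splitlines()[2] == "| NULL |"


def test_pipes_escaped():
    assert render_markdown([{"a": "x|y"}]).splitlines()[2] == "| x\\|y |"


def test_column_names_appear_once():
    rows = [{"event_id": f"EVT-{i}"} for i in range(50)]
    assert render_markdown(rows).count("event_id") == 1


def test_markdown_smaller_than_json():
    rows = [{f"col_{c}": c for c in range(10)} for _ in range(100)]
    assert len(render_markdown(rows)) < len(json.dumps(rows))
```

Saved as a `test_*.py` file, these are collected and run by `pytest` as-is.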

No changes to databricks-tools-core — formatting is applied at the MCP server layer only.

Test plan

  • All 7 new tests pass (pytest tests/test_sql_output_format.py)
  • Existing passing tests unaffected
  • Manual test: run execute_sql with default format and verify markdown output
  • Manual test: run execute_sql with output_format="json" and verify JSON output

SQL results are consumed by LLMs via MCP, but the JSON array-of-objects
format repeats every column name on every row — wasting ~42% of the
payload on redundant keys. For a 100-row × 10-column result, JSON
produces ~27K chars vs ~14K for a markdown table.

This adds an `output_format` parameter (default: "markdown") to
`execute_sql` and `execute_sql_multi`. Markdown tables state column
names once in the header, which LLMs parse natively. Use
`output_format="json"` for backwards compatibility.

Closes databricks-solutions#296
@calreynolds self-requested a review on March 16, 2026 at 19:52

@calreynolds (Collaborator) left a comment

Thank you! Great PR 👍

@calreynolds merged commit 22f6907 into databricks-solutions:main on Mar 16, 2026
1 of 2 checks passed
calreynolds added a commit that referenced this pull request Mar 16, 2026

Development

Linked issue closed by this pull request: execute_sql JSON output repeats column names on every row, wastes ~50% of tokens
