feat(natural-language-querying): reduce Compass specific instructions and context size MCP-475#35

Merged
Anemy merged 5 commits into main from
MCP-475-remove-compass-specific-items-from-natural-language-querying
Apr 29, 2026

Member

@Anemy Anemy commented Apr 23, 2026

MCP-475

This skill was adapted from Compass's natural language prompts. As a result, it contains instructions specific to how the LLM output is used in Compass. This PR should improve the skill's quality by reducing instruction ambiguity, reducing the overall size of the required context, and removing the Compass-specific expected output format to allow for more general agentic uses. There are likely more improvements to make here; this is mostly aimed at low-hanging fruit.

This PR also updates the evals to reduce ambiguity. They were also adopted from Compass, where they are expected to be run against either an explicit aggregation generator or an explicit find generator, not one that does both. It also adds a test for Java and a test that checks that a find with a maxTimeMS can be created.
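
For illustration, the maxTimeMS eval is looking for a query along these lines. This is a minimal sketch, assuming Node.js driver style options passed as the second argument to `find`; the `year` field, the 5000 ms budget, the `findMovies1983` helper, and the `fakeCollection` stand-in are all hypothetical and exist only so the snippet runs without a server.

```javascript
// Hypothetical helper: issues a find with a server-side time limit.
// `collection` is anything exposing find(filter, options), as the Node.js driver's Collection does.
function findMovies1983(collection) {
  const filter = { year: 1983 };        // assumed field name
  const options = { maxTimeMS: 5000 };  // arbitrary 5 second budget for the server
  return collection.find(filter, options);
}

// Stand-in for a real collection so the sketch runs without a database.
const fakeCollection = {
  find(filter, options) {
    return { filter, options };
  },
};

const call = findMovies1983(fakeCollection);
console.log(JSON.stringify(call));
```

In mongosh the same limit is set with the cursor method instead, for example `db.movies.find({ year: 1983 }).maxTimeMS(5000)`.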

The eval runs below were given the prompt "You are a MongoDB assistant." plus the user's prompt. Ping me if you'd like the full log and judge notes. This setup will bias the no-skill vs. skill results: oftentimes without the skill the agent will deduce the result and not show the user the query it ran. Still, it should give an indication of whether these changes have any impact.

Eval results (before)

📊 SUMMARY

| # | Eval Name | With Skill | Without Skill | Delta |
|---|---|---|---|---|
| 1 | simple-find | 70% | 75% | -5% |
| 2 | geo-based-find | 100% | 40% | +60% |
| 3 | find-with-nested-match | 85% | 10% | +75% |
| 4 | find-translates-to-agg-mode-count | 100% | 20% | +80% |
| 5 | find-translates-to-agg-total-sum | 100% | 100% | 0% |
| 6 | find-translates-to-agg-max-host | 100% | 80% | +20% |
| 7 | relative-date-find-last-year | 90% | 10% | +80% |
| 8 | relative-date-find-30-years-ago | 85% | 10% | +75% |
| 9 | number-field-find | 100% | 90% | +10% |
| 10 | find-with-complex-projection | 100% | 30% | +70% |
| 11 | find-with-and-operator | 100% | 40% | +60% |
| 12 | find-with-non-english | 85% | 65% | +20% |
| 13 | find-with-regex-string-ops | 95% | 100% | -5% |
| 14 | find-simple-projection | 90% | 0% | +90% |
| 15 | basic-aggregate | 90% | 0% | +90% |
| 16 | agg-filter-and-projection | 30% | 0% | +30% |
| 17 | geo-based-agg | 100% | 100% | 0% |
| 18 | agg-nested-fields-match | 10% | 10% | 0% |
| 19 | agg-group-sort-limit-project | 70% | 10% | +60% |
| 20 | agg-group-sort-limit-project-2 | 30% | 20% | +10% |
| 21 | relative-date-agg-30-years | 100% | 85% | +15% |
| 22 | relative-date-agg-last-year | 90% | 70% | +20% |
| 23 | agg-array-slice | 90% | 0% | +90% |
| 24 | agg-multiple-conditions-match | 100% | 100% | 0% |
| 25 | agg-non-english | 100% | 95% | +5% |
| 26 | agg-simple-sort-limit | 100% | 100% | 0% |
| 27 | agg-unwind-group | 100% | 100% | 0% |
| 28 | agg-size-operator | 85% | 75% | +10% |
| 29 | agg-complex-word-frequency | 95% | 82% | +13% |
| 30 | agg-super-complex-percentage | 100% | 90% | +10% |
| 31 | agg-complex-regex-string-ops | 100% | 100% | 0% |
| 32 | agg-join-lookup | 100% | 100% | 0% |
| 33 | agg-simple-projection | 85% | 10% | +75% |
| 34 | no-redundant-exists-with-comparison | 90% | 0% | +90% |
| 35 | max-time-ms-option-used | 90% | 20% | +70% |
| 36 | find-in-java | 100% | 100% | 0% |

| Metric | With Skill | Without Skill | Delta |
|---|---|---|---|
| Avg Score | 87.6% | 53.8% | 33.8% |
| Total Cost | $2.7623 | $2.9056 | $-0.1433 |
| Total Time | 1023.1s | 1260.9s | -237.8s |

Total cost (including judging): $6.8234

Skill wins: 25 | Losses: 2 | Ties: 9
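
The wins/losses/ties line appears to follow directly from the Delta column. Below is a minimal sketch of that tally, assuming a positive delta counts as a skill win, a negative one as a loss, and zero as a tie. It uses only four rows copied from the table above, and the `tally` function is hypothetical rather than the eval harness's actual code.

```javascript
// A few rows copied from the "before" table: [name, withSkill %, withoutSkill %].
const rows = [
  ['simple-find', 70, 75],
  ['geo-based-find', 100, 40],
  ['find-translates-to-agg-total-sum', 100, 100],
  ['agg-filter-and-projection', 30, 0],
];

// Assumed rule: delta > 0 is a win, delta < 0 is a loss, delta === 0 is a tie.
function tally(evalRows) {
  const result = { wins: 0, losses: 0, ties: 0 };
  for (const [, withSkill, withoutSkill] of evalRows) {
    const delta = withSkill - withoutSkill;
    if (delta > 0) result.wins += 1;
    else if (delta < 0) result.losses += 1;
    else result.ties += 1;
  }
  return result;
}

const summary = tally(rows);
console.log(summary); // { wins: 2, losses: 1, ties: 1 }
```

Applied to all 36 rows, the same rule reproduces the counts reported for this run.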

Eval results (with changes)

📊 SUMMARY

| # | Eval Name | With Skill | Without Skill | Delta |
|---|---|---|---|---|
| 1 | simple-find | 100% | 50% | +50% |
| 2 | geo-based-find | 90% | 90% | 0% |
| 3 | find-with-nested-match | 90% | 10% | +80% |
| 4 | find-translates-to-agg-mode-count | 100% | 10% | +90% |
| 5 | find-translates-to-agg-total-sum | 100% | 40% | +60% |
| 6 | find-translates-to-agg-max-host | 100% | 60% | +40% |
| 7 | relative-date-find-last-year | 70% | 10% | +60% |
| 8 | relative-date-find-30-years-ago | 0% | 10% | -10% |
| 9 | number-field-find | 100% | 100% | 0% |
| 10 | find-with-complex-projection | 100% | 85% | +15% |
| 11 | find-with-and-operator | 100% | 20% | +80% |
| 12 | find-with-non-english | 100% | 100% | 0% |
| 13 | find-with-regex-string-ops | 85% | 85% | 0% |
| 14 | find-simple-projection | 100% | 0% | +100% |
| 15 | basic-aggregate | 90% | 0% | +90% |
| 16 | agg-filter-and-projection | 30% | 0% | +30% |
| 17 | geo-based-agg | 100% | 100% | 0% |
| 18 | agg-nested-fields-match | 85% | 50% | +35% |
| 19 | agg-group-sort-limit-project | 70% | 10% | +60% |
| 20 | agg-group-sort-limit-project-2 | 100% | 20% | +80% |
| 21 | relative-date-agg-30-years | 95% | 100% | -5% |
| 22 | relative-date-agg-last-year | 85% | 0% | +85% |
| 23 | agg-array-slice | 100% | 10% | +90% |
| 24 | agg-multiple-conditions-match | 100% | 100% | 0% |
| 25 | agg-non-english | 100% | 100% | 0% |
| 26 | agg-simple-sort-limit | 100% | 100% | 0% |
| 27 | agg-unwind-group | 100% | 100% | 0% |
| 28 | agg-size-operator | 100% | 90% | +10% |
| 29 | agg-complex-word-frequency | 95% | 30% | +65% |
| 30 | agg-super-complex-percentage | 100% | 85% | +15% |
| 31 | agg-complex-regex-string-ops | 100% | 100% | 0% |
| 32 | agg-join-lookup | 100% | 100% | 0% |
| 33 | agg-simple-projection | 90% | 0% | +90% |
| 34 | no-redundant-exists-with-comparison | 90% | 0% | +90% |
| 35 | max-time-ms-option-used | 85% | 30% | +55% |
| 36 | find-in-java | 100% | 100% | 0% |

| Metric | With Skill | Without Skill | Delta |
|---|---|---|---|
| Avg Score | 90.3% | 52.6% | 37.6% |
| Total Cost | $2.8551 | $2.5615 | $0.2937 |
| Total Time | 1395.2s | 1007.6s | 387.6s |

Total cost (including judging): $6.4988

Skill wins: 22 | Losses: 2 | Ties: 12

@Anemy Anemy requested review from Copilot and paula-stacho April 23, 2026 03:13
@Anemy Anemy requested a review from a team as a code owner April 23, 2026 03:13
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the mongodb-natural-language-querying skill prompt to be less Compass-specific and to reduce ambiguity/context requirements so it can be used more generally across agentic workflows.

Changes:

  • Removes Compass-specific framing and the prior Compass-style JSON/stringified-query output format.
  • Updates response-format guidance to prefer the “workspace language” (or default to MongoDB shell syntax).
  • Tweaks wording in best practices and replaces the “Size Limits” section with “Managing Context Size”.

Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 23, 2026 03:54
Contributor

Copilot AI left a comment

Pull request overview

This PR updates the mongodb-natural-language-querying skill prompt to be less Compass-specific by simplifying instructions, reducing required context, and switching examples away from a Compass JSON wrapper toward direct MongoDB shell/driver query output.

Changes:

  • Removes Compass-specific output format and shows shell-style find() / aggregate() examples.
  • Trims/rewrites guidance to reduce ambiguity and context size overhead.
  • Minor wording improvements in best-practices guidance.

Comment thread skills/mongodb-natural-language-querying/SKILL.md
Comment thread skills/mongodb-natural-language-querying/SKILL.md
Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
@Anemy Anemy requested review from a team as code owners April 23, 2026 12:16
Collaborator

@dacharyc dacharyc left a comment

Thanks for the PR, @Anemy - just a handful of questions/suggestions here for your consideration.

    "limit": "10"
    }
    }
    ```js

Collaborator

If we can omit the mongosh syntax here, that would be ideal. In my testing, I've observed agents conflating mongosh APIs with Node.js APIs, and occasional API contamination across programming languages due to Programming Language Confusion.

Member Author

@Anemy Anemy Apr 25, 2026

How do you feel about removing the examples altogether? I feel that the way they were before, combined with the instructions to output MongoDB shell/Extended JSON syntax inside a JSON object, would lead to more programming language and output confusion. My assumption is that folks ask these questions with some context for the agent, such as an existing workspace with a language. If it's a totally isolated prompt without that context, I think we would want to default to mongosh.

Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Comment thread skills/mongodb-natural-language-querying/SKILL.md Outdated
Comment thread skills/mongodb-natural-language-querying/SKILL.md

      "id": 15,
      "name": "basic-aggregate",
    - "prompt": "find all the movies released in 1983",
    + "prompt": "aggregate all the movies released in 1983",

Collaborator

Curious about the change from "find" to "aggregate" here, and the more specific requests below. Are you optimizing for test success? I might argue that "find all the movies" is closer to a natural language query than "aggregate" or quoting specific pipeline stages, and represents a better test of how well the skill is guiding agents to form correct queries from natural language.

I would suggest if we do want to test the more refined language specifically, we might want to do that in addition to keeping the more natural-language style, instead of replacing it, so we have a better idea of how the skill performs with varying levels of specificity.

Member Author

When running the evals more strictly, a lot of the evals expecting an aggregation would fail because the agent would generate a find. The prompt could be fulfilled with a find/project, etc. In the skill we have written:

Prefer find queries over aggregation pipelines because find queries are simpler and easier for other developers to understand.

We already have this same eval for a basic find with the same user prompt, so it was a duplicate test before. When making this change, I thought giving it a different prompt would reduce the duplication. Thinking about it more, I'm not sure it does anything more than the first prompt. I agree it's likely not something someone would ask. I'm leaning towards removing it altogether.

It looks like these tests were adopted from Compass, where they are expected to be run against either an explicit aggregation generator or an explicit find generator, not one that does both. The tests in Compass are also a bit more deterministic, with a stricter output. It would be nice to have that expectation out of these evals, but that would be some additional work.
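
To make the overlap concrete: for simple filters and projections, anything expressible as a find can be rewritten as a pipeline, which is why an eval that expects an aggregation can legitimately be answered with a find. The sketch below is illustrative only; `findToPipeline` is a hypothetical helper, and the `year` and `title` field names are assumptions.

```javascript
// Hypothetical helper: rewrites a find specification as an equivalent aggregation pipeline.
function findToPipeline({ filter = {}, sort, skip, limit, projection } = {}) {
  const pipeline = [{ $match: filter }];
  if (sort) pipeline.push({ $sort: sort });
  if (skip) pipeline.push({ $skip: skip });
  if (limit) pipeline.push({ $limit: limit });
  if (projection) pipeline.push({ $project: projection });
  return pipeline;
}

// "find all the movies released in 1983", returning only the title (assumed fields).
const findSpec = { filter: { year: 1983 }, projection: { title: 1, _id: 0 } };
const pipeline = findToPipeline(findSpec);
console.log(JSON.stringify(pipeline));
// [{"$match":{"year":1983}},{"$project":{"title":1,"_id":0}}]
```

Since both forms return the same documents here, a judge that insists on one of them is testing the prompt wording more than the query.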

Collaborator

Gotcha. Ok, why don't we put a pin in this one for now. The skills working group has been having conversations about spinning up more formal eval tooling, so maybe we can revisit this once we have more robust tooling in place.

Copilot AI review requested due to automatic review settings April 25, 2026 01:35
Contributor

Copilot AI left a comment

Pull request overview

This PR aims to generalize the MongoDB natural-language-querying skill by removing Compass-specific assumptions, reducing context requirements, and updating the eval suite to better match the intended “find or aggregate” behavior (including new coverage for maxTimeMS and Java driver output).

Changes:

  • Simplified/adjusted SKILL.md guidance around required context, find vs. aggregate selection, output formatting, and context size management.
  • Updated eval prompts/expected outputs to reduce ambiguity and allow dynamic date handling.
  • Added new eval cases for maxTimeMS usage in find queries and for Java driver syntax.
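
Dynamic date handling matters for the relative-date evals because a hard-coded expected year goes stale. A minimal sketch of computing the range at run time follows, assuming "last year" means the previous calendar year in UTC; the `lastYearRange` helper and the `released` field are hypothetical.

```javascript
// Assumption: "last year" means the previous calendar year, evaluated in UTC.
function lastYearRange(now = new Date()) {
  const year = now.getUTCFullYear();
  return {
    $gte: new Date(Date.UTC(year - 1, 0, 1)), // Jan 1 of the previous year, inclusive
    $lt: new Date(Date.UTC(year, 0, 1)),      // Jan 1 of the current year, exclusive
  };
}

// Hypothetical filter on an assumed `released` date field, pinned to a fixed "now" for the demo.
const dateFilter = { released: lastYearRange(new Date('2026-04-23T00:00:00Z')) };
console.log(dateFilter.released.$gte.toISOString()); // 2025-01-01T00:00:00.000Z
console.log(dateFilter.released.$lt.toISOString());  // 2026-01-01T00:00:00.000Z
```

An expected output written this way stays valid whenever the eval is run, as long as the judge accepts the same interpretation of "last year".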

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

| File | Description |
|---|---|
| testing/mongodb-natural-language-querying/evals/evals.json | Refines eval prompts/expected outputs and adds new evals (maxTimeMS + Java). |
| skills/mongodb-natural-language-querying/SKILL.md | Removes Compass-specific guidance and revises output/context instructions for broader agentic use. |

Comment thread testing/mongodb-natural-language-querying/evals/evals.json
Comment on lines +63 to 66

    Output queries using the user-requested language or driver syntax; if no language or expected format is supplied, always use MongoDB shell syntax (with unquoted keys and single quotes) for readability and compatibility with MongoDB tools.

    **Find Query Response:**
    ```json
Comment thread testing/mongodb-natural-language-querying/evals/evals.json
Comment thread testing/mongodb-natural-language-querying/evals/evals.json
Comment thread testing/mongodb-natural-language-querying/evals/evals.json
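
The default output style quoted from SKILL.md above (unquoted keys, single quotes) can be illustrated with a small formatter. This is a sketch under stated assumptions, not part of the skill: `toShellSyntax` is a hypothetical helper, the `listings` collection and its fields are made up, and values such as dates, ObjectIds, and regexes are not handled.

```javascript
// Hypothetical formatter: plain object -> shell-style text with unquoted keys and single-quoted strings.
function toShellSyntax(value) {
  if (typeof value === 'string') {
    // Escape backslashes first, then single quotes.
    return `'${value.replace(/\\/g, '\\\\').replace(/'/g, "\\'")}'`;
  }
  if (Array.isArray(value)) {
    return `[${value.map(toShellSyntax).join(', ')}]`;
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.entries(value).map(([key, inner]) => {
      // Only bare identifiers (including $operators) can go unquoted; dotted paths keep quotes.
      const safeKey = /^[A-Za-z_$][A-Za-z0-9_$]*$/.test(key) ? key : toShellSyntax(key);
      return `${safeKey}: ${toShellSyntax(inner)}`;
    });
    return entries.length ? `{ ${entries.join(', ')} }` : '{}';
  }
  return String(value); // numbers, booleans, null
}

const shellFilter = { year: { $gte: 1983 }, 'address.city': 'Berlin' };
const rendered = `db.listings.find(${toShellSyntax(shellFilter)})`;
console.log(rendered);
// db.listings.find({ year: { $gte: 1983 }, 'address.city': 'Berlin' })
```

Dotted field paths still need quotes in shell syntax, so "unquoted keys" can only apply to keys that are valid identifiers.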
@codeowners-service-app

Assigned alenakhineika for team dbx-devtools because paula-stacho is out of office.

Collaborator

@dacharyc dacharyc left a comment

LGTM ✅

Comment thread skills/mongodb-natural-language-querying/SKILL.md

@Anemy Anemy merged commit 160e31e into main Apr 29, 2026
4 checks passed
@Anemy Anemy deleted the MCP-475-remove-compass-specific-items-from-natural-language-querying branch April 29, 2026 14:13
3 participants