Pull request overview
Adds a new skills-boundary eval suite and a runnable activation/trigger eval for the mongodb-schema-design skill to validate routing decisions and probe description gaps between schema design and query optimization.
Changes:
- Added a new boundary test suite (`query-optimizer-vs-schema-design.json`) with 40 cases, including ambiguous and “neither” suppression cases plus description-gap probes.
- Documented the new boundary suite in `testing/skills-boundaries/README.md`.
- Added a `mongodb-schema-design` trigger-eval dataset (40 prompts) and a TS runner script to execute it via the Claude CLI.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| testing/skills-boundaries/query-optimizer-vs-schema-design.json | New boundary test suite defining expected routing between query-optimizer vs schema-design, including “neither” and description-gap cases. |
| testing/skills-boundaries/README.md | Documents the new boundary suite and its case breakdown. |
| testing/mongodb-schema-design/trigger-eval.json | New activation eval dataset for schema-design skill triggering. |
| testing/mongodb-schema-design/run-trigger-eval.ts | New CLI-based runner to execute trigger-eval prompts and report pass/fail. |
I agree that the last 2 are ambiguous; they fall exactly in the space where we might need either of the two skills. Since actually ensuring that both are considered is out of scope for now, I've created this ticket to at least collect examples, which is the first step we agreed on: https://jira.mongodb.org/browse/MCP-477
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
MCP-474
Adds an activation eval, with 40 tests, and a script to run it.
Adds a skill boundary for query-optimizer vs schema-design.
I ran the script with the `trigger-eval.json` in the `mongodb-natural-language-querying` skill and got a 20/20 result.
Example run from the script:
The 2 false negatives from the run are a bit ambiguous. I think the query-optimizer could potentially do a fine job on those, although they are part of the skill boundary.
I've run the skill boundaries eval 3 times. It gets 100% on everything except the description gaps. The description-gap results vary between runs, though all are above 50%.
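
Since the description-gap results vary from run to run, it may help to track the worst pass rate per category across runs rather than a single number. The following is a minimal sketch of that aggregation, not the runner added in this PR: the `CaseResult` shape, its field names, and the category labels are assumptions for illustration and do not reflect the actual JSON schema of the suites.

```typescript
// Hypothetical shape of one evaluated case. The field names are assumptions,
// not the schema used by the JSON files in this PR.
interface CaseResult {
  category: string; // illustrative labels, e.g. "ambiguous", "neither", "description-gap"
  passed: boolean;
}

// Pass rate per category for a single run, as a fraction in [0, 1].
function passRateByCategory(results: CaseResult[]): Record<string, number> {
  const passed: Record<string, number> = {};
  const total: Record<string, number> = {};
  for (const r of results) {
    total[r.category] = (total[r.category] ?? 0) + 1;
    passed[r.category] = (passed[r.category] ?? 0) + (r.passed ? 1 : 0);
  }
  const rates: Record<string, number> = {};
  for (const category of Object.keys(total)) {
    rates[category] = passed[category] / total[category];
  }
  return rates;
}

// Lowest pass rate observed for each category across several runs.
function worstCaseByCategory(runs: CaseResult[][]): Record<string, number> {
  const worst: Record<string, number> = {};
  for (const run of runs) {
    const rates = passRateByCategory(run);
    for (const category of Object.keys(rates)) {
      worst[category] = Math.min(worst[category] ?? 1, rates[category]);
    }
  }
  return worst;
}
```

Reporting the minimum per category makes a flaky category stand out even when its average across runs looks acceptable.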