feat(verification): executable SymPy verification oracle — 12 LaTeX-aware physics checks#246
feat(verification): executable SymPy verification oracle — 12 LaTeX-aware physics checks#246alexandergagliano wants to merge 11 commits into
Conversation
The limiting_case_check and symmetry_check MCP tools previously returned
only structural markers ("documented" / "requires_verification") and told
the calling agent to "use SymPy" — delegating the actual verdict to the
LLM's weakest skill (symbolic algebra). This makes those checks compute.
- Add a lazy-imported CAS backend (gpd/mcp/servers/_cas.py):
* safe_parse: hardened parser (unsafe-token screen, length cap, identifiers
forced to plain Symbols so `gamma`/`E`/`I` aren't reinterpreted as SymPy
specials, ^->** and infinity->oo normalization), RHS of equations.
* check_limit: computes sympy.limit(expr, var, point) for "var -> point"
limits and compares to the expected result -> pass/fail/computed.
* check_symmetry: executes parity (x->-x) / time-reversal (t->-t) by
substitution and classifies even/odd/neither.
* Conservative by construction: any parse failure, timeout, or undecidable
comparison downgrades to `inconclusive`, never `pass`. SymPy runs in a
daemon worker thread with a timeout.
- Wire results additively into verification_server: each result gains a `cas`
block and the response gains `cas_executed` + `overall_cas_verdict` (a single
FAIL dominates). The legacy `status`/`matched_type`/`strategy` fields are
preserved, so non-computable inputs (prose limits, gauge/Lorentz symmetries)
fall back to the prior structural behavior.
- Add `sympy>=1.13` as a core dependency.
- 9 new tests covering pass/fail/inconclusive, infinity limits, unsafe-input
rejection, prose-limit fallback, and parity classification. Full
verification suite (285 tests) green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Physics derivations are written in LaTeX, so the limiting-case and symmetry
checks now parse LaTeX input, widening when the oracle actually fires.
- _cas.safe_parse routes LaTeX (detected by backslash-commands / ^{ / _{) to
SymPy's grammar-based lark backend (no code execution, unlike sympify).
- Rewrite the handful of lark-unsupported-but-common macros (\hbar, \Omega,
\sigma, \ell, and the constant \pi) to collision-free placeholder macros and
substitute them back to the intended Symbol/constant after parsing.
- parse_limit_target understands LaTeX limit keys: \to / \rightarrow arrows and
a leading backslash on the variable (e.g. r"\hbar \to 0", r"c \to \infty").
- Refuse dangerous TeX (\input, \def, \write, \begin, ...) outright; any
still-unparseable LaTeX (e.g. juxtaposed superscript products a^2 b^2, which
the lark grammar rejects) returns None → inconclusive, never a false pass.
- Add lark>=1.1 as a dependency.
Verified end-to-end: classical (hbar -> 0) limit of a LaTeX expression yields a
real pass on a correct claim and fail on a wrong one; parity of a LaTeX
potential classifies even/odd; hard LaTeX degrades to inconclusive. 5 new tests;
full mcp server suite (387 tests) green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…jects
lark cannot parse juxtaposed superscripted products (a^2 b^2), which are
ubiquitous in physics (e.g. kinetic energy hbar^2 k^2 / 2m). When lark fails,
_parse_latex now falls back to a conservative delatexify pass:
- Brace-matched rewrites: \frac{A}{B} -> ((A)/(B)), \sqrt{A} -> sqrt((A)),
\sqrt[n]{A} -> ((A)**(1/(n))), ^{...} -> **(...), a_{bc} -> a_bc, residual
braces -> parens, ^ -> **, and macro-backslash stripping (\omega -> omega).
- The plain string is parsed with implicit multiplication/application
(2m -> 2*m, sin x -> sin(x)) via a shared _parse_plain helper.
- Differential operators (\partial, \nabla, \frac{d ...}) are refused outright
so the fallback can never fabricate algebra from a derivative; any leftover
brace/backslash also returns None. Unparseable -> inconclusive, never a pass.
Now parses full Hamiltonians, Coulomb terms, sqrt forms, etc.; kinetic-energy
classical-limit check returns a real pass. Updated the prior "degrades safely"
test (now rescued) and added a derivative-refusal test. CAS suite green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dimensional_check previously only worked on hand-annotated [M][L][T] bracket
tokens, so it could not check a real equation like "E = m c^2". Add an optional
`dimensions` map (symbol → named quantity or bracket spec) that drives a
symbolic dimensional analysis via the CAS.
- _cas.check_dimensions / dimension_vector: parse each side (LaTeX or plain with
implicit multiplication), substitute base-dimension exponents per symbol, and
compare. Handles products, rational powers, and dimensionless-argument
functions; a sum mixing incompatible dimensions is a FAIL.
- verification_server: _NAMED_DIMENSIONS table (energy/velocity/force/... over
{M,L,T,Q,Theta}) + _spec_to_dimvec (named or [M][L][T] spec). The tool gains
an optional `dimensions` param; the bracket-annotation path is unchanged.
Each symbolically-checked result gets a `cas` block; response gains
cas_executed + overall_cas_verdict.
- Unknown symbols / unparseable sides → inconclusive, never a false pass.
Verified: E=m c^2, F=m a, E=hbar omega, LaTeX E=1/2 m v^2 → pass; E=m c and
mixed-dim sum x=a+t → fail; missing symbol → inconclusive. Bracket path and all
prior behavior intact. 6 new tests; full mcp server suite (562) green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Alex Gagliano seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
📝 WalkthroughWalkthroughThis PR adds SymPy and Lark dependencies, introduces a new ChangesCAS-Backed Symbolic Verification Integration
Sequence Diagram(s)sequenceDiagram
participant Client
participant VerificationServer
participant CASModule
Client->>VerificationServer: submit check request (dimensional/limit/symmetry or CAS tool)
VerificationServer->>CASModule: call check_* (safe_parse, run_with_timeout)
CASModule-->>VerificationServer: cas verdict & details
VerificationServer-->>Client: aggregated result with cas block
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/gpd/mcp/servers/_cas.py (1)
148-151: 💤 Low valueConsider making const fixup handling extensible.
The current code hardcodes
sympy.pifor anyname=Noneentry in_LATEX_CONST_FIXUPS. If additional constants are added later (e.g.,\efor Euler's number), they would incorrectly map to pi. Consider changing the data structure to map macros directly to their SymPy constants:_LATEX_CONST_FIXUPS = {r"\pi": "pi"} # value is sympy constant nameThen retrieve via
getattr(sympy, name).🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/gpd/mcp/servers/_cas.py` around lines 148 - 151, The code currently hardcodes sympy.pi when name is None in the block using placeholder/cleaned/substitutions, which will mis-map any future constants; update _LATEX_CONST_FIXUPS to map LaTeX macros to SymPy names (e.g. {r"\\pi": "pi", r"\\e": "E"}) and change the logic in the clause that builds substitutions (the code using placeholder, cleaned, target and substitutions) to look up the mapped string and call getattr(sympy, mapped_name) for known SymPy constants, falling back to sympy.Symbol(mapped_name) or sympy.Symbol(placeholder.lstrip("\\")) when no mapping exists, removing the hardcoded sympy.pi branch so new constants resolve correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/gpd/mcp/servers/verification_server.py`:
- Around line 2161-2173: _spec_to_dimvec currently treats any string containing
"[" as a bracket spec even when tokens are unrecognized; change it so that when
"[" is present you first validate the entire string matches the expected
bracket-token grammar (only bracketed dimension symbols optionally with
exponent, repeated with no extra text) and reject (return None) if the raw text
contains unknown/unmatched tokens; then call _parse_dimensions and also verify
that every bracket token found in the original text produced a corresponding
entry in the parsed dict (i.e., if tokens were present but parsing yielded
no/partial results, return None). Use the functions _spec_to_dimvec and
_parse_dimensions and the raw input text to implement this full-match +
token-to-parse consistency check.
- Around line 5140-5148: The reducer _aggregate_cas_verdict currently returns
PASS if any child is PASS, which lets a passing item mask INCONCLUSIVE or
COMPUTED; change the precedence so FAIL still dominates, but INCONCLUSIVE takes
priority over COMPUTED and PASS (i.e., check for _cas.VERDICT_FAIL first, then
_cas.VERDICT_INCONCLUSIVE, then _cas.VERDICT_COMPUTED, then _cas.VERDICT_PASS)
and keep the default return as _cas.VERDICT_INCONCLUSIVE to avoid false passes.
---
Nitpick comments:
In `@src/gpd/mcp/servers/_cas.py`:
- Around line 148-151: The code currently hardcodes sympy.pi when name is None
in the block using placeholder/cleaned/substitutions, which will mis-map any
future constants; update _LATEX_CONST_FIXUPS to map LaTeX macros to SymPy names
(e.g. {r"\\pi": "pi", r"\\e": "E"}) and change the logic in the clause that
builds substitutions (the code using placeholder, cleaned, target and
substitutions) to look up the mapped string and call getattr(sympy, mapped_name)
for known SymPy constants, falling back to sympy.Symbol(mapped_name) or
sympy.Symbol(placeholder.lstrip("\\")) when no mapping exists, removing the
hardcoded sympy.pi branch so new constants resolve correctly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 90f3b9e9-037a-40c6-8e88-137abcc1e893
⛔ Files ignored due to path filters (1)
uv.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
pyproject.tomlsrc/gpd/mcp/servers/_cas.pysrc/gpd/mcp/servers/verification_server.pytests/mcp/test_servers.py
| def _spec_to_dimvec(spec: str) -> dict[str, int] | None: | ||
| """Resolve a dimension spec to base-exponents, or None if unrecognized. | ||
|
|
||
| Accepts a named quantity ("energy", "velocity", ...) or a bracket form | ||
| ("[M][L]^2[T]^-2"). Returns {} for an explicitly dimensionless spec. | ||
| """ | ||
| text = spec.strip() | ||
| named = _NAMED_DIMENSIONS.get(text.lower()) | ||
| if named is not None: | ||
| return dict(named) | ||
| if "[" in text: | ||
| return {k: v for k, v in _parse_dimensions(text).items() if v != 0} | ||
| return None |
There was a problem hiding this comment.
Reject partial bracket-spec parses here.
Line 2171 accepts any string containing [ as a bracket-form spec, but _parse_dimensions() ignores unknown tokens. Inputs like "[M][X]", "[foo]", or "mass [L]" therefore collapse to a partial vector or {} instead of being rejected, which can let dimensional_check(..., dimensions=...) report a false pass.
Suggested fix
def _spec_to_dimvec(spec: str) -> dict[str, int] | None:
"""Resolve a dimension spec to base-exponents, or None if unrecognized.
@@
text = spec.strip()
named = _NAMED_DIMENSIONS.get(text.lower())
if named is not None:
return dict(named)
if "[" in text:
- return {k: v for k, v in _parse_dimensions(text).items() if v != 0}
+ normalized = re.sub(r"\s+", "", text)
+ if re.fullmatch(r"(?:\[(?:M|L|T|Q|Theta)\](?:\^[+-]?\d+)?)+", normalized) is None:
+ return None
+ return {k: v for k, v in _parse_dimensions(normalized).items() if v != 0}
return None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/mcp/servers/verification_server.py` around lines 2161 - 2173,
_spec_to_dimvec currently treats any string containing "[" as a bracket spec
even when tokens are unrecognized; change it so that when "[" is present you
first validate the entire string matches the expected bracket-token grammar
(only bracketed dimension symbols optionally with exponent, repeated with no
extra text) and reject (return None) if the raw text contains unknown/unmatched
tokens; then call _parse_dimensions and also verify that every bracket token
found in the original text produced a corresponding entry in the parsed dict
(i.e., if tokens were present but parsing yielded no/partial results, return
None). Use the functions _spec_to_dimvec and _parse_dimensions and the raw input
text to implement this full-match + token-to-parse consistency check.
| def _aggregate_cas_verdict(verdicts: list[str]) -> str: | ||
| """Collapse per-item CAS verdicts into one. A single FAIL dominates.""" | ||
| if _cas.VERDICT_FAIL in verdicts: | ||
| return _cas.VERDICT_FAIL | ||
| if _cas.VERDICT_PASS in verdicts: | ||
| return _cas.VERDICT_PASS | ||
| if _cas.VERDICT_COMPUTED in verdicts: | ||
| return _cas.VERDICT_COMPUTED | ||
| return _cas.VERDICT_INCONCLUSIVE |
There was a problem hiding this comment.
Don't let one passing CAS item mask inconclusive ones.
This reducer currently returns pass as soon as any child verdict passes. A mixed result like ["pass", "inconclusive"] or ["pass", "computed"] will therefore surface an overall pass, which overstates the outcome and breaks the “never false pass” contract for the new CAS metadata.
Suggested fix
def _aggregate_cas_verdict(verdicts: list[str]) -> str:
"""Collapse per-item CAS verdicts into one. A single FAIL dominates."""
if _cas.VERDICT_FAIL in verdicts:
return _cas.VERDICT_FAIL
- if _cas.VERDICT_PASS in verdicts:
- return _cas.VERDICT_PASS
+ if not verdicts or _cas.VERDICT_INCONCLUSIVE in verdicts:
+ return _cas.VERDICT_INCONCLUSIVE
if _cas.VERDICT_COMPUTED in verdicts:
return _cas.VERDICT_COMPUTED
- return _cas.VERDICT_INCONCLUSIVE
+ return _cas.VERDICT_PASS📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _aggregate_cas_verdict(verdicts: list[str]) -> str: | |
| """Collapse per-item CAS verdicts into one. A single FAIL dominates.""" | |
| if _cas.VERDICT_FAIL in verdicts: | |
| return _cas.VERDICT_FAIL | |
| if _cas.VERDICT_PASS in verdicts: | |
| return _cas.VERDICT_PASS | |
| if _cas.VERDICT_COMPUTED in verdicts: | |
| return _cas.VERDICT_COMPUTED | |
| return _cas.VERDICT_INCONCLUSIVE | |
| def _aggregate_cas_verdict(verdicts: list[str]) -> str: | |
| """Collapse per-item CAS verdicts into one. A single FAIL dominates.""" | |
| if _cas.VERDICT_FAIL in verdicts: | |
| return _cas.VERDICT_FAIL | |
| if not verdicts or _cas.VERDICT_INCONCLUSIVE in verdicts: | |
| return _cas.VERDICT_INCONCLUSIVE | |
| if _cas.VERDICT_COMPUTED in verdicts: | |
| return _cas.VERDICT_COMPUTED | |
| return _cas.VERDICT_PASS |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/mcp/servers/verification_server.py` around lines 5140 - 5148, The
reducer _aggregate_cas_verdict currently returns PASS if any child is PASS,
which lets a passing item mask INCONCLUSIVE or COMPUTED; change the precedence
so FAIL still dominates, but INCONCLUSIVE takes priority over COMPUTED and PASS
(i.e., check for _cas.VERDICT_FAIL first, then _cas.VERDICT_INCONCLUSIVE, then
_cas.VERDICT_COMPUTED, then _cas.VERDICT_PASS) and keep the default return as
_cas.VERDICT_INCONCLUSIVE to avoid false passes.
…acle sources The CAS verification oracle adds src/gpd/mcp/servers/_cas.py, bumping the src/gpd/mcp/servers/*.py node count from 15 to 16. Regenerate the repo-graph contract artifacts (tests/repo_graph_contract.json, tests/README.md) so the generated-artifact CI check passes, and apply ruff formatting to the CAS oracle and verification server sources.
…urfaces The CAS oracle adds sympy and lark as runtime dependencies. Add them to the curated runtime-dependency set in tests/test_release_consistency.py and to the pyproject dependency-label graph in tests/README.md so the dependency-surface, wheel-metadata, and dependency-graph-docs release tests pass. Complements the repo-graph regeneration in the preceding commit. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Conservation of energy/momentum/charge is the most common dynamics error class (a quantity that silently drifts), and it had no executable verification surface — only an oracle_hint string. Add a real tool that computes it. - conservation_check(quantity, trajectory, tolerance) evaluates a conserved quantity (plain or LaTeX, via the _cas parser) at every state of a trajectory and measures drift: relative drift vs tolerance, or absolute drift when the quantity is ~0 throughout (e.g. total momentum = 0). Returns pass/fail plus min/max/initial/final values and the drift basis. - Conservative: <2 states, unparseable quantity, missing variables, or non-real values → inconclusive; malformed trajectory/tolerance → stable error envelope. Never a false pass. - Registered read-only; added to the gpd-verification descriptor capabilities (builtin_servers.py + infra/gpd-verification.json) and the contract-visibility expectations. Verified: SHO energy on the unit circle → pass (zero drift); a drifting integrator → fail; zero total momentum → pass (absolute basis); LaTeX quantity → pass; missing var / too-short trajectory → inconclusive. 8 new tests; full tests/mcp suite (860) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two more executable CAS verification tools, completing an "executable checks" suite alongside conservation_check. - equation_check(lhs, rhs, assumptions?): verify a claimed algebraic identity — the core of every derivation step — via simplify/equals instead of the agent's mental algebra. Plain or LaTeX; optional per-symbol assumptions (positive/ real/integer/...) so identities like sqrt(x**2)=x (x>0) can be confirmed. - series_check(expression, variable, point, order, expected?): compute a Taylor/ asymptotic expansion (perturbation theory's workhorse) and, when an expected expansion is given, return a real pass/fail. Plain or LaTeX; supports point=oo. - Both reuse the _cas parser + symbolic_equal; undecidable/unparseable input → inconclusive, never a false pass. Registered read-only and added to the gpd-verification descriptor (builtin_servers.py + infra/gpd-verification.json) and contract-visibility expectations. Verified: pythagorean identity → pass, wrong identity → fail (+ difference), sqrt(x**2)=x → pass only under x>0; sin(x) series matches claim → pass, wrong expansion → fail, 1/(1-x) geometric series, asymptotic at oo. 12 new tests; full tests/mcp suite (872) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three more executable verification capabilities. - ode_check(equation, solution, variable, function): substitutes a candidate solution and its derivatives (prime notation y, y', y'') into a differential equation and simplifies the residual — zero → pass, nonzero → fail. Solution may be plain or LaTeX. - tensor_check(equation): verifies free/dummy index consistency of a tensor equation — every term on a side shares the same free indices in the same up/down position, the two sides match, and repeated indices are valid one-up/one-down contractions. Catches the common GR/field-theory index errors (up/down mismatch, same-position repeats, unbalanced free indices). - symmetry_check now executes scale/dilation via Euler's homogeneity theorem, reporting the homogeneity degree (scale-invariant = degree 0). Lorentz is deliberately NOT auto-checked: it needs 4-vector/metric structure the tool cannot infer, and faking it would be a false verdict — it stays guidance-only. All conservative: unparseable/undecidable/absent-function/no-'=' → inconclusive; bad args → stable error envelope. Both new tools registered read-only and added to the gpd-verification descriptor (builtin_servers.py + infra json) and contract-visibility expectations. Verified: SHO/decay solutions → pass, wrong solution → fail; Maxwell tensor eq → pass, up/down + same-position + free-index errors → fail; x^2 potential → homogeneous degree 2. 13 new tests; full tests/mcp suite (885) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two more executable CAS verification tools.
- integral_check(integrand, variable, claimed?, lower?, upper?): for an
indefinite integral, verifies the claimed antiderivative by differentiation
(d(claimed)/dx == integrand — the always-decidable direction); for a definite
integral, computes integrate(integrand,(x,lower,upper)) and compares to the
claimed value. With no claim, returns the computed result. Plain or LaTeX.
- commutator_check(operators, a, b, expected, variable): verifies an operator
commutator [a,b] = expected by acting on a test function. Operators are given
as their action on f (f, f_x, f_xx for derivatives), e.g. {"X":"x*f",
"P":"-I*hbar*f_x"}; computes a(b f) - b(a f) and compares. The canonical
[x,p] = i hbar verifies as pass.
Both conservative: unparseable input, undecidable comparisons, a SymPy integral
that won't evaluate, or a missing operator → inconclusive; malformed args (e.g.
only one integration bound) → stable error envelope. Registered read-only and
added to the gpd-verification descriptor (builtin_servers.py + infra json) and
contract-visibility expectations.
Verified: x^2 -> x^3/3 pass, wrong antiderivative fail, definite x^2 on [0,1] =
1/3, Gaussian on (-oo,oo) = sqrt(pi); [X,P]=i hbar pass, wrong/[X,X]=0 handled.
11 new tests; full tests/mcp suite (896) green.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- pde_check(equation, solution, variables, function): generalizes ode_check to partial differential equations. Partial derivatives use subscript notation (u_t, u_xx, u_xt for mixed); substitutes the solution and its partials and simplifies the residual. Verifies heat, wave, Laplace, and free-particle Schrodinger solutions; a wrong solution fails. - matrix_check(matrix, property): verifies symmetric / antisymmetric / hermitian / anti-hermitian / unitary / orthogonal / idempotent / involutory / normal via SymPy Matrix algebra. Entries are expression strings; I is the imaginary unit (e.g. Pauli Y = [[0,-I],[I,0]]). Non-square matrices fail square-only properties. Both conservative: unparseable input, undecidable comparisons, an absent function, or an unknown property → inconclusive; malformed matrices → stable error envelope. Registered read-only and added to the gpd-verification descriptor (builtin_servers.py + infra json) and contract-visibility expectations. Verified: heat/wave/Laplace/Schrodinger PDEs → pass, wrong → fail; Pauli X/Y hermitian+unitary, rotation orthogonal, projector idempotent → pass; non-square unitary → fail. 13 new tests; full tests/mcp suite (909) green. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
🤖 RoastBot: Alex submitted PR #248 six minutes after this one. He is not okay. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/gpd/mcp/servers/verification_server.py`:
- Around line 5636-5644: The pde_check flow currently allows an empty variables
list to reach _cas.check_pde; after calling _validate_string_list (the call that
produces validated_variables) add an explicit check that validated_variables is
non-empty and return a validation error if it is empty (e.g., same error shape
used elsewhere), before proceeding to call _cas.check_pde; update the code in
the pde_check function (around the _validate_string_list -> validated_variables,
validated_function, _cas.check_pde sequence) to reject [] immediately.
- Around line 5591-5609: The code validates `operators` via
_validate_string_mapping and `a`/`b` via _validate_string but never ensures `a`
and `b` are keys in the `operators` mapping; add explicit checks after the
validations to verify that validated_a and validated_b exist as keys in
validated_operators and return a clear input error if either is missing (use the
same error style as _validate_string/_validate_string_mapping and include field
names "a" or "b" and the available operator keys), before calling
_cas.check_commutator.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 8f49d8f1-e397-4595-be6b-4728a66ea912
📒 Files selected for processing (6)
infra/gpd-verification.jsonsrc/gpd/mcp/builtin_servers.pysrc/gpd/mcp/servers/_cas.pysrc/gpd/mcp/servers/verification_server.pytests/mcp/test_servers.pytests/mcp/test_tool_contract_visibility.py
| with gpd_span("mcp.verification.commutator_check"): | ||
| validated_operators, error = _validate_string_mapping(operators, field_name="operators") | ||
| if error is not None: | ||
| return error | ||
| validated_a, error = _validate_string(a, field_name="a") | ||
| if error is not None: | ||
| return error | ||
| validated_b, error = _validate_string(b, field_name="b") | ||
| if error is not None: | ||
| return error | ||
| validated_expected, error = _validate_string(expected, field_name="expected") | ||
| if error is not None: | ||
| return error | ||
| validated_variable, error = _validate_string(variable, field_name="variable") | ||
| if error is not None: | ||
| return error | ||
| result = _cas.check_commutator( | ||
| validated_operators, validated_a, validated_b, validated_expected, validated_variable | ||
| ) |
There was a problem hiding this comment.
Validate that a and b are actually defined in operators.
Right now a request like a="X" / b="P" is accepted even when operators has no X or P entry. That pushes a basic request-shape error into _cas.check_commutator(), so callers get backend-dependent failures instead of a clear input error at the tool boundary.
Suggested fix
with gpd_span("mcp.verification.commutator_check"):
validated_operators, error = _validate_string_mapping(operators, field_name="operators")
if error is not None:
return error
validated_a, error = _validate_string(a, field_name="a")
@@
validated_variable, error = _validate_string(variable, field_name="variable")
if error is not None:
return error
+ missing = [name for name in (validated_a, validated_b) if name not in validated_operators]
+ if missing:
+ missing_names = ", ".join(sorted(dict.fromkeys(missing)))
+ return _error_result(f"operators is missing definitions for: {missing_names}")
result = _cas.check_commutator(
validated_operators, validated_a, validated_b, validated_expected, validated_variable
)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/mcp/servers/verification_server.py` around lines 5591 - 5609, The
code validates `operators` via _validate_string_mapping and `a`/`b` via
_validate_string but never ensures `a` and `b` are keys in the `operators`
mapping; add explicit checks after the validations to verify that validated_a
and validated_b exist as keys in validated_operators and return a clear input
error if either is missing (use the same error style as
_validate_string/_validate_string_mapping and include field names "a" or "b" and
the available operator keys), before calling _cas.check_commutator.
| validated_variables, error = _validate_string_list(variables, field_name="variables") | ||
| if error is not None: | ||
| return error | ||
| validated_function, error = _validate_string(function, field_name="function") | ||
| if error is not None: | ||
| return error | ||
| result = _cas.check_pde( | ||
| validated_equation, validated_solution, validated_variables, validated_function | ||
| ) |
There was a problem hiding this comment.
Reject empty variables lists before calling the PDE CAS.
pde_check currently accepts variables=[], even though the tool contract requires at least one independent variable. Letting an empty list through turns an immediate input error into whatever _cas.check_pde() happens to do with it.
Suggested fix
validated_variables, error = _validate_string_list(variables, field_name="variables")
if error is not None:
return error
+ if not validated_variables:
+ return _error_result("variables must contain at least one variable name")
validated_function, error = _validate_string(function, field_name="function")
if error is not None:
return error🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/gpd/mcp/servers/verification_server.py` around lines 5636 - 5644, The
pde_check flow currently allows an empty variables list to reach _cas.check_pde;
after calling _validate_string_list (the call that produces validated_variables)
add an explicit check that validated_variables is non-empty and return a
validation error if it is empty (e.g., same error shape used elsewhere), before
proceeding to call _cas.check_pde; update the code in the pde_check function
(around the _validate_string_list -> validated_variables, validated_function,
_cas.check_pde sequence) to reject [] immediately.
Summary
Make GPD's physics-verification checks actually compute instead of returning
prose. Previously
dimensional_check/limiting_case_check/symmetry_checkreturned a structural marker (
"documented"/"requires_verification") andtold the agent to "use SymPy" — delegating the verdict to the LLM's weakest
skill (symbolic algebra). This adds a SymPy computer-algebra backend so the
verdict no longer depends on trusting the agent under review, accepts LaTeX
input (how physics is written), and grows the verification surface from 3 checks
to a full executable suite.
There are no behavioral surprises: existing tools are upgraded additively (all
prior fields preserved), and every check returns
inconclusiverather than afalse
passwhen it can't decide.Executable checks
Upgraded (now compute):
limiting_case_check—sympy.limit(expr, var, point)vs expected → pass/failsymmetry_check— parity (x→-x), time-reversal (t→-t), and scale/dilation (Euler homogeneity degree)dimensional_check— keeps[M][L][T]annotations and adds symbolic dimensional analysis of real equations via adimensionsmap (E = m c^2with {E:energy,m:mass,c:velocity})New tools:
conservation_checkequation_checkseries_checkode_checkpde_checktensor_checkintegral_checkcommutator_check[a,b]=expected(e.g.[x,p]=iℏ)matrix_checkLaTeX input
Expressions are parsed with SymPy's grammar-based lark backend (no code
execution), with placeholder rewriting for lark-unsupported macros (
\hbar,\Omega, …). When lark rejects valid physics (e.g. juxtaposed superscriptproducts
\frac{\hbar^2 k^2}{2m}), a conservative delatexify fallbackconverts to plain infix with implicit multiplication. Differential operators are
refused so the fallback can never fabricate algebra from a derivative.
Principled scope: no fake Lorentz check
symmetry_checkdoes not auto-verify Lorentz invariance — that needs the4-vector/metric structure the tool can't infer from a bare expression, so faking
a verdict would violate "never a false pass." Lorentz stays guidance-only;
tensor_checkcovers the index-bookkeeping half instead.Safety / honesty
Untrusted input is screened (unsafe Python tokens and dangerous TeX rejected),
parsing forces identifiers to plain Symbols, and SymPy runs in a timeout-bounded
worker thread. Any parse failure, timeout, undecidable comparison, missing
symbol/operator/function, non-real value, too-short trajectory, or unevaluable
integral/series →
inconclusive, never a falsepass.Dependencies
sympy>=1.13— CAS backendlark>=1.1— pure-Python LaTeX-math grammar forparse_latexIntegration & tests
All tools are registered read-only and listed in the
gpd-verificationdescriptor (
builtin_servers.py,infra/gpd-verification.json) and thecontract-visibility expectations. Curated release surfaces and repo-graph
artifacts are kept in sync with the new deps/file. ~120 new tests; full
tests/mcp+ release + repo-graph suites green locally (1028 passed).Summary by CodeRabbit
New Features
Tests
Chores