Skip to content

feat(verification): executable SymPy verification oracle — 12 LaTeX-aware physics checks#246

Open
alexandergagliano wants to merge 11 commits into
mainfrom
cas-verification-oracle
Open

feat(verification): executable SymPy verification oracle — 12 LaTeX-aware physics checks#246
alexandergagliano wants to merge 11 commits into
mainfrom
cas-verification-oracle

Conversation

@alexandergagliano

@alexandergagliano alexandergagliano commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

Make GPD's physics-verification checks actually compute instead of returning
prose. Previously dimensional_check / limiting_case_check / symmetry_check
returned a structural marker ("documented" / "requires_verification") and
told the agent to "use SymPy" — delegating the verdict to the LLM's weakest
skill (symbolic algebra). This adds a SymPy computer-algebra backend so the
verdict no longer depends on trusting the agent under review, accepts LaTeX
input (how physics is written), and grows the verification surface from 3 checks
to a full executable suite.

There are no behavioral surprises: existing tools are upgraded additively (all
prior fields preserved), and every check returns inconclusive rather than a
false pass
when it can't decide.

Executable checks

Upgraded (now compute):

  • limiting_case_checksympy.limit(expr, var, point) vs expected → pass/fail
  • symmetry_check — parity (x→-x), time-reversal (t→-t), and scale/dilation (Euler homogeneity degree)
  • dimensional_check — keeps [M][L][T] annotations and adds symbolic dimensional analysis of real equations via a dimensions map (E = m c^2 with {E:energy,m:mass,c:velocity})

New tools:

Tool Verifies
conservation_check energy/momentum/charge drift along a trajectory
equation_check any claimed algebraic identity (every derivation step)
series_check Taylor / asymptotic expansions
ode_check a solution satisfies an ODE (prime notation)
pde_check a solution satisfies a PDE (subscript notation; heat/wave/Laplace/Schrödinger)
tensor_check free/dummy index consistency of a tensor equation
integral_check antiderivative (by differentiation) or definite integral
commutator_check operator commutator [a,b]=expected (e.g. [x,p]=iℏ)
matrix_check symmetric/hermitian/unitary/orthogonal/idempotent/involutory/normal
limiting_case_check("f = sin(x)/x", {"x -> 0": "0"})                     # "fail" (it's 1)
pde_check("u_t = alpha*u_xx", "exp(-alpha*k**2*t)*sin(k*x)", ["t","x"])  # "pass" (heat eq)
matrix_check([["0","-I"],["I","0"]], "unitary")                          # "pass" (Pauli Y)
commutator_check({"X":"x*f","P":"-I*hbar*f_x"}, "X","P","I*hbar*f")      # "pass" ([x,p]=iħ)
integral_check("exp(-x**2)", "x", "sqrt(pi)", "-oo", "oo")               # "pass"

LaTeX input

Expressions are parsed with SymPy's grammar-based lark backend (no code
execution), with placeholder rewriting for lark-unsupported macros (\hbar,
\Omega, …). When lark rejects valid physics (e.g. juxtaposed superscript
products \frac{\hbar^2 k^2}{2m}), a conservative delatexify fallback
converts to plain infix with implicit multiplication. Differential operators are
refused so the fallback can never fabricate algebra from a derivative.

Principled scope: no fake Lorentz check

symmetry_check does not auto-verify Lorentz invariance — that needs the
4-vector/metric structure the tool can't infer from a bare expression, so faking
a verdict would violate "never a false pass." Lorentz stays guidance-only;
tensor_check covers the index-bookkeeping half instead.

Safety / honesty

Untrusted input is screened (unsafe Python tokens and dangerous TeX rejected),
parsing forces identifiers to plain Symbols, and SymPy runs in a timeout-bounded
worker thread. Any parse failure, timeout, undecidable comparison, missing
symbol/operator/function, non-real value, too-short trajectory, or unevaluable
integral/series → inconclusive, never a false pass.

Dependencies

  • sympy>=1.13 — CAS backend
  • lark>=1.1 — pure-Python LaTeX-math grammar for parse_latex

Integration & tests

All tools are registered read-only and listed in the gpd-verification
descriptor (builtin_servers.py, infra/gpd-verification.json) and the
contract-visibility expectations. Curated release surfaces and repo-graph
artifacts are kept in sync with the new deps/file. ~120 new tests; full
tests/mcp + release + repo-graph suites green locally (1028 passed).

Consolidated from the former stacked PR #248 into this single PR.

Summary by CodeRabbit

  • New Features

    • CAS-backed symbolic verification with detailed per-check CAS results and aggregated verdicts
    • New verification tools: conservation, equation, series, ODE, tensor, integral, commutator, PDE, and matrix checks
    • Dimensional-consistency checks accepting named quantity mappings and bracket-form specs; richer limit and symmetry reporting
  • Tests

    • Expanded test coverage for all CAS-backed checks, LaTeX/math parsing, timeouts, and verdict behaviors
  • Chores

    • Added sympy and lark to project dependencies

Alex Gagliano and others added 4 commits June 4, 2026 16:42
The limiting_case_check and symmetry_check MCP tools previously returned
only structural markers ("documented" / "requires_verification") and told
the calling agent to "use SymPy" — delegating the actual verdict to the
LLM's weakest skill (symbolic algebra). This makes those checks compute.

- Add a lazy-imported CAS backend (gpd/mcp/servers/_cas.py):
  * safe_parse: hardened parser (unsafe-token screen, length cap, identifiers
    forced to plain Symbols so `gamma`/`E`/`I` aren't reinterpreted as SymPy
    specials, ^->** and infinity->oo normalization), RHS of equations.
  * check_limit: computes sympy.limit(expr, var, point) for "var -> point"
    limits and compares to the expected result -> pass/fail/computed.
  * check_symmetry: executes parity (x->-x) / time-reversal (t->-t) by
    substitution and classifies even/odd/neither.
  * Conservative by construction: any parse failure, timeout, or undecidable
    comparison downgrades to `inconclusive`, never `pass`. SymPy runs in a
    daemon worker thread with a timeout.

- Wire results additively into verification_server: each result gains a `cas`
  block and the response gains `cas_executed` + `overall_cas_verdict` (a single
  FAIL dominates). The legacy `status`/`matched_type`/`strategy` fields are
  preserved, so non-computable inputs (prose limits, gauge/Lorentz symmetries)
  fall back to the prior structural behavior.

- Add `sympy>=1.13` as a core dependency.
- 9 new tests covering pass/fail/inconclusive, infinity limits, unsafe-input
  rejection, prose-limit fallback, and parity classification. Full
  verification suite (285 tests) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Physics derivations are written in LaTeX, so the limiting-case and symmetry
checks now parse LaTeX input, widening when the oracle actually fires.

- _cas.safe_parse routes LaTeX (detected by backslash-commands / ^{ / _{) to
  SymPy's grammar-based lark backend (no code execution, unlike sympify).
- Rewrite the handful of lark-unsupported-but-common macros (\hbar, \Omega,
  \sigma, \ell, and the constant \pi) to collision-free placeholder macros and
  substitute them back to the intended Symbol/constant after parsing.
- parse_limit_target understands LaTeX limit keys: \to / \rightarrow arrows and
  a leading backslash on the variable (e.g. r"\hbar \to 0", r"c \to \infty").
- Refuse dangerous TeX (\input, \def, \write, \begin, ...) outright; any
  still-unparseable LaTeX (e.g. juxtaposed superscript products a^2 b^2, which
  the lark grammar rejects) returns None → inconclusive, never a false pass.
- Add lark>=1.1 as a dependency.

Verified end-to-end: classical (hbar -> 0) limit of a LaTeX expression yields a
real pass on a correct claim and fail on a wrong one; parity of a LaTeX
potential classifies even/odd; hard LaTeX degrades to inconclusive. 5 new tests;
full mcp server suite (387 tests) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…jects

lark cannot parse juxtaposed superscripted products (a^2 b^2), which are
ubiquitous in physics (e.g. kinetic energy hbar^2 k^2 / 2m). When lark fails,
_parse_latex now falls back to a conservative delatexify pass:

- Brace-matched rewrites: \frac{A}{B} -> ((A)/(B)), \sqrt{A} -> sqrt((A)),
  \sqrt[n]{A} -> ((A)**(1/(n))), ^{...} -> **(...), a_{bc} -> a_bc, residual
  braces -> parens, ^ -> **, and macro-backslash stripping (\omega -> omega).
- The plain string is parsed with implicit multiplication/application
  (2m -> 2*m, sin x -> sin(x)) via a shared _parse_plain helper.
- Differential operators (\partial, \nabla, \frac{d ...}) are refused outright
  so the fallback can never fabricate algebra from a derivative; any leftover
  brace/backslash also returns None. Unparseable -> inconclusive, never a pass.

Now parses full Hamiltonians, Coulomb terms, sqrt forms, etc.; kinetic-energy
classical-limit check returns a real pass. Updated the prior "degrades safely"
test (now rescued) and added a derivative-refusal test. CAS suite green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
dimensional_check previously only worked on hand-annotated [M][L][T] bracket
tokens, so it could not check a real equation like "E = m c^2". Add an optional
`dimensions` map (symbol → named quantity or bracket spec) that drives a
symbolic dimensional analysis via the CAS.

- _cas.check_dimensions / dimension_vector: parse each side (LaTeX or plain with
  implicit multiplication), substitute base-dimension exponents per symbol, and
  compare. Handles products, rational powers, and dimensionless-argument
  functions; a sum mixing incompatible dimensions is a FAIL.
- verification_server: _NAMED_DIMENSIONS table (energy/velocity/force/... over
  {M,L,T,Q,Theta}) + _spec_to_dimvec (named or [M][L][T] spec). The tool gains
  an optional `dimensions` param; the bracket-annotation path is unchanged.
  Each symbolically-checked result gets a `cas` block; response gains
  cas_executed + overall_cas_verdict.
- Unknown symbols / unparseable sides → inconclusive, never a false pass.

Verified: E=m c^2, F=m a, E=hbar omega, LaTeX E=1/2 m v^2 → pass; E=m c and
mixed-dim sum x=a+t → fail; missing symbol → inconclusive. Bracket path and all
prior behavior intact. 6 new tests; full mcp server suite (562) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@CLAassistant

CLAassistant commented Jun 4, 2026

Copy link
Copy Markdown

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
1 out of 2 committers have signed the CLA.

✅ cmaloney111
❌ Alex Gagliano


Alex Gagliano seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR adds SymPy and Lark dependencies, introduces a new _cas.py computer-algebra module implementing safe parsing and symbolic verification (limits, symmetry, dimensional analysis), and integrates CAS capabilities into three verification server tools with structured result payloads and aggregation metadata. Comprehensive tests exercise all CAS-backed behavior.

Changes

CAS-Backed Symbolic Verification Integration

Layer / File(s) Summary
Dependencies for CAS and LaTeX parsing
pyproject.toml
Adds sympy>=1.13 and lark>=1.1 to dependencies for CAS and LaTeX parsing.
CAS module: core infrastructure and safety
src/gpd/mcp/servers/_cas.py
Module docs, verdict constants (VERDICT_PASS, VERDICT_FAIL, VERDICT_COMPUTED, VERDICT_INCONCLUSIVE), safety whitelists/regexes, and run_with_timeout.
CAS module: LaTeX parsing pipeline
src/gpd/mcp/servers/_cas.py
LaTeX detection and SymPy lark parse_latex path with macro fixups plus a conservative delatexify fallback that rewrites limited macros and rejects differential operators.
CAS module: safe parsing and symbolic equality
src/gpd/mcp/servers/_cas.py
safe_parse for guarded parsing of plain/LaTeX input; symbolic_equal performs timed SymPy equality checks returning True/False/None; limit target parsing included.
CAS module: executable checks
src/gpd/mcp/servers/_cas.py
check_symmetry, check_dimensions, check_conservation, check_equation, check_series, check_ode, check_tensor_indices, check_integral, check_commutator, check_pde, and check_matrix implemented with structured verdict payloads and timeouts.
Verification server: CAS integration across tools
src/gpd/mcp/servers/verification_server.py, src/gpd/mcp/builtin_servers.py, infra/gpd-verification.json
Imported _cas; added _NAMED_DIMENSIONS and _spec_to_dimvec; dimensional_check gains optional dimensions parameter and executes CAS checks when applicable; limiting_case_check and symmetry_check now include per-item cas blocks and aggregated CAS metadata; advertised capabilities updated to include new checks.
Tests and repo-graph updates
tests/mcp/test_servers.py, tests/README.md, tests/repo_graph_contract.json, tests/test_release_consistency.py, tests/mcp/test_tool_contract_visibility.py
Added CAS-backed tests covering dimensional, limit, symmetry, and many new CAS tools; updated repo-graph counts and expected dependency set in test fixtures/docs.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant VerificationServer
  participant CASModule
  Client->>VerificationServer: submit check request (dimensional/limit/symmetry or CAS tool)
  VerificationServer->>CASModule: call check_* (safe_parse, run_with_timeout)
  CASModule-->>VerificationServer: cas verdict & details
  VerificationServer-->>Client: aggregated result with cas block
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • madeleinesong

Poem

🐰 A rabbit nudges a CAS-lit trail,

parsing LaTeX where margins pale,
limits tamed and symmetries found,
dimensions counted safe and sound,
hops off with a verdict, light and hale.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding executable SymPy-based verification checks with LaTeX support to the verification server.
Description check ✅ Passed The description is comprehensive, covering what changed (CAS backend additions), why (move from prose to computation), testing done (120+ tests, full suite green), but lacks explicit checklist completion marks.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cas-verification-oracle

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
src/gpd/mcp/servers/_cas.py (1)

148-151: 💤 Low value

Consider making const fixup handling extensible.

The current code hardcodes sympy.pi for any name=None entry in _LATEX_CONST_FIXUPS. If additional constants are added later (e.g., \e for Euler's number), they would incorrectly map to pi. Consider changing the data structure to map macros directly to their SymPy constants:

_LATEX_CONST_FIXUPS = {r"\pi": "pi"}  # value is sympy constant name

Then retrieve via getattr(sympy, name).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gpd/mcp/servers/_cas.py` around lines 148 - 151, The code currently
hardcodes sympy.pi when name is None in the block using
placeholder/cleaned/substitutions, which will mis-map any future constants;
update _LATEX_CONST_FIXUPS to map LaTeX macros to SymPy names (e.g. {r"\\pi":
"pi", r"\\e": "E"}) and change the logic in the clause that builds substitutions
(the code using placeholder, cleaned, target and substitutions) to look up the
mapped string and call getattr(sympy, mapped_name) for known SymPy constants,
falling back to sympy.Symbol(mapped_name) or
sympy.Symbol(placeholder.lstrip("\\")) when no mapping exists, removing the
hardcoded sympy.pi branch so new constants resolve correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gpd/mcp/servers/verification_server.py`:
- Around line 2161-2173: _spec_to_dimvec currently treats any string containing
"[" as a bracket spec even when tokens are unrecognized; change it so that when
"[" is present you first validate the entire string matches the expected
bracket-token grammar (only bracketed dimension symbols optionally with
exponent, repeated with no extra text) and reject (return None) if the raw text
contains unknown/unmatched tokens; then call _parse_dimensions and also verify
that every bracket token found in the original text produced a corresponding
entry in the parsed dict (i.e., if tokens were present but parsing yielded
no/partial results, return None). Use the functions _spec_to_dimvec and
_parse_dimensions and the raw input text to implement this full-match +
token-to-parse consistency check.
- Around line 5140-5148: The reducer _aggregate_cas_verdict currently returns
PASS if any child is PASS, which lets a passing item mask INCONCLUSIVE or
COMPUTED; change the precedence so FAIL still dominates, but INCONCLUSIVE takes
priority over COMPUTED and PASS (i.e., check for _cas.VERDICT_FAIL first, then
_cas.VERDICT_INCONCLUSIVE, then _cas.VERDICT_COMPUTED, then _cas.VERDICT_PASS)
and keep the default return as _cas.VERDICT_INCONCLUSIVE to avoid false passes.

---

Nitpick comments:
In `@src/gpd/mcp/servers/_cas.py`:
- Around line 148-151: The code currently hardcodes sympy.pi when name is None
in the block using placeholder/cleaned/substitutions, which will mis-map any
future constants; update _LATEX_CONST_FIXUPS to map LaTeX macros to SymPy names
(e.g. {r"\\pi": "pi", r"\\e": "E"}) and change the logic in the clause that
builds substitutions (the code using placeholder, cleaned, target and
substitutions) to look up the mapped string and call getattr(sympy, mapped_name)
for known SymPy constants, falling back to sympy.Symbol(mapped_name) or
sympy.Symbol(placeholder.lstrip("\\")) when no mapping exists, removing the
hardcoded sympy.pi branch so new constants resolve correctly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 90f3b9e9-037a-40c6-8e88-137abcc1e893

📥 Commits

Reviewing files that changed from the base of the PR and between 0f41769 and f133c55.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • pyproject.toml
  • src/gpd/mcp/servers/_cas.py
  • src/gpd/mcp/servers/verification_server.py
  • tests/mcp/test_servers.py

Comment on lines +2161 to +2173
def _spec_to_dimvec(spec: str) -> dict[str, int] | None:
"""Resolve a dimension spec to base-exponents, or None if unrecognized.

Accepts a named quantity ("energy", "velocity", ...) or a bracket form
("[M][L]^2[T]^-2"). Returns {} for an explicitly dimensionless spec.
"""
text = spec.strip()
named = _NAMED_DIMENSIONS.get(text.lower())
if named is not None:
return dict(named)
if "[" in text:
return {k: v for k, v in _parse_dimensions(text).items() if v != 0}
return None

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject partial bracket-spec parses here.

Line 2171 accepts any string containing [ as a bracket-form spec, but _parse_dimensions() ignores unknown tokens. Inputs like "[M][X]", "[foo]", or "mass [L]" therefore collapse to a partial vector or {} instead of being rejected, which can let dimensional_check(..., dimensions=...) report a false pass.

Suggested fix
 def _spec_to_dimvec(spec: str) -> dict[str, int] | None:
     """Resolve a dimension spec to base-exponents, or None if unrecognized.
@@
     text = spec.strip()
     named = _NAMED_DIMENSIONS.get(text.lower())
     if named is not None:
         return dict(named)
     if "[" in text:
-        return {k: v for k, v in _parse_dimensions(text).items() if v != 0}
+        normalized = re.sub(r"\s+", "", text)
+        if re.fullmatch(r"(?:\[(?:M|L|T|Q|Theta)\](?:\^[+-]?\d+)?)+", normalized) is None:
+            return None
+        return {k: v for k, v in _parse_dimensions(normalized).items() if v != 0}
     return None
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gpd/mcp/servers/verification_server.py` around lines 2161 - 2173,
_spec_to_dimvec currently treats any string containing "[" as a bracket spec
even when tokens are unrecognized; change it so that when "[" is present you
first validate the entire string matches the expected bracket-token grammar
(only bracketed dimension symbols optionally with exponent, repeated with no
extra text) and reject (return None) if the raw text contains unknown/unmatched
tokens; then call _parse_dimensions and also verify that every bracket token
found in the original text produced a corresponding entry in the parsed dict
(i.e., if tokens were present but parsing yielded no/partial results, return
None). Use the functions _spec_to_dimvec and _parse_dimensions and the raw input
text to implement this full-match + token-to-parse consistency check.

Comment on lines +5140 to +5148
def _aggregate_cas_verdict(verdicts: list[str]) -> str:
"""Collapse per-item CAS verdicts into one. A single FAIL dominates."""
if _cas.VERDICT_FAIL in verdicts:
return _cas.VERDICT_FAIL
if _cas.VERDICT_PASS in verdicts:
return _cas.VERDICT_PASS
if _cas.VERDICT_COMPUTED in verdicts:
return _cas.VERDICT_COMPUTED
return _cas.VERDICT_INCONCLUSIVE

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't let one passing CAS item mask inconclusive ones.

This reducer currently returns pass as soon as any child verdict passes. A mixed result like ["pass", "inconclusive"] or ["pass", "computed"] will therefore surface an overall pass, which overstates the outcome and breaks the “never false pass” contract for the new CAS metadata.

Suggested fix
 def _aggregate_cas_verdict(verdicts: list[str]) -> str:
     """Collapse per-item CAS verdicts into one. A single FAIL dominates."""
     if _cas.VERDICT_FAIL in verdicts:
         return _cas.VERDICT_FAIL
-    if _cas.VERDICT_PASS in verdicts:
-        return _cas.VERDICT_PASS
+    if not verdicts or _cas.VERDICT_INCONCLUSIVE in verdicts:
+        return _cas.VERDICT_INCONCLUSIVE
     if _cas.VERDICT_COMPUTED in verdicts:
         return _cas.VERDICT_COMPUTED
-    return _cas.VERDICT_INCONCLUSIVE
+    return _cas.VERDICT_PASS
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def _aggregate_cas_verdict(verdicts: list[str]) -> str:
"""Collapse per-item CAS verdicts into one. A single FAIL dominates."""
if _cas.VERDICT_FAIL in verdicts:
return _cas.VERDICT_FAIL
if _cas.VERDICT_PASS in verdicts:
return _cas.VERDICT_PASS
if _cas.VERDICT_COMPUTED in verdicts:
return _cas.VERDICT_COMPUTED
return _cas.VERDICT_INCONCLUSIVE
def _aggregate_cas_verdict(verdicts: list[str]) -> str:
"""Collapse per-item CAS verdicts into one. A single FAIL dominates."""
if _cas.VERDICT_FAIL in verdicts:
return _cas.VERDICT_FAIL
if not verdicts or _cas.VERDICT_INCONCLUSIVE in verdicts:
return _cas.VERDICT_INCONCLUSIVE
if _cas.VERDICT_COMPUTED in verdicts:
return _cas.VERDICT_COMPUTED
return _cas.VERDICT_PASS
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gpd/mcp/servers/verification_server.py` around lines 5140 - 5148, The
reducer _aggregate_cas_verdict currently returns PASS if any child is PASS,
which lets a passing item mask INCONCLUSIVE or COMPUTED; change the precedence
so FAIL still dominates, but INCONCLUSIVE takes priority over COMPUTED and PASS
(i.e., check for _cas.VERDICT_FAIL first, then _cas.VERDICT_INCONCLUSIVE, then
_cas.VERDICT_COMPUTED, then _cas.VERDICT_PASS) and keep the default return as
_cas.VERDICT_INCONCLUSIVE to avoid false passes.

cmaloney111 and others added 7 commits June 4, 2026 17:46
…acle sources

The CAS verification oracle adds src/gpd/mcp/servers/_cas.py, bumping the
src/gpd/mcp/servers/*.py node count from 15 to 16. Regenerate the repo-graph
contract artifacts (tests/repo_graph_contract.json, tests/README.md) so the
generated-artifact CI check passes, and apply ruff formatting to the CAS
oracle and verification server sources.
…urfaces

The CAS oracle adds sympy and lark as runtime dependencies. Add them to the
curated runtime-dependency set in tests/test_release_consistency.py and to the
pyproject dependency-label graph in tests/README.md so the dependency-surface,
wheel-metadata, and dependency-graph-docs release tests pass.

Complements the repo-graph regeneration in the preceding commit.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Conservation of energy/momentum/charge is the most common dynamics error class
(a quantity that silently drifts), and it had no executable verification surface
— only an oracle_hint string. Add a real tool that computes it.

- conservation_check(quantity, trajectory, tolerance) evaluates a conserved
  quantity (plain or LaTeX, via the _cas parser) at every state of a trajectory
  and measures drift: relative drift vs tolerance, or absolute drift when the
  quantity is ~0 throughout (e.g. total momentum = 0). Returns pass/fail plus
  min/max/initial/final values and the drift basis.
- Conservative: <2 states, unparseable quantity, missing variables, or non-real
  values → inconclusive; malformed trajectory/tolerance → stable error envelope.
  Never a false pass.
- Registered read-only; added to the gpd-verification descriptor capabilities
  (builtin_servers.py + infra/gpd-verification.json) and the contract-visibility
  expectations.

Verified: SHO energy on the unit circle → pass (zero drift); a drifting
integrator → fail; zero total momentum → pass (absolute basis); LaTeX quantity →
pass; missing var / too-short trajectory → inconclusive. 8 new tests; full
tests/mcp suite (860) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two more executable CAS verification tools, completing an "executable checks"
suite alongside conservation_check.

- equation_check(lhs, rhs, assumptions?): verify a claimed algebraic identity —
  the core of every derivation step — via simplify/equals instead of the agent's
  mental algebra. Plain or LaTeX; optional per-symbol assumptions (positive/
  real/integer/...) so identities like sqrt(x**2)=x (x>0) can be confirmed.
- series_check(expression, variable, point, order, expected?): compute a Taylor/
  asymptotic expansion (perturbation theory's workhorse) and, when an expected
  expansion is given, return a real pass/fail. Plain or LaTeX; supports point=oo.
- Both reuse the _cas parser + symbolic_equal; undecidable/unparseable input →
  inconclusive, never a false pass. Registered read-only and added to the
  gpd-verification descriptor (builtin_servers.py + infra/gpd-verification.json)
  and contract-visibility expectations.

Verified: pythagorean identity → pass, wrong identity → fail (+ difference),
sqrt(x**2)=x → pass only under x>0; sin(x) series matches claim → pass, wrong
expansion → fail, 1/(1-x) geometric series, asymptotic at oo. 12 new tests; full
tests/mcp suite (872) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Three more executable verification capabilities.

- ode_check(equation, solution, variable, function): substitutes a candidate
  solution and its derivatives (prime notation y, y', y'') into a differential
  equation and simplifies the residual — zero → pass, nonzero → fail. Solution
  may be plain or LaTeX.
- tensor_check(equation): verifies free/dummy index consistency of a tensor
  equation — every term on a side shares the same free indices in the same
  up/down position, the two sides match, and repeated indices are valid
  one-up/one-down contractions. Catches the common GR/field-theory index errors
  (up/down mismatch, same-position repeats, unbalanced free indices).
- symmetry_check now executes scale/dilation via Euler's homogeneity theorem,
  reporting the homogeneity degree (scale-invariant = degree 0). Lorentz is
  deliberately NOT auto-checked: it needs 4-vector/metric structure the tool
  cannot infer, and faking it would be a false verdict — it stays guidance-only.

All conservative: unparseable/undecidable/absent-function/no-'=' → inconclusive;
bad args → stable error envelope. Both new tools registered read-only and added
to the gpd-verification descriptor (builtin_servers.py + infra json) and
contract-visibility expectations.

Verified: SHO/decay solutions → pass, wrong solution → fail; Maxwell tensor eq →
pass, up/down + same-position + free-index errors → fail; x^2 potential →
homogeneous degree 2. 13 new tests; full tests/mcp suite (885) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Two more executable CAS verification tools.

- integral_check(integrand, variable, claimed?, lower?, upper?): for an
  indefinite integral, verifies the claimed antiderivative by differentiation
  (d(claimed)/dx == integrand — the always-decidable direction); for a definite
  integral, computes integrate(integrand,(x,lower,upper)) and compares to the
  claimed value. With no claim, returns the computed result. Plain or LaTeX.
- commutator_check(operators, a, b, expected, variable): verifies an operator
  commutator [a,b] = expected by acting on a test function. Operators are given
  as their action on f (f, f_x, f_xx for derivatives), e.g. {"X":"x*f",
  "P":"-I*hbar*f_x"}; computes a(b f) - b(a f) and compares. The canonical
  [x,p] = i hbar verifies as pass.

Both conservative: unparseable input, undecidable comparisons, a SymPy integral
that won't evaluate, or a missing operator → inconclusive; malformed args (e.g.
only one integration bound) → stable error envelope. Registered read-only and
added to the gpd-verification descriptor (builtin_servers.py + infra json) and
contract-visibility expectations.

Verified: x^2 -> x^3/3 pass, wrong antiderivative fail, definite x^2 on [0,1] =
1/3, Gaussian on (-oo,oo) = sqrt(pi); [X,P]=i hbar pass, wrong/[X,X]=0 handled.
11 new tests; full tests/mcp suite (896) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- pde_check(equation, solution, variables, function): generalizes ode_check to
  partial differential equations. Partial derivatives use subscript notation
  (u_t, u_xx, u_xt for mixed); substitutes the solution and its partials and
  simplifies the residual. Verifies heat, wave, Laplace, and free-particle
  Schrodinger solutions; a wrong solution fails.
- matrix_check(matrix, property): verifies symmetric / antisymmetric / hermitian
  / anti-hermitian / unitary / orthogonal / idempotent / involutory / normal via
  SymPy Matrix algebra. Entries are expression strings; I is the imaginary unit
  (e.g. Pauli Y = [[0,-I],[I,0]]). Non-square matrices fail square-only
  properties.

Both conservative: unparseable input, undecidable comparisons, an absent
function, or an unknown property → inconclusive; malformed matrices → stable
error envelope. Registered read-only and added to the gpd-verification
descriptor (builtin_servers.py + infra json) and contract-visibility
expectations.

Verified: heat/wave/Laplace/Schrodinger PDEs → pass, wrong → fail; Pauli X/Y
hermitian+unitary, rotation orthogonal, projector idempotent → pass; non-square
unitary → fail. 13 new tests; full tests/mcp suite (909) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@marcpickett1

Copy link
Copy Markdown
Collaborator

🤖 RoastBot: Alex submitted PR #248 six minutes after this one. He is not okay.

@alexandergagliano alexandergagliano changed the title feat(verification): execute dimensional / limiting-case / symmetry checks with a SymPy CAS oracle (LaTeX-aware) feat(verification): executable SymPy verification oracle — 12 LaTeX-aware physics checks Jun 4, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/gpd/mcp/servers/verification_server.py`:
- Around line 5636-5644: The pde_check flow currently allows an empty variables
list to reach _cas.check_pde; after calling _validate_string_list (the call that
produces validated_variables) add an explicit check that validated_variables is
non-empty and return a validation error if it is empty (e.g., same error shape
used elsewhere), before proceeding to call _cas.check_pde; update the code in
the pde_check function (around the _validate_string_list -> validated_variables,
validated_function, _cas.check_pde sequence) to reject [] immediately.
- Around line 5591-5609: The code validates `operators` via
_validate_string_mapping and `a`/`b` via _validate_string but never ensures `a`
and `b` are keys in the `operators` mapping; add explicit checks after the
validations to verify that validated_a and validated_b exist as keys in
validated_operators and return a clear input error if either is missing (use the
same error style as _validate_string/_validate_string_mapping and include field
names "a" or "b" and the available operator keys), before calling
_cas.check_commutator.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 8f49d8f1-e397-4595-be6b-4728a66ea912

📥 Commits

Reviewing files that changed from the base of the PR and between aff6adc and 51f4c63.

📒 Files selected for processing (6)
  • infra/gpd-verification.json
  • src/gpd/mcp/builtin_servers.py
  • src/gpd/mcp/servers/_cas.py
  • src/gpd/mcp/servers/verification_server.py
  • tests/mcp/test_servers.py
  • tests/mcp/test_tool_contract_visibility.py

Comment on lines +5591 to +5609
with gpd_span("mcp.verification.commutator_check"):
validated_operators, error = _validate_string_mapping(operators, field_name="operators")
if error is not None:
return error
validated_a, error = _validate_string(a, field_name="a")
if error is not None:
return error
validated_b, error = _validate_string(b, field_name="b")
if error is not None:
return error
validated_expected, error = _validate_string(expected, field_name="expected")
if error is not None:
return error
validated_variable, error = _validate_string(variable, field_name="variable")
if error is not None:
return error
result = _cas.check_commutator(
validated_operators, validated_a, validated_b, validated_expected, validated_variable
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate that a and b are actually defined in operators.

Right now a request like a="X" / b="P" is accepted even when operators has no X or P entry. That pushes a basic request-shape error into _cas.check_commutator(), so callers get backend-dependent failures instead of a clear input error at the tool boundary.

Suggested fix
     with gpd_span("mcp.verification.commutator_check"):
         validated_operators, error = _validate_string_mapping(operators, field_name="operators")
         if error is not None:
             return error
         validated_a, error = _validate_string(a, field_name="a")
@@
         validated_variable, error = _validate_string(variable, field_name="variable")
         if error is not None:
             return error
+        missing = [name for name in (validated_a, validated_b) if name not in validated_operators]
+        if missing:
+            missing_names = ", ".join(sorted(dict.fromkeys(missing)))
+            return _error_result(f"operators is missing definitions for: {missing_names}")
         result = _cas.check_commutator(
             validated_operators, validated_a, validated_b, validated_expected, validated_variable
         )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gpd/mcp/servers/verification_server.py` around lines 5591 - 5609, The
code validates `operators` via _validate_string_mapping and `a`/`b` via
_validate_string but never ensures `a` and `b` are keys in the `operators`
mapping; add explicit checks after the validations to verify that validated_a
and validated_b exist as keys in validated_operators and return a clear input
error if either is missing (use the same error style as
_validate_string/_validate_string_mapping and include field names "a" or "b" and
the available operator keys), before calling _cas.check_commutator.

Comment on lines +5636 to +5644
validated_variables, error = _validate_string_list(variables, field_name="variables")
if error is not None:
return error
validated_function, error = _validate_string(function, field_name="function")
if error is not None:
return error
result = _cas.check_pde(
validated_equation, validated_solution, validated_variables, validated_function
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Reject empty variables lists before calling the PDE CAS.

pde_check currently accepts variables=[], even though the tool contract requires at least one independent variable. Letting an empty list through turns an immediate input error into whatever _cas.check_pde() happens to do with it.

Suggested fix
         validated_variables, error = _validate_string_list(variables, field_name="variables")
         if error is not None:
             return error
+        if not validated_variables:
+            return _error_result("variables must contain at least one variable name")
         validated_function, error = _validate_string(function, field_name="function")
         if error is not None:
             return error
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/gpd/mcp/servers/verification_server.py` around lines 5636 - 5644, The
pde_check flow currently allows an empty variables list to reach _cas.check_pde;
after calling _validate_string_list (the call that produces validated_variables)
add an explicit check that validated_variables is non-empty and return a
validation error if it is empty (e.g., same error shape used elsewhere), before
proceeding to call _cas.check_pde; update the code in the pde_check function
(around the _validate_string_list -> validated_variables, validated_function,
_cas.check_pde sequence) to reject [] immediately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants