Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 3 additions & 2 deletions docs/internals/audit/FRESHNESS.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Audit Unit Freshness Tracker — Grippy

**Last updated:** 2026-03-14 (Phase 3B review + github-review paired26/30 CURRENT)
**Last updated:** 2026-03-14 (Phase 3C mcp-response solo27/30 CURRENT)

## Status Legend

Expand Down Expand Up @@ -73,7 +73,7 @@ Changes to these anchor functions trigger BOUNDARY_CHANGED status for affected u
|---|---|---|---|---|---|
| agent | 2026-03-14 | f8548ac | 7.8/10 | 0 | CURRENT |
| retry | 2026-03-13 | 259d0b8 | 7.9/10 | 0 | CURRENT |
| mcp-response | | | | | NEVER |
| mcp-response | 2026-03-14 | c415a22 | 7.9/10 | 0 | CURRENT |

### Phase 4 -- Integration (5 units)

Expand Down Expand Up @@ -108,6 +108,7 @@ Changes to these anchor functions trigger BOUNDARY_CHANGED status for affected u
| 2026-03-14 | rule-enrichment | Claude Opus 4.6 / Nelson Spence | `rule-enrichment/SCORECARD.md` |
| 2026-03-14 | agent | Claude Opus 4.6 / Nelson Spence | `agent/SCORECARD.md` |
| 2026-03-14 | review, github-review | Claude Opus 4.6 / Nelson Spence | `review/SCORECARD.md`, `github-review/SCORECARD.md` |
| 2026-03-14 | mcp-response | Claude Opus 4.6 / Nelson Spence | `mcp-response/SCORECARD.md` |

## Compound Chain Traces

Expand Down
252 changes: 252 additions & 0 deletions docs/internals/audit/mcp-response/SCORECARD.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,252 @@
<!-- SPDX-License-Identifier: MIT -->

# Audit Scorecard: mcp-response

**Audit date:** 2026-03-14
**Commit:** c415a22
**Auditor:** Claude Opus 4.6 (AI draft) / Nelson Spence (human reviewer)
**Unit type:** infrastructure (primary)
**Subprofile:** config

---

## Checklist

Infrastructure checklist (IN-01, IN-02) + Config subprofile (IN-C01, IN-C02).

| ID | Invariant | Status | Evidence |
|----|-----------|--------|----------|
| IN-01 | Missing config produces clear error | N/A | No configuration. Pure function serializers with no env vars, settings, or files. |
| IN-02 | Unit follows project conventions | PASS | SPDX header (mcp_response.py:1). ruff + mypy clean (CI). Test mirror: `test_grippy_mcp_response.py` (Tier A). |
| IN-C01 | Edge case inputs handled gracefully | PASS | `enrichment=None` (default) omits key — tested by 16 existing tests that all pass `None`. `enrichment` present — tested by `test_enrichment_serialized_when_present` (Tier A). `rule_findings=None` default — tested by `test_rule_findings_empty_by_default` (Tier A). |
| IN-C02 | AST/parsing operations do not crash on malformed input | N/A | No parsing. Pure dict construction from Pydantic-validated inputs. |

---

## Gate Rules

### Override Gates (force a specific status)

| Override Gate | Condition | Fired? |
|---|---|---|
| Critical finding | Any unresolved CRITICAL finding | No |
| Multi-HIGH block | 2+ unresolved HIGH findings (including provisional) | No |
| Security collapse | Security Posture < 2 | No (score: 7) |
| Adversarial collapse | Adversarial Resilience < 2 | No (score: 7) |

### Ceiling Gates (cap the best allowed status)

| Ceiling Gate | Condition | Fired? |
|---|---|---|
| Severity cap | Any unresolved HIGH finding | No |
| Security hard floor | Security Posture < 4 | No (score: 7) |
| Adversarial hard floor | Adversarial Resilience < 4 | No (score: 7) |
| Security soft floor | Security Posture < 6 | No (score: 7) |
| Adversarial soft floor | Adversarial Resilience < 6 | No (score: 7) |
| Accepted critical risk | Any ACCEPTED_RISK at CRITICAL | No |
| Accepted high risk | Any ACCEPTED_RISK at HIGH | No |

---

## Summary

| Dimension | Score | Evidence Tier | Notes |
|-----------|------:|:---:|-------|
| 1. Contract Fidelity | 8/10 | A | All functions typed with explicit returns. mypy strict clean. |
| 2. Robustness | 8/10 | A + C | No error paths needed — pure dict construction from validated inputs. |
| 3. Security Posture | 7/10 | A + C | No trust boundaries. Personality stripping verified by test. |
| 4. Adversarial Resilience | 7/10 | A + C | No untrusted input processed. Inputs are Pydantic-validated upstream. |
| 5. Auditability & Traceability | 6/10 | C | Deterministic pure functions — same input always produces same output. No logging. |
| 6. Test Quality | 8/10 | A | 18 tests, 3.27:1 test:source ratio. Positive, negative, edge case categories covered. |
| 7. Convention Adherence | 9/10 | A | SPDX, ruff, mypy strict, naming, test mirror. Calibration: matches ignore (9). |
| 8. Documentation Accuracy | 7/10 | C | Module docstring and per-function docstrings accurate. No stale references. |
| 9. Performance | 9/10 | C | O(n) list comprehensions over findings. No allocation beyond output dicts. Zero I/O. |
| 10. Dead Code / Debt | 9/10 | A + C | Zero TODOs. All 3 functions called (2 public by mcp_server.py, 1 internal by both). `_SEVERITY_NAMES` dict used by `_serialize_rule_finding`. |
| 11. Dependency Hygiene | 9/10 | A | 2 internal imports (schema, rules.base). 0 external. True config leaf. |
| **Overall** | **7.9/10** | | **Average of 11 dimensions** |

**Health status:** Adequate

**Determination:**
1. Average-based status: 7.9/10 falls in 6.0-7.9 range = **Adequate**
2. Override gates: None fired.
3. Ceiling gates: None fired. Average stands.
4. Suffixes: None. Dims 3 and 4 both have Tier A evidence (personality stripping test, enrichment test). No `(provisional)` needed.

**Override gates fired:** None
**Ceiling gates fired:** None

---

## Findings

No findings. Small serializer, low risk, clean code.

---

## Compound Chain Exposure

None identified. mcp-response does not participate in any known compound chain (CH-1 through CH-5). It serializes already-validated data (Pydantic `GrippyReview` from retry/agent, `RuleResult` from rule-engine) into JSON dicts consumed by mcp-server. It does not process untrusted input, post to external APIs, or modify system state.

---

## Dimension Details

### 1. Contract Fidelity

**Key question:** Do types, exports, and validation faithfully represent the unit's contract?

**Score:** 8/10
**Evidence:**
- mypy strict passes with zero issues (Tier A: static analysis).
- All 3 functions fully typed: `_serialize_rule_finding(r: RuleResult) -> dict[str, Any]`, `serialize_scan(...) -> dict[str, Any]`, `serialize_audit(...) -> dict[str, Any]` (Tier A: mypy).
- `_SEVERITY_NAMES` module-level constant is typed as `dict[int, str]` (mcp_response.py:15) (Tier C: code reading).
- Not 9: No Protocol classes. No runtime type checks (inputs are Pydantic-validated upstream, so runtime checks would be redundant).
- Calibration: matches schema (8) and ignore (8) — all have strict mypy, typed returns, no Protocols.

---

### 2. Robustness

**Key question:** Does the unit handle errors, retries, and edge cases correctly?

**Score:** 8/10
**Evidence:**
- Pure functions with no error paths to handle. Inputs are Pydantic-validated `GrippyReview` and `RuleResult` — invalid data is rejected before reaching this unit (Tier C: caller trace from mcp_server.py → serialize_scan/serialize_audit).
- `rule_findings` defaults to `None`, handled with `or []` (mcp_response.py:92) — no crash on missing optional (Tier A: `test_rule_findings_empty_by_default`).
- `enrichment` conditional: `if r.enrichment is not None` (mcp_response.py:29) — both branches tested (Tier A: `test_enrichment_serialized_when_present` + all existing tests pass None).
- `evidence` conditional: `if r.evidence is not None` (mcp_response.py:27) — tested with both present and absent (Tier A: `test_finding_serialization` lines 258, 266).
- Scored 8 (not 7) because the unit genuinely has no error conditions to handle — robustness ceiling is defined by input contract, which is fully covered.

---

### 3. Security Posture

**Key question:** Does the unit protect against injection, leakage, and unauthorized access?

**GATE DIMENSION: floors at 2 (collapse), 4 (hard), 6 (soft)**

**Score:** 7/10
**Evidence:**
- No trust boundaries owned (Tier A: registry.yaml confirms `boundaries: []`).
- Personality stripping is the only security-relevant behavior: `serialize_audit()` constructs output dict by explicit field selection — personality fields (`tone_register`, `opening_catchphrase`, `closing_line`, `disguise_used`, `ascii_art_key`) are never referenced (Tier A: `test_personality_fields_absent_from_audit` verifies all 5 fields absent; `test_strips_personality` verifies value-level absence).
- No secrets, no logging, no error messages, no hardcoded credentials.
- Not 8: No defense-in-depth needed — no attack surface. Score reflects "no exposure" rather than "strong defenses."

---

### 4. Adversarial Resilience

**Key question:** Can the unit resist adversarial input from untrusted PR content and prompt injection?

**GATE DIMENSION: floors at 2 (collapse), 4 (hard), 6 (soft)**

**Score:** 7/10
**Evidence:**
- Does not process untrusted input directly. All inputs are Pydantic-validated upstream: `GrippyReview` by retry.py's `_parse_response()`, `RuleResult` by rule-engine (Tier C: caller trace).
- Output is consumed by mcp_server.py which returns it as MCP tool response — no GitHub posting, no prompt injection vector (Tier C: caller trace).
- Personality stripping prevents internal prompt/persona content from leaking to AI consumers (Tier A: `test_personality_fields_absent_from_audit`).
- Not 8: No adversarial fixture matrix specific to this unit. Limited exposure makes dedicated adversarial testing low-value.

---

### 5. Auditability & Traceability

**Key question:** Can operators investigate failures, reproduce review results, and trace the decision path from input to output?

**Score:** 6/10
**Evidence:**
- Pure functions: deterministic, same input always produces same output. No randomness, no external state (Tier C: code reading).
- No logger defined — appropriate for a pure serializer with no failure modes, but limits operational visibility.
- Output structure is self-documenting: key names match domain concepts (`findings`, `score`, `verdict`, `metadata`).
- Not 7: No structured logging. No tracing of serialization decisions. Callers must infer behavior from return values.

---

### 6. Test Quality

**Key question:** Do tests verify meaningful behavior with good coverage?

**Score:** 8/10
**Evidence:**
- 18 tests across 3 test classes (Tier A: test_grippy_mcp_response.py).
- Test:source ratio of 3.27:1 (327 LOC tests / 100 LOC source).
- **Positive tests:** required keys, finding fields, score shape, verdict shape, metadata, rule findings included, enrichment serialization (Tier A).
- **Negative tests:** rule findings empty by default, diff_truncated default false (Tier A).
- **Edge case tests:** gate passed/failed, enrichment present vs absent, personality field-level stripping, profile passthrough, diff stats passthrough (Tier A).
- Calibration: matches ignore (8) — similar test:source ratio (3.27 vs 2.11) and category coverage.
- Not 9: No property-based testing. No adversarial fixture matrix (low value given no untrusted input processing).

---

### 7. Convention Adherence

**Key question:** Does the unit follow Grippy project patterns?

**Score:** 9/10
**Evidence:**
- SPDX header on both source and test file (Tier A: mcp_response.py:1, test_grippy_mcp_response.py:1).
- ruff check passes with zero issues (Tier A: static analysis).
- ruff format check passes (Tier A: static analysis).
- mypy strict passes with zero issues (Tier A: static analysis).
- Test file follows mirror structure: `src/grippy/mcp_response.py` → `tests/test_grippy_mcp_response.py` (Tier A).
- Test file exceeds 50 LOC minimum (327 LOC) (Tier A).
- Naming: snake_case for functions, `_UPPER_CASE` for module constant.
- Calibration: matches ignore (9) — both exemplary adherence.

---

### 8. Documentation Accuracy

**Key question:** Do docstrings, comments, and docs match actual behavior?

**Score:** 7/10
**Evidence:**
- Module docstring: "AI-facing response serializers for MCP tools. Strips personality fields from GrippyReview and produces dense JSON output suitable for programmatic consumption by AI agents." — accurate (Tier C: mcp_response.py:2-6).
- `_serialize_rule_finding()`: "Serialize a single deterministic rule finding." — accurate (Tier C: mcp_response.py:19).
- `serialize_scan()`: "Serialize ``scan_diff`` output (deterministic rule engine results)." — accurate (Tier C: mcp_response.py:48).
- `serialize_audit()`: "Serialize ``audit_diff`` output, stripping personality fields." — accurate (Tier C: mcp_response.py:65).
- Not 8: No usage examples. `_SEVERITY_NAMES` dict not documented beyond variable name.

---

### 9. Performance

**Key question:** Is the unit efficient for its workload?

**Score:** 9/10
**Evidence:**
- `serialize_scan()` and `serialize_audit()` are single-pass O(n) list comprehensions over findings (Tier C: code reading).
- `_serialize_rule_finding()` is O(1) per finding — fixed set of dict assignments (Tier C: code reading).
- Zero I/O, zero allocations beyond the output dicts, zero external calls.
- `_SEVERITY_NAMES` is a module-level dict lookup — O(1) (Tier C: mcp_response.py:15).
- Scored 9 (not 8) because the unit is optimal for its workload — pure dict construction with no room for improvement.
- Calibration: above ignore (8) — ignore does regex + pathspec matching; mcp-response does only dict construction.

---

### 10. Dead Code / Debt

**Key question:** Is the unit free of unused code and tracked debt?

**Score:** 9/10
**Evidence:**
- Zero `TODO` or `FIXME` comments (Tier A: grep search).
- All functions called: `serialize_scan` by mcp_server.py, `serialize_audit` by mcp_server.py, `_serialize_rule_finding` by both public functions (Tier C: caller trace).
- `_SEVERITY_NAMES` dict used by `_serialize_rule_finding` (mcp_response.py:22) (Tier C: code reading).
- All 4 severity entries in `_SEVERITY_NAMES` are reachable (RuleSeverity has 4 values: INFO=0, WARN=1, ERROR=2, CRITICAL=3) (Tier C: cross-reference with rules/base.py:17-20).
- Clean imports — ruff detects no unused imports (Tier A).

---

### 11. Dependency Hygiene

**Key question:** Are unit boundaries clean with no circular or unnecessary deps?

**Score:** 9/10
**Evidence:**
- Internal dependencies: `grippy.rules.base.RuleResult` and `grippy.schema.GrippyReview` (Tier A: import inspection at mcp_response.py:12-13).
- External dependencies: none beyond `typing` (stdlib) (Tier A: import inspection).
- No circular imports. Depends only on Phase 0 (schema) and Phase 1 (rules.base) — correct for a Phase 3 unit (Tier A: registry.yaml).
- True config leaf — consumed by mcp_server.py (Phase 4) only.
- Not 10: Two internal deps is minimal but not zero.
56 changes: 54 additions & 2 deletions tests/test_grippy_mcp_response.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@

from typing import Any

from grippy.mcp_response import serialize_audit, serialize_scan
from grippy.rules.base import RuleResult, RuleSeverity
from grippy.mcp_response import _serialize_rule_finding, serialize_audit, serialize_scan
from grippy.rules.base import ResultEnrichment, RuleResult, RuleSeverity
from grippy.schema import (
AsciiArtKey,
ComplexityTier,
Expand Down Expand Up @@ -273,3 +273,55 @@ def test_diff_stats_passthrough(self) -> None:
stats = {"files": 5, "additions": 100, "deletions": 20}
result = serialize_scan([], gate=False, profile="security", diff_stats=stats)
assert result["diff_stats"] == stats


# ---------------------------------------------------------------------------
# _serialize_rule_finding tests
# ---------------------------------------------------------------------------


class TestSerializeRuleFinding:
"""Tests for _serialize_rule_finding edge cases."""

def test_enrichment_serialized_when_present(self) -> None:
enrichment = ResultEnrichment(
blast_radius=5,
is_recurring=True,
prior_count=3,
suppressed=False,
suppression_reason="",
velocity="increasing",
)
result = _serialize_rule_finding(
RuleResult(
rule_id="SEC-010",
severity=RuleSeverity.WARN,
message="Test finding",
file="app.py",
line=42,
enrichment=enrichment,
),
)
assert "enrichment" in result
e = result["enrichment"]
assert e["blast_radius"] == 5
assert e["is_recurring"] is True
assert e["prior_count"] == 3
assert e["suppressed"] is False
assert e["suppression_reason"] == ""
assert e["velocity"] == "increasing"

def test_personality_fields_absent_from_audit(self) -> None:
"""Verify every Personality model field is stripped from serialized output."""
review = _make_review()
result = serialize_audit(review, profile="security", diff_stats={"files": 3})
flat = str(result)
# Every field from schema.Personality must be absent
for field in (
"tone_register",
"opening_catchphrase",
"closing_line",
"disguise_used",
"ascii_art_key",
):
assert field not in flat, f"Personality field '{field}' leaked into serialized output"
Loading