feat: output policy v2 — three-set model, evidence grounding, identity keys #78
Nelson Spence (Fieldnote-Echo) wants to merge 10 commits into main from
Conversation
…evidence posting gate

- Three-set model: suppressed (narration/below-threshold/empty-evidence), scoring set (inline + summary-only), display set (capped for posting)
- Narration detection: 11 phrase patterns + empty/dismissive suggestion check
- Confidence threshold: severity-specific minimums with category penalty (governance/observability -15)
- Evidence grounding: empty evidence suppresses; non-empty but ungrounded moves to summary-only; grounded stays inline-eligible
- Delegates to parse_diff from rules/context.py — no duplicated diff parsing
- Score recomputation from scoring set with per-category caps
- Verdict derivation from recomputed score + mode (call-site authority)
- Display caps: 5/file, 20/review, 3 LOW — caps don't affect score

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
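The three-set split in this commit can be sketched roughly as follows (a minimal sketch, not the actual output_policy.py: the set names mirror the commit message, and the predicate callables are hypothetical stand-ins for the real narration/threshold/evidence checks):

```python
from dataclasses import dataclass


@dataclass
class PolicyResult:
    suppressed: list  # narration / below-threshold / empty-evidence
    scoring: list     # inline + summary-only; this set drives the score
    display: list     # capped subset of scoring that actually gets posted


def split_findings(findings, is_narration, meets_threshold, has_evidence,
                   max_display=20):
    """Partition findings into the three sets (hypothetical predicates)."""
    suppressed, scoring = [], []
    for f in findings:
        if is_narration(f) or not meets_threshold(f) or not has_evidence(f):
            suppressed.append(f)
        else:
            scoring.append(f)
    # Display caps limit posting only — the score is computed from `scoring`.
    display = scoring[:max_display]
    return PolicyResult(suppressed, scoring, display)
```

The key invariant is that capping affects only `display`: trimming what gets posted never changes the recomputed score.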
GrippyReview gains summary_only_findings (scored but not inline-eligible).

ReviewMeta gains: score_before_policy, verdict_before_policy, policy_bypassed, policy_bypass_reason, narration_suppressed_count, threshold_suppressed_count, display_capped_count.

All new fields have defaults — backward compatible with existing reviews.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
retry.py: add mode/diff params to run_review(), call filter_review() after successful parse. Single fail-open owner — catches exceptions, sets policy_bypassed=True + reason, returns unfiltered review.

review.py: pass mode and diff through to run_review(), pass summary_only_findings + policy meta to post_review().

mcp_server.py: pass mode and diff to run_review() in audit_diff path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
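The single fail-open owner described above might look like this (a sketch under assumptions: filter_review and the policy_bypassed/policy_bypass_reason names come from the commit message; the review object and meta dict shape are illustrative):

```python
def apply_output_policy(review, mode, diff, filter_review):
    """Run the output policy; never let a policy bug kill the review.

    This is the single fail-open owner: any exception raised by
    filter_review() is caught here, recorded on the review meta, and the
    unfiltered review is returned instead of propagating the error.
    """
    try:
        return filter_review(review, mode=mode, diff=diff)
    except Exception as exc:  # fail-open: policy errors must not block reviews
        review.meta["policy_bypassed"] = True
        review.meta["policy_bypass_reason"] = f"{type(exc).__name__}: {exc}"
        return review
```

Keeping the try/except in exactly one place means callers (review.py, mcp_server.py) never need their own bypass handling.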
github_review.py:
- build_review_comment() renders evidence as fenced code block
- format_summary_comment() gains summary_only_findings, policy_bypassed, display_capped_count params
- Summary-only findings rendered in collapsible details section
- Policy-bypassed warning annotation when filter_review() failed
- Display-capped count annotation when findings omitted for brevity
- post_review() passes new params through to summary

mcp_response.py:
- serialize_audit() includes summary_only_findings in output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New: tests/test_grippy_output_policy.py — 34 tests covering:
- Narration suppression (3), confidence threshold (5), evidence grounding (4)
- Three-set invariant (2), score recomputation (3), verdict derivation (4)
- Display caps (2), summary-only persistence (2), telemetry (5)
- Fail-open integration (1), diff helper (2), end-to-end (1)

Updated: test_grippy_retry.py — score assertions updated for recomputed scores (output policy recomputes 0-finding reviews to 100).

Updated: test_grippy_mcp_response.py — added summary_only_findings to expected serialize_audit() keys.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Tests for the three new format_summary_comment params:
- policy_bypassed warning annotation
- display_capped_count omission annotation
- summary_only_findings collapsible section

Tests for build_review_comment evidence rendering:
- Non-empty evidence rendered as fenced code block
- Whitespace-only evidence produces no code block

Fixes quality-gate: github_review.py patch coverage was 25% (21 uncovered lines in new rendering paths), now covered by unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Suppresses false-positive rule engine findings on test fixture strings containing SQL injection patterns (evidence examples, diff fixtures).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two fixes for the noise problem:

1. `# nogrip` now suppresses LLM findings (not just rule engine). The output policy parses nogrip pragmas from the diff and suppresses findings whose line range includes a nogrip line. Bare `# nogrip` suppresses all findings; targeted `# nogrip: SEC-001` only suppresses matching rule_id.

2. Stale thread resolution no longer requires `isOutdated` from GitHub. Any absent (no longer in current findings), unresolved thread is resolved. This fixes persistent stale comments that never got cleaned up because the line didn't change between pushes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
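The bare-vs-targeted pragma semantics could be implemented roughly like this (a sketch; the `SEC-001`-style rule-id pattern and the single-line matching are assumptions, the real policy matches against a finding's whole line range):

```python
import re

# Bare "# nogrip" or targeted "# nogrip: SEC-001" (rule-id format assumed).
NOGRIP_RE = re.compile(r"#\s*nogrip(?::\s*(?P<rule>[A-Z]+-\d+))?")


def nogrip_suppresses(line_text: str, finding_rule_id: str) -> bool:
    """True if a nogrip pragma on this line suppresses the given finding.

    Bare `# nogrip` suppresses every finding on the line; targeted
    `# nogrip: SEC-001` suppresses only findings with that rule_id.
    """
    m = NOGRIP_RE.search(line_text)
    if not m:
        return False
    target = m.group("rule")
    return target is None or target == finding_rule_id
```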
…ntity keys

- Evidence thin-token fail-open → fail-closed: <2 meaningful tokens routes to summary-only, not inline
- Split telemetry: threshold_suppressed_count → confidence_suppressed_count + evidence_suppressed_count; nogrip excluded from quality-judgment total
- Thread resolution: summary-only findings protect their threads from auto-resolve
- Identity key hash: markers now include sha256(title:severity:category)[:8] discriminator with conservative backward compat for old markers
- Dashboard count split: shows "N total (X inline, Y summary-only)"
- Nogrip acceptance test: proves PR #77 SQL fixture scenario → PASS

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
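The identity-key discriminator named in this commit, sha256(title:severity:category)[:8], amounts to the following (a minimal sketch; the `:`-joined payload order comes from the commit message, the function name is hypothetical):

```python
import hashlib


def identity_discriminator(title: str, severity: str, category: str) -> str:
    """8-hex-char discriminator appended to comment markers, so distinct
    findings at the same file/line/category no longer collide in dedup."""
    payload = f"{title}:{severity}:{category}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:8]
```

Because the hash covers title, severity, and category, two different findings on the same line get different markers, while re-posting the same finding remains stable across pushes.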
…idence

- fetch_thread_states: gh -F doesn't parse JSON arrays — switch to --input stdin with full query+variables payload
- Add # nogrip to SQL evidence string in nogrip acceptance test to prevent rule engine false positive (the exact PR #77 failure mode)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
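The stdin workaround looks roughly like this from Python (a sketch: `gh api graphql --input -` reads a JSON payload from stdin, which is what lets array variables through; the GraphQL query and variable names here are illustrative, not the real fetch_thread_states query):

```python
import json
import subprocess

# Illustrative query — `gh api graphql -F ids=...` cannot encode a JSON
# array variable, so the full query+variables payload goes via stdin.
QUERY = "query($ids: [ID!]!) { nodes(ids: $ids) { id } }"


def build_payload(thread_ids: list[str]) -> str:
    """Serialize the full query+variables payload for --input stdin."""
    return json.dumps({"query": QUERY, "variables": {"ids": thread_ids}})


def fetch_thread_states(thread_ids: list[str]) -> dict:
    proc = subprocess.run(
        ["gh", "api", "graphql", "--input", "-"],  # "-" = read from stdin
        input=build_payload(thread_ids),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```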
project-navi-bot
left a comment
All required CI checks passed. Auto-approved by navi-bot.
@@ -408,6 +454,29 @@ def format_summary_comment(
    lines.append("</details>")
🟡 MEDIUM: Potential side effect: summary_only_findings may be None
Confidence: 95%
summary_only_count=len(summary_only_findings) if summary_only_findings else 0,
In format_summary_comment, summary_only_findings defaults to None, so any call to len(summary_only_findings) that lacks the 'if summary_only_findings' guard raises a TypeError when the argument is not explicitly passed.
Suggestion: Ensure summary_only_findings is always a list (never None), or update the check to assign 0 if it is None. For example, use 'len(summary_only_findings) if summary_only_findings else 0' or default to an empty list at function entry.
— If summary_only_findings is unexpectedly None, you get a TypeError. Worth hardening.
@@ -225,19 +243,24 @@ def build_review_comment(finding: Finding) -> dict[str, str | int]:
    # --- GitHub comment fetching ---
🔵 LOW: summary_only_findings defaults to None but is treated as a list
Confidence: 92%
summary_only_findings: list[Finding] | None = None,
The parameter summary_only_findings has a default of None in several functions ('def format_summary_comment', 'def post_review'), but is later treated as a list. Using None requires repeated None checks, and is risk-prone.
Suggestion: Default these parameters to an empty list if possible, or at the start of the function, assign summary_only_findings = summary_only_findings or [].
— Not wrong, but making the default an empty list would simplify downstream logic.
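The suggested hardening for both None-default findings is a one-line normalization at function entry (a sketch; the real format_summary_comment signature is abbreviated to the three new params):

```python
def format_summary_comment(*, summary_only_findings=None,
                           policy_bypassed=False, display_capped_count=0):
    """Sketch of the hardening; real signature abbreviated."""
    # Normalize once at entry: downstream code can assume a list, and
    # len(...) is safe without repeated `if summary_only_findings` guards.
    summary_only_findings = summary_only_findings or []
    summary_only_count = len(summary_only_findings)
    return summary_only_count
```

Keeping `None` as the public default preserves the "argument omitted" signal for callers while removing the TypeError risk internally.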
| @@ -189,6 +199,7 @@ class GrippyReview(BaseModel): | |||
| pr: PRMetadata | |||
🔵 LOW: Possible mutable default with summary_only_findings Field
Confidence: 88%
summary_only_findings: list[Finding] = Field(default_factory=list)
The summary_only_findings field is defined as Field(default_factory=list), making the default state an empty list. This is the correct approach to avoid mutable default arguments, but be careful if mutating this list in-place in code paths.
Suggestion: Avoid mutating the summary_only_findings list in-place on the review object; always assign a fresh list unless intentionally appending.
— Using default_factory for lists is good; just avoid accidental shared-reference pitfalls elsewhere.
✅ Grippy Review — PASS | Score: 91/100 | Findings: 3 | Delta: 3 new | Commit: 6f944fd
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6f944fd7a9
if diff:
    added_lines, nogrip_index = _parse_diff_context(diff)
else:
    added_lines, nogrip_index = None, {}
Treat empty diff as provided context
In filter_review, using if diff: sends empty-string diffs down the “no diff provided” branch, which bypasses both nogrip suppression and evidence grounding. This is reachable when the caller passes an empty diff string (for example after ignore filtering removes all file hunks), so non-empty-evidence findings become inline-eligible and can still affect verdicts even though there is no added code to ground against. Check diff is not None here so an empty diff is parsed as empty context instead of fail-open inline behavior.
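The fix Codex is suggesting amounts to distinguishing None from "" (a sketch; `_parse_diff_context` is the helper named in the diff hunk, stubbed here with an assumed minimal behavior):

```python
def _parse_diff_context(diff: str):
    """Stub of the real parser: returns (added_lines, nogrip_index)."""
    added = [ln[1:] for ln in diff.splitlines()
             if ln.startswith("+") and not ln.startswith("+++")]
    return added, {}


def diff_context(diff):
    # `if diff:` treats "" the same as None and skips nogrip/evidence
    # checks entirely. Checking `is not None` parses an empty diff as
    # *empty context*, so ungrounded findings get demoted instead of
    # failing open into inline-eligible.
    if diff is not None:
        return _parse_diff_context(diff)
    return None, {}
```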
matches = sum(1 for t in tokens if t in added_text)
return matches >= 2
Deduplicate evidence tokens before overlap counting
_evidence_is_grounded counts token matches with sum(1 for t in tokens if t in added_text), so repeated evidence tokens are double-counted. A finding with evidence like "query query" can satisfy the >= 2 match threshold even if only one distinct token overlaps with the diff, causing thin evidence to be treated as grounded and posted inline. Count unique normalized tokens (or intersect token sets) to enforce the intended “at least two tokens” requirement.
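Counting distinct overlapping tokens instead looks like this (a sketch of the suggested fix; the lowercase normalization and token shape are assumptions about the surrounding code):

```python
def evidence_is_grounded(evidence_tokens: list[str], added_text: str) -> bool:
    # Intersect the *set* of normalized tokens with the diff so repeated
    # evidence tokens ("query query") cannot double-count toward the
    # two-distinct-token grounding threshold.
    haystack = added_text.lower()
    distinct = {t.lower() for t in evidence_tokens}
    matches = sum(1 for t in distinct if t in haystack)
    return matches >= 2
```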
Codecov Report: ❌ Patch coverage is
Summary
- Output policy (output_policy.py): narration suppression, severity-specific confidence thresholds, evidence grounding (empty → suppress, thin → summary-only, grounded → inline), nogrip pragma support, score recomputation from scoring set, verdict derivation from mode
- Identity keys: sha256(title:severity:category)[:8] discriminator — prevents dedup collisions for distinct findings at same file/line/category
- Telemetry: confidence_suppressed_count, evidence_suppressed_count, nogrip_suppressed_count tracked independently. Nogrip excluded from quality-judgment total
- fetch_thread_states fixed to use --input stdin for JSON array variables
- Dashboard: Findings: N total (X inline, Y summary-only) when summary-only > 0

Test plan
🤖 Generated with Claude Code