
Conversation

@TechNickAI
Owner

Summary

  • Shifts triage philosophy from "skip things that aren't blocking" to "only decline when the bot is wrong"
  • Addresses feedback that skipping "minor" issues just defers work to humans
  • Follows prompt-engineering.mdc principles (positive framing, no anti-patterns shown)

Changes

address-pr-comments.md (v1.4.0 → v1.5.0):

  • Reworded objective to emphasize identifying incorrect suggestions, not prioritizing
  • Simplified triage criteria to single question: "Is the bot's analysis correct?"
  • Added guidance for partial correctness (bot identifies real issue, suggests wrong fix)
  • Reframed hotfix-mode with positive language
  • Removed gh pr checks --watch anti-pattern mention

code-review-standards.mdc (v1.0.0 → v1.1.0):

  • Reframed core philosophy around correctness, not priority
  • Simplified race condition and optimization sections
  • Clarified accessibility as the one valid "flag for user" case (scope decision)
  • Removed negative framing throughout

Test Plan

  • Run /address-pr-comments on a PR with mixed feedback to verify triage behavior
  • Verify declines include reasoning about why bot is wrong (not just "minor")
  • Check that valid suggestions get addressed regardless of size

🤖 Generated with Claude Code

@greptile-apps bot left a comment

@claude

claude bot commented Dec 28, 2025

PR Review: Correctness-Based Triage Refactor

Summary

This is an excellent refactor that fundamentally improves the philosophy of PR comment triage. The shift from "skip minor issues" to "only decline when the bot is wrong" is both principled and practical.

Strengths

1. Philosophical Clarity

The core insight is the distinction between "bot is correct but inconvenient" and "bot's analysis is incorrect given context." This creates a clear decision framework:

  • Before: "Is this blocking? Is it important enough?" (subjective priority)
  • After: "Is the bot's analysis correct given context I have?" (objective correctness)

This aligns with the prompt-engineering principle of positive framing - focusing on what TO do (identify incorrect suggestions) rather than what NOT to do (skip low-priority items).
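The decision framework above can be sketched as a small predicate. This is a hypothetical illustration only — the function and field names are assumptions, not part of the actual prompt files:

```python
from dataclasses import dataclass

@dataclass
class BotSuggestion:
    summary: str
    analysis_is_correct: bool  # judged against full repo context
    reason_if_incorrect: str = ""

def triage(suggestion: BotSuggestion) -> str:
    """Correctness-based triage: the only valid decline is 'the bot is wrong'.

    Priority, size, and convenience never appear in the decision.
    """
    if suggestion.analysis_is_correct:
        return "address"
    # A decline must articulate WHY the analysis fails in this context.
    return f"decline: {suggestion.reason_if_incorrect}"

print(triage(BotSuggestion("missing error handling", True)))
# "address", regardless of how minor the issue seems
print(triage(BotSuggestion("extract single-use constant", False,
                           "value appears once, no DRY benefit")))
```

Note what is absent: there is no "is this blocking?" branch at all, which is the whole point of the refactor.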

2. Prompt Engineering Adherence

The changes follow prompt-engineering.mdc principles exceptionally well:

  • Positive framing: "Address when bot's analysis is correct" vs "Skip these issues"
  • No anti-patterns shown: Removed the gh pr checks --watch example
  • Goal-focused: Describes outcomes ("identify incorrect suggestions") rather than prescribing process
  • Consistent terminology: Uses "bot's analysis is correct/incorrect" throughout

3. Practical Examples

The updated patterns are much clearer:

code-review-standards.mdc:

  • Single-Use Values: Explains WHY extraction is wrong (no DRY benefit)
  • Theoretical Race Conditions: Explains the context bots can't see (queue/mutex)
  • Each pattern includes the reasoning, not just the rule

address-pr-comments.md:

  • Triage examples show correct reasoning: "Bot wants constant extraction, but this value appears once"
  • Emphasizes explaining WHY bot is wrong, not just declining

4. Accessibility Nuance

The accessibility section (lines 64-70 in code-review-standards.mdc) is particularly well-handled:

"This is the one case where 'flag for user' is appropriate because it's a scope/priority call, not because the suggestion is wrong."

This acknowledges that some decisions genuinely require product judgment while maintaining the correctness-first framework.

Minor Observations

1. Partial Correctness Pattern (address-pr-comments.md:117-118)

This is excellent:

"When a bot correctly identifies an issue but suggests a suboptimal fix, address the underlying issue with the appropriate solution. Credit the bot for the correct diagnosis."

This nuance wasn't in the old version and is valuable.

2. Hotfix Mode (lines 48-58)

The hotfix section maintains the expedited philosophy while staying aligned with correctness framing. Good balance between speed and principle.

3. Version Bumps

  • code-review-standards.mdc: 1.0.0 → 1.1.0 ✅
  • address-pr-comments.md: 1.4.0 → 1.5.0 ✅

Appropriate semantic versioning for this refactor.

Potential Considerations

1. Edge Case: Disagreement on Correctness

What happens when the LLM genuinely believes the bot is correct, but the user disagrees? The current framework doesn't explicitly address this. Consider adding guidance like:

"If you're uncertain whether the bot's analysis is correct, err on the side of addressing it and note your uncertainty in the response."

However, this may be overthinking it - the LLM can already flag for user attention.

2. Training Signal Consistency

The narration examples (lines 103-106) show reactions:

  • Heart reaction for "real bug"
  • Thumbs down for "magic string extraction"
  • Rocket for "SQL injection"

This is consistent with the <feedback-as-training> section. Good alignment.
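The narration-to-reaction pairing could be encoded as a small lookup. The verdict keys below are illustrative assumptions; the content values ("heart", "-1", "rocket", "eyes") are real GitHub reaction names:

```python
# Hypothetical mapping from triage narration to GitHub reaction content
# values; keys are examples drawn from the review above, not a real schema.
REACTION_FOR_VERDICT = {
    "real bug": "heart",              # valid catch, addressed
    "magic string extraction": "-1",  # declined: analysis incorrect here
    "sql injection": "rocket",        # critical catch
}

def reaction_for(verdict: str) -> str:
    """Look up the training-signal reaction, defaulting to a neutral one."""
    return REACTION_FOR_VERDICT.get(verdict.lower(), "eyes")
```

A reaction like this could then be posted through the GitHub reactions API (for example via `gh api` against the comment's reactions endpoint), though the command file may do this differently.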

Code Quality Assessment

  • Clarity: The new framing is clearer and less ambiguous
  • Consistency: Terminology is consistent throughout both files
  • Maintainability: Easier to extend with new patterns
  • Alignment: Follows project's prompt-engineering principles
  • Documentation: Changes are well-explained in PR description

Security & Performance

No security concerns. No performance implications (these are prompt files).

Test Coverage

The test plan in the PR description is appropriate:

  • Run on PR with mixed feedback ✅
  • Verify declines include reasoning ✅
  • Check valid suggestions get addressed ✅

Final Assessment

Recommendation: Approve and Merge 🚀

This PR represents a meaningful philosophical improvement. The shift from priority-based to correctness-based triage:

  1. Reduces ambiguity in decision-making
  2. Prevents deferring valid work to humans
  3. Aligns with the project's prompt engineering principles
  4. Maintains practical escape hatches (hotfix mode, accessibility scope decisions)

The implementation is clean, well-documented, and follows the project's conventions. No blocking issues identified.


Notable Quote:

"Valid declines explain why the bot's analysis is incorrect, not why addressing the issue is inconvenient."

This single sentence captures the entire philosophy shift perfectly.

🤖 Generated with Claude Code

The previous triage logic allowed declining suggestions because they were "minor"
or "not blocking" - essentially deferring valid work to humans. This shift
reframes the criteria: address all valid suggestions, decline only when you can
articulate why the bot's analysis is incorrect given context.

Key changes:
- Objective now emphasizes identifying incorrect suggestions, not prioritizing
- Triage criteria simplified to "is the bot's analysis correct?"
- Added guidance for partial correctness (bot right about problem, wrong fix)
- Reframed negative language to positive per prompt-engineering.mdc
- Clarified accessibility as the one case where scope decisions defer to user
- Removed anti-pattern mentions that LLMs might reproduce

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@TechNickAI TechNickAI force-pushed the fix-address-pr-comments-triage-logic branch from bf5db62 to 7c27e1a Compare December 28, 2025 23:39
@claude

claude bot commented Dec 28, 2025

Code Review: Correctness-Based PR Comment Triage

This PR successfully reframes PR comment triage from priority-based to correctness-based evaluation. The changes align well with the prompt-engineering.mdc principles and improve clarity for LLM execution.

Code Quality & Best Practices

Strengths:

  • Positive framing throughout: Follows prompt-engineering.mdc guidance by using descriptive language ("Address the suggestion when...") instead of directive ("MUST address...")
  • Clear philosophical shift: The new core philosophy in both files is unambiguous - address valid feedback, decline incorrect analysis
  • Consistent terminology: Both files now use parallel language about "bot's analysis is correct/incorrect given full context"
  • Removed anti-patterns: Eliminated the gh pr checks --watch negative example per prompt-engineering.mdc principles

address-pr-comments.md improvements:

  • Line 10-20: Excellent reframing of the objective - the new language makes it clear this is about correctness validation, not priority filtering
  • Line 112-131: The triage-process section is much clearer with the single question approach ("Is this suggestion correct given context I have?")
  • Line 49-58: Hotfix mode rewritten with positive language while maintaining urgency

code-review-standards.mdc improvements:

  • Line 9-16: Core philosophy is now correctly focused on correctness of analysis vs. convenience of addressing
  • Line 23-28: Single-use values section provides clear reasoning ("Constants exist to stay DRY across multiple uses")
  • Line 49-69: New section structure with "When Bot Suggestions Don't Apply" vs "Items Requiring Case-by-Case Judgment" is clearer

Potential Issues

Minor concern in code-review-standards.mdc:

  • Line 65-69: Accessibility section now says "flag for user decision rather than assuming either way" but doesn't specify HOW to flag (comment? ask? abort?). The command file doesn't reference this pattern anywhere. Consider adding guidance on the mechanism for flagging.

Consistency check:

  • The command file (line 18-19) says to read code-review-standards.mdc to "identify incorrect suggestions" but then says to "always explain WHY the bot is wrong in this specific case." The word "always" could be read as over-prescriptive per prompt-engineering.mdc's "Goals Over Process" principle. Consider: "explain why the bot is wrong in this specific case" (removing "always").

Performance Considerations

  • No performance concerns. The changes are purely documentation/prompt refinements.
  • The adaptive polling strategy (lines 89-92) remains well-designed for different bot speeds.
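The "adaptive polling" idea referenced above amounts to a backoff schedule between checks for new bot comments. A minimal sketch — the starting interval, growth factor, and cap here are assumptions, not the values in the command file:

```python
import itertools

def poll_intervals(start=15, factor=2, cap=120):
    """Yield wait times in seconds between checks for new bot comments:
    fast at first for quick bots, backing off for slower ones."""
    delay = start
    while True:
        yield delay
        delay = min(delay * factor, cap)

schedule = list(itertools.islice(poll_intervals(), 5))
print(schedule)  # [15, 30, 60, 120, 120]
```

In practice each interval would be followed by a check such as `gh pr view --json comments` to see whether a bot has responded yet.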

Security Concerns

  • No security issues identified.
  • The hotfix mode (lines 48-58) appropriately prioritizes security vulnerabilities.

Test Coverage

Observation: These are prompt/rule files with no associated automated tests. This is expected for LLM prompt engineering content.

Recommendation for validation: The PR description includes a test plan with manual validation steps. I'd suggest adding one more checkpoint:

- [ ] Test with a PR where a bot suggests a change that IS valid but seems "minor" - ensure it gets addressed rather than declined as "not blocking"

This would validate the key behavioral shift from priority-based to correctness-based triage.

Adherence to Project Conventions

Excellent adherence to:

  • prompt-engineering.mdc (positive framing, no anti-patterns, goal-focused)
  • Version bumps: 1.0.0→1.1.0 for code-review-standards.mdc, 1.4.0→1.5.0 for address-pr-comments.md

Git conventions:

  • Commit message format follows the repo standard
  • Uses ♻️ emoji appropriately for refactoring
  • Subject line is descriptive and under 72 characters

Summary

This is a high-quality refactoring that successfully shifts the triage philosophy while following the project's prompt engineering best practices. The changes make the LLM prompts clearer and more likely to produce correct behavior.

Recommended actions before merge:

  1. Clarify the "flag for user decision" mechanism for accessibility in code-review-standards.mdc
  2. Consider removing "always" from line 19 of address-pr-comments.md for consistency with goal-focused prompting
  3. Add the suggested test plan checkpoint about validating the priority→correctness shift

Overall assessment: ✅ Approve with minor suggestions

- Remove 'always' from line 19 per prompt-engineering goals-over-process
- Clarify 'flag for user' mechanism for accessibility decisions
@claude

claude bot commented Dec 28, 2025

PR Review: Correctness-Based Triage Philosophy

This is an excellent evolution of the PR comment triage system. The shift from priority-based to correctness-based evaluation aligns well with the heart-centered AI philosophy and prompt engineering best practices defined in this repo.

Strengths

1. Philosophy Alignment
The reframing from "skip minor issues" to "only decline when bot is wrong" is fundamentally sound. This shifts responsibility appropriately:

  • Before: AI makes priority judgments on behalf of user ("this isn't important enough")
  • After: AI evaluates correctness and leaves scope decisions to user

This respects user autonomy while leveraging AI's contextual understanding.

2. Prompt Engineering Best Practices
The changes follow prompt-engineering.mdc principles excellently:

  • ✅ Positive framing: "Address when bot's analysis is correct" vs "Skip these issues"
  • ✅ Removed anti-patterns: Eliminated gh pr checks --watch mention (line 97)
  • ✅ Goal-focused: Simplified triage to single question "Is the bot correct?"
  • ✅ Clear examples: Each decline pattern shows why the bot is wrong, not just what to skip

3. Practical Guidance
The updated triage criteria are actionable and concrete:

  • Single-use values: Clear DRY principle application
  • Theoretical race conditions: Requires demonstration of actual issue
  • Redundant type safety: Runtime validation trumps compile-time perfection
  • Premature optimization: Data-driven decisions

4. Accessibility Handling
The special case for accessibility (code-review-standards.mdc:64-69) is well-reasoned:

"Check project or user configuration for the team's stance. If no stance is declared, present the suggestion to the user..."

This correctly identifies accessibility as a scope/priority decision rather than a correctness issue.

Concerns & Suggestions

1. Potential Over-Addressing Risk

The new philosophy could lead to addressing every technically correct suggestion, even trivial ones. Consider this scenario:

Bot: "Consider extracting this repeated string literal 'user-profile' that appears twice"

Under the new criteria:

  • Bot is technically correct (value appears twice)
  • Not a single-use value exception
  • Therefore: address it

But this might be exactly the kind of low-value work the old system aimed to skip. The old "90% coverage, not 100%" philosophy had merit.

Suggestion: Consider adding a third category between "address" and "decline":

<triage-outcomes>
1. Address: Bot is correct and fix provides clear value
2. Defer: Bot is correct but impact is trivial (note in comment, don't block merge)  
3. Decline: Bot's analysis is incorrect given context
</triage-outcomes>

This preserves the "correctness-based" philosophy while acknowledging that not all correct suggestions are equally valuable.
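The proposed third outcome would extend the binary triage into a two-step check. A sketch under the reviewer's assumptions — the "trivial" impact threshold is the suggestion's, not something defined in the prompt files:

```python
def triage_three_outcome(analysis_correct: bool, impact: str) -> str:
    """Three-outcome triage as proposed above: address / defer / decline."""
    if not analysis_correct:
        return "decline"  # bot's analysis is wrong given context
    if impact == "trivial":
        return "defer"    # correct but low-value: note it, don't block merge
    return "address"

print(triage_three_outcome(True, "high"))     # address
print(triage_three_outcome(True, "trivial"))  # defer
print(triage_three_outcome(False, "high"))    # decline
```

Correctness is still evaluated first, so declines remain purely about the bot being wrong; "defer" only partitions the correct suggestions by value.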

2. Documentation Inconsistency

README.md:106 still describes the old behavior:

"fixes real issues, declines nitpicks"

This contradicts the new philosophy where "nitpick" is no longer a valid decline reason.

Recommendation: Update README to reflect correctness-based approach:

/address-pr-comments - Triage bot feedback: address valid suggestions, decline when bot's analysis is incorrect given context you have

3. Hotfix Mode Clarity

The hotfix mode description (address-pr-comments.md:48-58) maintains priority-based language:

"Focus on security vulnerabilities and bugs that could break production"

This seems intentionally different from the main flow, which is good. But the transition could be clearer.

Suggestion: Explicitly state that hotfix mode is an exception to the standard correctness-based approach:

<hotfix-mode>
If the branch name starts with `hotfix/`, temporarily switch from correctness-based to priority-based triage:

- Address: Security vulnerabilities and bugs that could break production
- Decline: All other suggestions, even if technically correct
- Reason: Emergency context overrides normal thoroughness standards

[rest of existing content...]
</hotfix-mode>

4. Edge Case: Partially Correct Bots

Lines 117-118 handle this well:

"When a bot correctly identifies an issue but suggests a suboptimal fix, address the underlying issue with the appropriate solution."

But this could use an example in the triage-process section:

Example: Bot correctly identifies that error handling is missing (correct diagnosis) 
but suggests a generic try-catch (wrong prescription). Fix: Add specific error 
handling for the actual failure modes.

5. Version Bumps

Both files correctly bumped versions:

  • code-review-standards.mdc: 1.0.0 → 1.1.0
  • address-pr-comments.md: 1.4.0 → 1.5.0

These follow semantic versioning appropriately (minor version for backward-compatible enhancements).

Testing Recommendations

The test plan in the PR description is good:

  • Run /address-pr-comments on PR with mixed feedback
  • Verify declines include reasoning about why bot is wrong
  • Check valid suggestions get addressed regardless of size

Additional test case: Run on a PR with many low-impact but technically correct suggestions (e.g., 20 single-character variable renames for consistency). Verify the behavior matches expectations - if it addresses all 20, is that desired?

Code Quality Assessment

Readability: Excellent. The positive framing makes the prompts clearer and easier to follow.

Maintainability: Improved. The simplified decision tree (one question vs multiple priority buckets) is easier to reason about.

Consistency: Strong adherence to prompt-engineering.mdc and heart-centered-ai-philosophy.mdc.

Security: No concerns.

Final Verdict

Approve with minor suggestions. This is a well-reasoned improvement that makes the triage philosophy more coherent and easier to apply. The shift to correctness-based evaluation is fundamentally sound.

Recommended follow-up work:

  1. Update README.md to reflect new philosophy
  2. Monitor real-world usage to see if the "address everything correct" approach leads to over-work
  3. Consider adding "defer" as a third outcome if addressing trivial-but-correct suggestions becomes burdensome

The philosophical shift here is significant and positive. It transforms the tool from "AI decides what matters" to "AI evaluates correctness, user decides scope" - which is exactly right.

Great work! 🎯

@TechNickAI
Owner Author

Addressed both minor suggestions:

  1. ✅ Removed "always" from line 19 - less prescriptive per prompt-engineering.mdc
  2. ✅ Clarified accessibility "flag for user" mechanism - now specifies to present with context and ask

Thanks for the thorough reviews! The philosophical shift from priority-based to correctness-based triage is exactly what was needed.

@TechNickAI merged commit 4c57c50 into main on Dec 28, 2025
2 checks passed