…of being overridden by them
Observed failure: user set global rule "Always commit and push if asked!"
in the supervision defaults. A session hit idle with uncommitted work;
user asked "提交了么?" (did you commit?), agent answered "还没提交" (no).
Supervisor returned `complete` — rule was never enforced.
Root cause — two heuristics in the decision-prompt rule list were
structurally able to defeat any user rule:
1. "A factual answer to a user question ... is typically complete for
that turn; the user asked a question, the agent answered it. Do not
treat state reports as proposed work."
2. "A user-set supervision rule phrased conditionally ('if asked',
'when X') is conditional. Check whether the condition actually
fires in the current turn before using it to justify continue."
The arbiter LLM took "Always commit and push if asked!" at heuristic #2's
narrowest reading ("the user didn't literally command 'commit it' this
turn → condition didn't fire") and combined it with heuristic #1 to
justify `complete` on the Q-and-A turn. Result: the user's enforce-this
rule was silently downgraded to "advice the arbiter may ignore".
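
The difference between the two readings of "if asked" can be sketched in code. This is only an illustration of the logic: the real arbiter is an LLM reading a prompt, not a keyword matcher, and the function names, regexes, and `has_uncommitted_work` flag below are all hypothetical.

```python
import re

# Hypothetical patterns, for illustration only.
# Narrow reading: the user must literally command a commit/push this turn.
COMMIT_COMMAND = re.compile(r"^\s*(please\s+)?(commit|push)\b", re.IGNORECASE)
# Generous reading: the topic merely appearing in the turn counts.
COMMIT_TOPIC = re.compile(r"commit|push|提交", re.IGNORECASE)


def condition_fires_narrow(user_message: str) -> bool:
    """Old behavior: 'if asked' fires only on a literal command."""
    return bool(COMMIT_COMMAND.search(user_message))


def condition_fires_generous(user_message: str) -> bool:
    """New behavior: the topic appearing IS the condition firing."""
    return bool(COMMIT_TOPIC.search(user_message))


def decide(user_message: str, has_uncommitted_work: bool, generous: bool) -> str:
    """Return 'continue' if the commit rule must be enforced, else 'complete'."""
    fires = (
        condition_fires_generous(user_message)
        if generous
        else condition_fires_narrow(user_message)
    )
    return "continue" if fires and has_uncommitted_work else "complete"
```

With the failing turn from this report, `decide("提交了么?", True, generous=False)` takes the narrow path and yields `complete`, while `generous=True` yields `continue`.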
Fix — reorder and rewrite:
- New top-of-list clause: "USER-SET SUPERVISION RULES ARE AUTHORITATIVE."
This is the first decision rule the arbiter reads. It states that the
user-rules block overrides the generic heuristics below it, and gives
concrete worked examples for:
* commit/push rules (matches the current failure mode verbatim)
* blanket wording ("always", "每次" (every time), "必须" (must), "绝不" (never)) → unconditional
* conditional wording ("if asked", "when X", "如果" (if), "当" (when)) →
interpret GENEROUSLY in the user's favor: the topic appearing in
the conversation IS the condition firing.
- Heuristic #1 ("factual Q&A → complete") now explicitly reads
"typically complete for that turn IF no user-set rule applies" — so
it still covers ordinary questions but stops poaching turns that a
user rule governs.
- Heuristic #2 (the conditional-rule escape hatch) is removed; its
responsibility is folded into the authoritative clause, which now
handles all conditional rules on the premise that user rules always
win.
- Repair prompt mirrors the same clause so JSON-invalid fallbacks
can't drop back into the old behavior.
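
One way to keep the decision prompt and the repair prompt from drifting apart is to build both from a single rule list with the authoritative clause first. The sketch below assumes that structure; the function names, prompt headers, and the abbreviated heuristic text are hypothetical and not taken from the actual code.

```python
from typing import List

# Clause header quoted from this change; the second sentence is placeholder text.
AUTHORITATIVE_CLAUSE = (
    "USER-SET SUPERVISION RULES ARE AUTHORITATIVE. "
    "They override the generic heuristics below."
)

# Abbreviated placeholder for the rewritten heuristic #1.
GENERIC_HEURISTICS: List[str] = [
    "A factual answer to a user question ... is typically complete for "
    "that turn IF no user-set rule applies.",
]


def build_rule_list() -> List[str]:
    """Authoritative clause always comes first, generic heuristics after."""
    return [AUTHORITATIVE_CLAUSE, *GENERIC_HEURISTICS]


def _format(user_rules: List[str], header: str) -> str:
    """Render the header, the user rules, and the ordered decision rules."""
    lines = [header, "", "User-set supervision rules:"]
    lines += [f"- {rule}" for rule in user_rules]
    lines += ["", "Decision rules (in priority order):"]
    lines += [f"{i}. {rule}" for i, rule in enumerate(build_rule_list(), 1)]
    return "\n".join(lines)


def build_decision_prompt(user_rules: List[str]) -> str:
    return _format(user_rules, "Decide: continue or complete.")


def build_repair_prompt(user_rules: List[str], invalid_output: str) -> str:
    """Fallback used when the first reply was not valid JSON."""
    header = "Your previous reply was not valid JSON. Re-emit the decision as JSON."
    return _format(user_rules, header) + "\n\nPrevious reply:\n" + invalid_output
```

Because both builders go through `build_rule_list()`, a JSON-invalid fallback sees the same clause in the same position as the first attempt.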
All 71 existing supervision prompt / config / broker tests stay green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>