Skip to content

feat: consolidate deduplicate_levers — classification, safety valve, B3 fix#365

Closed
neoneye wants to merge 1 commit intomainfrom
fix/deduplicate-levers-v3
Closed

feat: consolidate deduplicate_levers — classification, safety valve, B3 fix#365
neoneye wants to merge 1 commit intomainfrom
fix/deduplicate-levers-v3

Conversation

@neoneye
Copy link
Member

@neoneye neoneye commented Mar 20, 2026

Summary

Supersedes closed PRs #363 and #364. Incorporates all learnings from self-improve iterations 42-43.

  • primary/secondary classification: replaces flat keep for downstream prioritization
  • Safety valve narrowed: "Use primary only as a last resort" + calibration hint "expect 4-10 absorb/remove" (widened from PR feat: consolidate deduplicate_levers with classification and safety valve fix #364's "4-8" which caused gemini to stop absorbing prematurely on sovereign_identity)
  • "do not stop early": explicit instruction prevents calibration capping
  • Concrete secondary examples: marketing timing, reporting cadence, communication tooling, documentation formatting
  • B3 fix complete: conditional ... in both _build_compact_history AND all_levers_summary (PR feat: consolidate deduplicate_levers with classification and safety valve fix #364 only fixed the first)
  • OPTIMIZE_INSTRUCTIONS: 5 known failure modes documented (blanket-primary, over-inclusion, hierarchy-direction, chain absorption, calibration capping)
  • self_improve runner: deduplicate_levers step support
  • enrich_potential_levers: accepts optional classification field (backwards-compatible)

Improvements over PR #364

Issue PR #364 This PR
Calibration range 4-8 (gemini regression) 4-10 + "do not stop early"
B3 fix Only _build_compact_history Both locations
OPTIMIZE_INSTRUCTIONS 4 failure modes 5 (added calibration capping)

Test plan

  • Run self_improve iteration to verify gemini sovereign_identity no longer regresses (should keep ~5, not 9)
  • Verify llama3.1 still deduplicates properly (should keep 5-9, not 14-15)
  • Confirm haiku/qwen3 not over-removing (should keep ≥5)

🤖 Generated with Claude Code

…B3 fix

deduplicate_levers.py:
- Add primary/secondary classification (replaces flat "keep")
- Narrow safety valve: "Use primary only as a last resort" + calibration
  hint "expect 4-10 absorb/remove" (widened from 4-8 to fix gemini
  sovereign_identity regression) + "do not stop early"
- Add concrete secondary examples (marketing timing, reporting cadence,
  team communication tooling, documentation formatting)
- Fix B3 completely: conditional "..." in both _build_compact_history
  AND all_levers_summary (PR #364 only fixed the first)
- Add OPTIMIZE_INSTRUCTIONS with 5 known failure modes (blanket-primary,
  over-inclusion, hierarchy-direction, chain absorption, calibration
  capping)

enrich_potential_levers.py:
- Accept optional classification field (backwards-compatible)

self_improve/runner.py:
- Add deduplicate_levers step support

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@neoneye
Copy link
Member Author

neoneye commented Mar 20, 2026

Self-Improve Iteration 44 — deduplicate_levers (PR #365)

7 models × 5 plans, all succeeded. 100% schema success rate.

Verdict: YES

All issues from iter 43 are resolved. No new regressions.

Levers kept (iter 42 → 43 → 44)

Plan llama gpt-oss gpt5-nano qwen gpt4o-mini gemini haiku
silo 15→8→9 7→7→5 4→5→7 10→9→7 10→10→10 7→7→7 6→6→6
gta_game 9→6→6 7→7→7 5→5→5 12→11→10 9→11→9 8→8→8 9→7→7
sovereign 6→5→5 6→5→4 4→3→2 3→5→3 5→5→5 5→9→5 5→5→5
hong_kong 15→7→9 8→6→7 4→5→6 7→7→7 12→12→9 7→7→7 7→7→7
parasomnia 8→9→7 8→8→8 8→6→7 11→8→8 11→10→10 8→8→8 7→7→7

Key results

  1. Gemini sovereign_identity fixed: 9→5 (widened calibration "4-10" + "do not stop early" worked)
  2. llama3.1 still deduplicating: 5-9 range, no blanket-keep regression
  3. Secondary classification engaged: llama3.1 parasomnia 4 secondary, gemini parasomnia 7 secondary — models distinguish operational from strategic levers
  4. haiku stable: 5-7 across all plans, no over-removal
  5. gpt-4o-mini slightly improved: hong_kong 12→9, silo still 10 (over-inclusion partially addressed)

Merge recommendation

Ready to merge. All three iteration targets met:

  • ✅ llama3.1 blanket-keep fixed (iter 43)
  • ✅ gemini calibration regression fixed (iter 44)
  • ✅ B3 unconditional ... fixed in both locations

neoneye added a commit that referenced this pull request Mar 20, 2026
Extract the runner.py changes from PR #365 so baseline runs can
exercise the deduplicate_levers step on main.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
neoneye added a commit that referenced this pull request Mar 20, 2026
Replace 4-way taxonomy (keep/absorb + primary/secondary from PR #365)
with 3-way: primary, secondary, remove. Absorbed levers are now
classified as "remove" with the absorbing lever_id in justification.

Hypothesis: fewer categories = more consistent exercise of each class,
easier to validate results.

Also includes best improvements from PR #365:
- Safety valve narrowed ("primary only as a last resort")
- Calibration hint (expect 4-10 removals, do not stop early)
- B3 fix: conditional ellipsis in compact history and lever summary
- OPTIMIZE_INSTRUCTIONS with 5 known failure modes
- classification field preserved in OutputLever for downstream use
- enrich_potential_levers accepts optional classification field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@neoneye
Copy link
Member Author

neoneye commented Mar 21, 2026

Superseded by PR #375 (merged). PR #375 incorporates primary/secondary triage with a single batch call.

@neoneye neoneye closed this Mar 21, 2026
@neoneye neoneye deleted the fix/deduplicate-levers-v3 branch March 21, 2026 12:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant