feat: consolidate deduplicate_levers — classification, safety valve, B3 fix by neoneye · Pull Request #365 · PlanExeOrg/PlanExe

neoneye · 2026-03-20T00:12:05Z

Summary

Supersedes closed PRs #363 and #364. Incorporates all learnings from self-improve iterations 42-43.

primary/secondary classification: replaces flat keep for downstream prioritization
Safety valve narrowed: "Use primary only as a last resort" + calibration hint "expect 4-10 absorb/remove" (widened from PR feat: consolidate deduplicate_levers with classification and safety valve fix #364's "4-8" which caused gemini to stop absorbing prematurely on sovereign_identity)
"do not stop early": explicit instruction prevents calibration capping
Concrete secondary examples: marketing timing, reporting cadence, communication tooling, documentation formatting
B3 fix complete: conditional ... in both _build_compact_history AND all_levers_summary (PR feat: consolidate deduplicate_levers with classification and safety valve fix #364 only fixed the first)
OPTIMIZE_INSTRUCTIONS: 5 known failure modes documented (blanket-primary, over-inclusion, hierarchy-direction, chain absorption, calibration capping)
self_improve runner: deduplicate_levers step support
enrich_potential_levers: accepts optional classification field (backwards-compatible)

Improvements over PR #364

Issue	PR #364	This PR
Calibration range	4-8 (gemini regression)	4-10 + "do not stop early"
B3 fix	Only `_build_compact_history`	Both locations
OPTIMIZE_INSTRUCTIONS	4 failure modes	5 (added calibration capping)

Test plan

Run self_improve iteration to verify gemini sovereign_identity no longer regresses (should keep ~5, not 9)
Verify llama3.1 still deduplicates properly (should keep 5-9, not 14-15)
Confirm haiku/qwen3 not over-removing (should keep ≥5)

🤖 Generated with Claude Code

…B3 fix deduplicate_levers.py: - Add primary/secondary classification (replaces flat "keep") - Narrow safety valve: "Use primary only as a last resort" + calibration hint "expect 4-10 absorb/remove" (widened from 4-8 to fix gemini sovereign_identity regression) + "do not stop early" - Add concrete secondary examples (marketing timing, reporting cadence, team communication tooling, documentation formatting) - Fix B3 completely: conditional "..." in both _build_compact_history AND all_levers_summary (PR #364 only fixed the first) - Add OPTIMIZE_INSTRUCTIONS with 5 known failure modes (blanket-primary, over-inclusion, hierarchy-direction, chain absorption, calibration capping) enrich_potential_levers.py: - Accept optional classification field (backwards-compatible) self_improve/runner.py: - Add deduplicate_levers step support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye · 2026-03-20T00:56:16Z

Self-Improve Iteration 44 — deduplicate_levers (PR #365)

7 models × 5 plans, all succeeded. 100% schema success rate.

Verdict: YES

All issues from iter 43 are resolved. No new regressions.

Levers kept (iter 42 → 43 → 44)

Plan	llama	gpt-oss	gpt5-nano	qwen	gpt4o-mini	gemini	haiku
silo	15→8→9	7→7→5	4→5→7	10→9→7	10→10→10	7→7→7	6→6→6
gta_game	9→6→6	7→7→7	5→5→5	12→11→10	9→11→9	8→8→8	9→7→7
sovereign	6→5→5	6→5→4	4→3→2	3→5→3	5→5→5	5→9→5	5→5→5
hong_kong	15→7→9	8→6→7	4→5→6	7→7→7	12→12→9	7→7→7	7→7→7
parasomnia	8→9→7	8→8→8	8→6→7	11→8→8	11→10→10	8→8→8	7→7→7

Key results

Gemini sovereign_identity fixed: 9→5 (widened calibration "4-10" + "do not stop early" worked)
llama3.1 still deduplicating: 5-9 range, no blanket-keep regression
Secondary classification engaged: llama3.1 parasomnia 4 secondary, gemini parasomnia 7 secondary — models distinguish operational from strategic levers
haiku stable: 5-7 across all plans, no over-removal
gpt-4o-mini slightly improved: hong_kong 12→9, silo still 10 (over-inclusion partially addressed)

Merge recommendation

Ready to merge. All three iteration targets met:

✅ llama3.1 blanket-keep fixed (iter 43)
✅ gemini calibration regression fixed (iter 44)
✅ B3 unconditional ... fixed in both locations

Extract the runner.py changes from PR #365 so baseline runs can exercise the deduplicate_levers step on main. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace 4-way taxonomy (keep/absorb + primary/secondary from PR #365) with 3-way: primary, secondary, remove. Absorbed levers are now classified as "remove" with the absorbing lever_id in justification. Hypothesis: fewer categories = more consistent exercise of each class, easier to validate results. Also includes best improvements from PR #365: - Safety valve narrowed ("primary only as a last resort") - Calibration hint (expect 4-10 removals, do not stop early) - B3 fix: conditional ellipsis in compact history and lever summary - OPTIMIZE_INSTRUCTIONS with 5 known failure modes - classification field preserved in OutputLever for downstream use - enrich_potential_levers accepts optional classification field Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye · 2026-03-21T02:04:06Z

Superseded by PR #375 (merged). PR #375 incorporates primary/secondary triage with a single batch call.

neoneye mentioned this pull request Mar 20, 2026

feat: add deduplicate_levers step to self_improve runner #371

Merged

1 task

neoneye mentioned this pull request Mar 20, 2026

feat: simplify lever classification to primary/secondary/remove #372

Closed

4 tasks

neoneye closed this Mar 21, 2026

neoneye deleted the fix/deduplicate-levers-v3 branch March 21, 2026 12:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: consolidate deduplicate_levers — classification, safety valve, B3 fix#365

feat: consolidate deduplicate_levers — classification, safety valve, B3 fix#365
neoneye wants to merge 1 commit intomainfrom
fix/deduplicate-levers-v3

neoneye commented Mar 20, 2026

Uh oh!

neoneye commented Mar 20, 2026

Uh oh!

neoneye commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

neoneye commented Mar 20, 2026

Summary

Improvements over PR #364

Test plan

Uh oh!

neoneye commented Mar 20, 2026

Self-Improve Iteration 44 — deduplicate_levers (PR #365)

Verdict: YES

Levers kept (iter 42 → 43 → 44)

Key results

Merge recommendation

Uh oh!

neoneye commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant