feat: simplify lever classification to primary/secondary/remove #372
Replace 4-way taxonomy (keep/absorb + primary/secondary from PR #365) with 3-way: primary, secondary, remove. Absorbed levers are now classified as "remove" with the absorbing lever_id in justification.

Hypothesis: fewer categories = more consistent exercise of each class, easier to validate results.

Also includes best improvements from PR #365:
- Safety valve narrowed ("primary only as a last resort")
- Calibration hint (expect 4-10 removals, do not stop early)
- B3 fix: conditional ellipsis in compact history and lever summary
- OPTIMIZE_INSTRUCTIONS with 5 known failure modes
- classification field preserved in OutputLever for downstream use
- enrich_potential_levers accepts optional classification field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
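As a rough illustration of the 3-way taxonomy described above, here is a minimal Python sketch. Only the names `LeverClassification`, `OutputLever`, `lever_id`, and the `classification` field come from this PR's text; the `LeverDecision` record, the `surviving_levers` helper, the field layout, and the sample lever IDs are assumptions made for the example and may not match the actual code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class LeverClassification(str, Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"
    REMOVE = "remove"


@dataclass
class LeverDecision:
    # Hypothetical record of one classification decision.
    lever_id: str
    classification: LeverClassification
    justification: str


@dataclass
class OutputLever:
    # Hypothetical shape; only the classification field is taken from the PR text.
    lever_id: str
    classification: LeverClassification


def surviving_levers(decisions: List[LeverDecision]) -> List[OutputLever]:
    """Drop removed levers and carry the classification forward."""
    return [
        OutputLever(d.lever_id, d.classification)
        for d in decisions
        if d.classification is not LeverClassification.REMOVE
    ]


decisions = [
    LeverDecision("lever-a", LeverClassification.PRIMARY, "Core strategic choice."),
    LeverDecision("lever-b", LeverClassification.SECONDARY, "Useful but not decisive."),
    # An absorbed lever is a plain "remove" whose justification names the absorbing lever_id.
    LeverDecision("lever-c", LeverClassification.REMOVE, "Absorbed into lever-a."),
]
kept = surviving_levers(decisions)
```

In this sketch an absorbed lever needs no category of its own: the absorbing lever_id lives in the justification text, so downstream code only ever sees primary and secondary levers.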
deduplicate_levers was storing the serialized input levers in user_prompt instead of the project context. This made the saved raw output misleading, since other pipeline steps store the actual user prompt in this field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Iteration 49 Results & Next Steps
Verdict: KEEP. 35/35 runs succeeded, all 3 categories exercised (58% primary, 27% secondary, 15% remove).

Key wins over main (iter 48)

Cross-iteration ranking (iter 45 vs 48 vs 49)
Iter 49 (this PR) ranked best. The 4-way taxonomy in PR #365 has a dead category.

Remaining issues (pre-existing, not introduced by this PR)

Architectural direction
The bigger improvement is moving from 18 sequential LLM calls per plan to 1 batch call. The per-lever approach causes position bias, prevents global consistency, and is 18× more expensive. Plan to implement this as a follow-up PR on top of this one: same 3-way taxonomy, same output schema, just restructured as a single call.

Full analysis: PlanExe-prompt-lab/analysis/49_deduplicate_levers/
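To make the call-count difference concrete, here is a hedged Python sketch of the two shapes. Everything in it is hypothetical: `StubLLM` stands in for a real model client, and `classify_sequential`, `classify_batch`, and the JSON prompt payloads are invented for illustration rather than taken from the pipeline.

```python
import json
from typing import Dict, List


class StubLLM:
    """Stand-in for a real model client; counts calls and returns canned text."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.reply


def classify_sequential(levers: List[Dict[str, str]], llm: StubLLM) -> Dict[str, str]:
    """One call per lever; each call only sees decisions made earlier in the list."""
    decisions: Dict[str, str] = {}
    for lever in levers:
        prompt = json.dumps({"lever": lever, "decided_so_far": decisions})
        decisions[lever["lever_id"]] = llm(prompt).strip()
    return decisions


def classify_batch(levers: List[Dict[str, str]], llm: StubLLM) -> Dict[str, str]:
    """One call for the whole set; the model sees every lever at once."""
    prompt = json.dumps({"levers": levers})
    return json.loads(llm(prompt))


levers = [{"lever_id": f"lever-{i}", "name": f"Lever {i}"} for i in range(18)]

seq_llm = StubLLM("primary")
seq = classify_sequential(levers, seq_llm)

batch_llm = StubLLM(json.dumps({lever["lever_id"]: "primary" for lever in levers}))
batch = classify_batch(levers, batch_llm)
```

With 18 levers the sequential stub is called 18 times and the batch stub once. The sequential version also shows where order dependence can creep in, since each prompt depends on the decisions made before it.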
Summary
- Absorbed levers are classified as `remove` with the absorbing lever_id stated in the justification; there is no separate `absorb` category

Changes from main (keep/absorb/remove → primary/secondary/remove)
- `LeverClassification` enum: `keep` → `primary` + `secondary`, `absorb` merged into `remove`
- `OutputLever` now includes a `classification` field (primary or secondary) for downstream use
- Conditional `...` in both `_build_compact_history` and `all_levers_summary`
- `OPTIMIZE_INSTRUCTIONS` block documents 5 known failure modes for self-improve analysis
- `enrich_potential_levers.py`: accepts optional `classification` field (backward-compatible)

Bug fixes from iter 45 code review
- `user_prompt` field now stores `project_context` instead of serialized levers JSON, consistent with all other pipeline steps

Test plan
- `deduplicate_levers` step via self-improve runner against snapshot input

🤖 Generated with Claude Code
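The conditional ellipsis change listed above is not spelled out in this description. One plausible reading, sketched below in Python, is that `...` should be appended only when text was actually truncated. The helper name `truncate_with_ellipsis` and its `max_chars` parameter are assumptions for the example, not names from `_build_compact_history` or `all_levers_summary`.

```python
def truncate_with_ellipsis(text: str, max_chars: int) -> str:
    """Return text cut to max_chars, adding "..." only if something was cut."""
    if len(text) <= max_chars:
        # Short enough: return unchanged, with no misleading ellipsis.
        return text
    return text[:max_chars] + "..."


short = truncate_with_ellipsis("short", 10)
cut = truncate_with_ellipsis("a long lever name", 6)
```

Under this reading, an unconditional `...` would suggest to the model that content was elided even when the full text was shown.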