feat: simplify lever classification to primary/secondary/remove by neoneye · Pull Request #372 · PlanExeOrg/PlanExe

neoneye · 2026-03-20T18:49:22Z

Summary

Replaces the 4-way taxonomy (PR #365's primary/secondary/absorb/remove) with a simpler 3-way classification: primary, secondary, remove
Absorbed levers are now classified as remove with the absorbing lever_id stated in the justification — no separate absorb category
Hypothesis: fewer categories means each class gets exercised more, making it easier to validate whether the model is using them correctly

Changes from main (keep/absorb/remove → primary/secondary/remove)

LeverClassification enum: keep → primary + secondary, absorb merged into remove
OutputLever now includes a classification field (primary or secondary) for downstream use
System prompt rewritten with primary/secondary definitions, concrete secondary examples, calibration hint (expect 4–10 removals), and "do not stop early" instruction
Safety valve narrowed: "use primary only as a last resort" (replaces blanket "use keep if unsure")
B3 fix: conditional ellipsis ("...") in both _build_compact_history and all_levers_summary
OPTIMIZE_INSTRUCTIONS block documents 5 known failure modes for self-improve analysis
enrich_potential_levers.py: accepts optional classification field (backward-compatible)
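
To illustrate the taxonomy change listed above, here is a minimal sketch of how a 3-way enum and a classification-carrying output record could fit together. It uses stdlib dataclasses for self-containment; the real models may well be Pydantic classes, and `LeverDecision`, `to_output_lever`, the `name` field, and the sample lever ids are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LeverClassification(str, Enum):
    """3-way taxonomy: the former 'absorb' is folded into REMOVE."""
    PRIMARY = "primary"
    SECONDARY = "secondary"
    REMOVE = "remove"


@dataclass
class LeverDecision:
    """One model decision for one lever (hypothetical shape)."""
    lever_id: str
    classification: LeverClassification
    justification: str


@dataclass
class OutputLever:
    """A surviving lever; classification is primary or secondary only."""
    lever_id: str
    name: str
    classification: LeverClassification


def to_output_lever(decision: LeverDecision, name: str) -> Optional[OutputLever]:
    """Return the surviving lever, or None when it was classified as remove."""
    if decision.classification is LeverClassification.REMOVE:
        return None
    return OutputLever(decision.lever_id, name, decision.classification)


# An absorbed lever is a plain "remove" whose justification names the absorber.
absorbed = LeverDecision(
    lever_id="lever-b",
    classification=LeverClassification.REMOVE,
    justification="Absorbed by lever-a, which covers the same ground.",
)
kept = LeverDecision("lever-a", LeverClassification.SECONDARY, "Distinct lever, kept.")

surviving = [
    lever
    for lever in (to_output_lever(absorbed, "Lever B"), to_output_lever(kept, "Lever A"))
    if lever is not None
]
```

Under these assumptions, downstream steps only ever see `primary` or `secondary` on an `OutputLever`, while the absorb relationship lives solely in the free-text justification.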

Bug fixes from iter 45 code review

B1 fix: user_prompt field now stores project_context instead of serialized levers JSON — consistent with all other pipeline steps
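
For the conditional-ellipsis B3 fix listed under Changes, the PR text does not show the actual condition. A plausible reading, assumed here, is that the "..." marker is appended only when text was really cut. The helper name `truncate_with_ellipsis` and the `max_chars` parameter are hypothetical.

```python
def truncate_with_ellipsis(text: str, max_chars: int) -> str:
    """Cut text to max_chars and append '...' only if something was cut."""
    if len(text) <= max_chars:
        # Short text is returned unchanged, with no misleading ellipsis.
        return text
    return text[:max_chars] + "..."


short = truncate_with_ellipsis("short", 10)
long = truncate_with_ellipsis("abcdefghij", 4)
```

An unconditional ellipsis would tell the model that every summary is truncated, including the ones that are complete.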

Test plan

Run deduplicate_levers step via self-improve runner against snapshot input
Verify all 7 models produce valid primary/secondary/remove classifications
Compare removal counts against iter 48 (main baseline) — expect similar or better consolidation
Check that no model produces zero removals (blanket-primary failure mode)
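
The checks in the test plan could be automated along these lines. This is a sketch only: it assumes each run can be reduced to a flat list of classification strings, and `check_run` is a hypothetical helper, not part of the self-improve runner.

```python
from collections import Counter

VALID_CLASSIFICATIONS = {"primary", "secondary", "remove"}


def check_run(classifications: list[str]) -> Counter:
    """Validate one model's run and return per-class counts."""
    invalid = [c for c in classifications if c not in VALID_CLASSIFICATIONS]
    if invalid:
        raise ValueError(f"invalid classifications: {invalid}")
    counts = Counter(classifications)
    if counts["remove"] == 0:
        # Zero removals is the blanket-primary failure mode from the test plan.
        raise ValueError("zero removals (blanket-primary failure mode)")
    return counts


counts = check_run(["primary", "secondary", "remove", "primary"])
```

The returned counts can then be compared against the baseline iteration's removal counts.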

🤖 Generated with Claude Code

Replace 4-way taxonomy (keep/absorb + primary/secondary from PR #365) with 3-way: primary, secondary, remove.

Absorbed levers are now classified as "remove" with the absorbing lever_id in justification. Hypothesis: fewer categories = more consistent exercise of each class, easier to validate results.

Also includes best improvements from PR #365:
- Safety valve narrowed ("primary only as a last resort")
- Calibration hint (expect 4-10 removals, do not stop early)
- B3 fix: conditional ellipsis in compact history and lever summary
- OPTIMIZE_INSTRUCTIONS with 5 known failure modes
- classification field preserved in OutputLever for downstream use
- enrich_potential_levers accepts optional classification field

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

deduplicate_levers was storing the serialized input levers in user_prompt instead of the project context. This made the saved raw output misleading; other pipeline steps store the actual user prompt in this field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

neoneye · 2026-03-20T20:32:50Z

Iteration 49 Results & Next Steps

Verdict: KEEP — 35/35 runs succeeded, all 3 categories exercised (58% primary, 27% secondary, 15% remove).

Key wins over main (iter 48)

Degenerate llama3.1 collapse eliminated (7 levers into "Risk Framing" → gone)
Haiku remove rate improved 28% → 39%
Primary/secondary triage adds downstream signal that keep-only lacks
All remove justifications cite the absorbing lever's UUID (77% as a full UUID; 87% when qwen3 is excluded)
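
A full-UUID citation rate like the one reported above could be measured roughly as follows. This is a sketch that assumes lever ids are standard 8-4-4-4-12 hex UUIDs and that justifications are available as plain strings; `full_uuid_rate` and the sample justifications are hypothetical and are not the analysis script's actual code or data.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
FULL_UUID = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)


def full_uuid_rate(justifications: list[str]) -> float:
    """Fraction of justifications that contain at least one full UUID."""
    if not justifications:
        return 0.0
    hits = sum(1 for text in justifications if FULL_UUID.search(text))
    return hits / len(justifications)


sample = [
    "Absorbed by 123e4567-e89b-12d3-a456-426614174000.",
    "Absorbed by 123e4567 (same scope).",  # abbreviated id, not a full UUID
]
rate = full_uuid_rate(sample)
```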

Cross-iteration ranking (iter 45 vs 48 vs 49)

Iter 49 (this PR) ranked best. The 4-way taxonomy in PR #365 has a dead category — remove is never used when absorb exists. With 3 categories, all three get exercised. Absorb-info isn't consumed downstream, so a separate absorb category adds complexity without benefit.

Remaining issues (pre-existing, not introduced by this PR)

B3: Template-lock in secondary definition — llama3.1 copies it verbatim, producing 0 removes on sovereign_identity
B2: Contradictory primary fallback instruction
B1: partial_recovery threshold fires on normal 2-call runs in runner.py

Architectural direction

The bigger improvement is moving from 18 sequential LLM calls per plan to 1 batch call. The per-lever approach causes position bias, prevents global consistency, and is 18× more expensive. Plan to implement this as a follow-up PR on top of this one — same 3-way taxonomy, same output schema, just restructured as a single call.
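
As a rough illustration of the single-call idea, the sketch below builds one prompt that lists every lever and validates that the reply covers each lever exactly once. The prompt wording, the JSON reply shape, and the names `build_batch_prompt` and `parse_batch_response` are assumptions made for this example; no LLM client is called, and the follow-up PR's actual design may differ.

```python
import json

ALLOWED = {"primary", "secondary", "remove"}


def build_batch_prompt(levers: dict[str, str]) -> str:
    """Build one prompt covering all levers (lever_id -> name)."""
    lines = [f"- {lever_id}: {name}" for lever_id, name in levers.items()]
    return (
        "Classify every lever below as primary, secondary, or remove.\n"
        "Return a JSON object mapping lever_id to classification.\n"
        + "\n".join(lines)
    )


def parse_batch_response(raw: str, levers: dict[str, str]) -> dict[str, str]:
    """Parse the reply and require exactly one valid decision per lever."""
    decisions = json.loads(raw)
    if not isinstance(decisions, dict):
        raise ValueError("expected a JSON object")
    missing = set(levers) - set(decisions)
    unknown = set(decisions) - set(levers)
    bad = {k: v for k, v in decisions.items() if v not in ALLOWED}
    if missing or unknown or bad:
        raise ValueError(f"missing={missing} unknown={unknown} bad={bad}")
    return decisions


levers = {"a": "Lever A", "b": "Lever B"}
prompt = build_batch_prompt(levers)
decisions = parse_batch_response('{"a": "primary", "b": "remove"}', levers)
```

Because the model sees all levers in one context, coverage and consistency can be checked in a single validation pass instead of across 18 separate replies.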

Full analysis: PlanExe-prompt-lab/analysis/49_deduplicate_levers/

neoneye · 2026-03-21T02:03:58Z

Superseded by PR #375 (merged). PR #375 combines the batch architecture with categorical taxonomy and prompt fixes.

neoneye and others added 2 commits March 20, 2026 19:49

This was referenced Mar 20, 2026

feat: single-call Likert scoring for deduplicate_levers #373

Merged

feat: batch categorical dedup — single call + primary/secondary/remove #374

Merged

neoneye closed this Mar 21, 2026

neoneye deleted the simplify-lever-classification branch March 21, 2026 02:05

neoneye wants to merge 2 commits into main from simplify-lever-classification
