Calibrate accept confidence fallback #110

Merged
yha9806 merged 1 commit into master from codex/tiny-confidence-calibration-v1 on May 7, 2026
@yha9806 (Contributor) commented May 7, 2026

Summary

  • allow low-confidence accepted decisions to stay with tiny_model when required source context is available
  • keep explicit fallback_to_agent and missing-source fallbacks unchanged
  • update training-effectiveness and residual-audit expectations after the tiny-router confidence gap is removed
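
The routing rule summarized above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/vulca/learning/dry_run_decision_router.py`: the `Decision` fields, the `route` function, and the `0.8` threshold are all hypothetical, and only the `tiny_model` and `fallback_to_agent` labels come from this PR. The sketch also assumes that accepted decisions without source context still pass through the confidence gate, which the PR does not state.

```python
from dataclasses import dataclass

# Hypothetical threshold; the real value used by the router is not shown in this PR.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Decision:
    # All field names are assumptions made for this sketch.
    action: str               # e.g. "accept" or "fallback_to_agent"
    confidence: float         # model confidence in [0, 1]
    requires_source: bool     # the case depends on source context
    has_source_context: bool  # source context was actually found


def route(d: Decision) -> str:
    """Return "tiny_model" or "fallback_agent" for one dry-run decision."""
    # Unchanged: an explicit fallback request always goes to the agent.
    if d.action == "fallback_to_agent":
        return "fallback_agent"
    # Unchanged: missing required source context still falls back.
    if d.requires_source and not d.has_source_context:
        return "fallback_agent"
    # New: accepted decisions with source context stay with tiny_model,
    # even when confidence is below the threshold.
    if d.action == "accept" and d.has_source_context:
        return "tiny_model"
    # Everything else keeps the confidence gate (assumption).
    if d.confidence >= CONFIDENCE_THRESHOLD:
        return "tiny_model"
    return "fallback_agent"
```

Under these assumptions, `route(Decision("accept", 0.3, True, True))` stays with `tiny_model`, while the same decision with `has_source_context=False` still falls back to the agent.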

Verified effect

  • public source-context dry-run fallback: 21 -> 6 (reduction 15)
  • private recovery dry-run fallback: 6 -> 4 (reduction 2)
  • recovered no_source_context_for_required_source: 2 -> 0
  • residual audit tiny_router_candidate_count: 0
  • residual fallback_agent_count: 4
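
As a quick sanity check, the reductions above can be compared against the gate values passed to `real_source_context_recovery_eval.py` in the Verification commands (`--min-fallback-agent-reduction 2`, `--max-recovered-source-context-gaps 0`). The sketch below only redoes the arithmetic. It assumes the gates apply to the private recovery run and that they are evaluated as `>=` and `<=`, which is inferred from the flag names rather than from the script itself.

```python
# Counts copied from the "Verified effect" list.
public_before, public_after = 21, 6
private_before, private_after = 6, 4
recovered_gaps_after = 0  # no_source_context_for_required_source after recovery

# Gate values copied from the recovery-eval command-line flags.
MIN_FALLBACK_AGENT_REDUCTION = 2
MAX_RECOVERED_SOURCE_CONTEXT_GAPS = 0

public_reduction = public_before - public_after     # 21 - 6
private_reduction = private_before - private_after  # 6 - 4

# Assumed gate semantics: the reduction must be at least the minimum,
# and remaining gaps must not exceed the maximum.
gate_passes = (
    private_reduction >= MIN_FALLBACK_AGENT_REDUCTION
    and recovered_gaps_after <= MAX_RECOVERED_SOURCE_CONTEXT_GAPS
)

print(public_reduction, private_reduction, gate_passes)  # 15 2 True
```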

Verification

  • PYTHONPATH=src python3 -m pytest tests/test_dry_run_decision_router.py::test_dry_run_decisions_do_not_fallback_accepted_cases_with_source_context tests/test_training_effectiveness_report.py::test_training_effectiveness_report_compares_source_context_router_improvement tests/test_fallback_residual_audit.py::test_fallback_residual_audit_classifies_remaining_agent_work -q
  • PYTHONPATH=src python3 scripts/real_source_context_recovery_eval.py --repo-root . --case-source-manifest docs/benchmarks/learning/combined_case_source_manifest_v1.json --artifact-search-root /Users/yhryzy/dev/vulca/.scratch/p2b-live-shipgate --image-search-root /Users/yhryzy/dev/vulca/.scratch/p2b-live-shipgate --source-dependency-manifest docs/benchmarks/learning/real_source_dependency_label_manifest_v1.json --output-dir build/tiny_confidence_calibration_v1/real_recovery_eval --report build/tiny_confidence_calibration_v1/real_recovery_eval/report.json --max-recovered-source-context-gaps 0 --min-fallback-agent-reduction 2 --min-recovered-eval-cases 2
  • PYTHONPATH=src python3 scripts/training_effectiveness_report.py --repo-root . --output-dir build/tiny_confidence_calibration_v1/training_effectiveness --report build/tiny_confidence_calibration_v1/training_effectiveness/report.json
  • PYTHONPATH=src python3 scripts/fallback_residual_audit.py --decision-path build/tiny_confidence_calibration_v1/real_recovery_eval/recovered/dry_run/dry_run_decisions.jsonl --report build/tiny_confidence_calibration_v1/residual_audit/report.json
  • PYTHONPATH=src python3 -m pytest tests/test_dry_run_decision_router.py tests/test_training_effectiveness_report.py tests/test_real_source_context_recovery_eval.py tests/test_fallback_residual_audit.py tests/test_source_context_gap_pack.py -q
  • ruff check src/vulca/learning/dry_run_decision_router.py tests/test_dry_run_decision_router.py tests/test_training_effectiveness_report.py tests/test_fallback_residual_audit.py
  • python3 -m py_compile src/vulca/learning/dry_run_decision_router.py
  • git diff --check
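
For a manual spot check of the counts reported above, a decisions file such as `dry_run_decisions.jsonl` can be tallied with a few lines of Python. This is a sketch under stated assumptions: the record schema is not shown in this PR, so the `route` field name and the `count_routes` helper are hypothetical and would need to be adjusted to the real keys.

```python
import json
from collections import Counter
from pathlib import Path


def count_routes(decision_path: Path, field: str = "route") -> Counter:
    """Count the values of `field` across a JSONL file of decisions.

    `field` is a hypothetical key name; the real schema may differ.
    """
    counts: Counter = Counter()
    with decision_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Records without the field are grouped under "<missing>".
            counts[record.get(field, "<missing>")] += 1
    return counts
```

Calling `count_routes(Path("build/tiny_confidence_calibration_v1/real_recovery_eval/recovered/dry_run/dry_run_decisions.jsonl"))` would return a `Counter` keyed by whatever values the assumed field holds.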

@yha9806 yha9806 marked this pull request as ready for review May 7, 2026 21:24
@yha9806 yha9806 merged commit 4b20e64 into master May 7, 2026
2 checks passed
@yha9806 yha9806 deleted the codex/tiny-confidence-calibration-v1 branch May 7, 2026 21:25