📊 Progress Report
✅ Schema Validation
This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.
Pricing from metadata.json:
Input (cache miss): $0.3/M tokens
Input (cache hit): $0.06/M tokens
Output: $1.2/M tokens
cache_write_price: ignored
New cost_per_instance: $0.1731

Co-authored-by: OpenHands Bot <openhands@all-hands.dev>
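The pricing above implies a linear cost model per token class. A minimal sketch in Python, assuming each instance's cost is the sum over the three charged token classes and that cost_per_instance is a plain mean over instances. The function names and token counts are hypothetical illustrations, not the pipeline's code, and the sketch does not reproduce the $0.1731 figure.

```python
# Per-million-token prices (USD) as listed in the pricing note above.
INPUT_CACHE_MISS_PER_M = 0.30
INPUT_CACHE_HIT_PER_M = 0.06
OUTPUT_PER_M = 1.20


def instance_cost(cache_miss_tokens: int, cache_hit_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: USD cost of one instance.

    Cache-write tokens are not charged, matching "cache_write_price: ignored".
    """
    return (
        cache_miss_tokens * INPUT_CACHE_MISS_PER_M
        + cache_hit_tokens * INPUT_CACHE_HIT_PER_M
        + output_tokens * OUTPUT_PER_M
    ) / 1_000_000


def mean_cost_per_instance(usages: list[tuple[int, int, int]]) -> float:
    """Hypothetical helper: mean cost over (miss, hit, output) token tuples.

    Assumes cost_per_instance is a simple average; the real pipeline may differ.
    """
    if not usages:
        return 0.0
    return sum(instance_cost(*u) for u in usages) / len(usages)


if __name__ == "__main__":
    # Made-up token counts, for illustration only.
    print(instance_cost(100_000, 1_000_000, 50_000))
```

Separating cache-hit from cache-miss input matters here because cached input is priced at one fifth of uncached input, so agents with long, repeated prompts are dominated by the hit rate.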
Conversation Error Report
Archive: https://results.eval.all-hands.dev/swebench/litellm_proxy-minimax-MiniMax-M2-7/23463806447/results.tar.gz
Summary
Total conversations: 499
Error Occurrences (sorted by count)
Count | % | Error
Unique error types: 7
Orchestrator / Runtime Failures (from instance logs)
Instances that required retries or failed at the harness level.
Instance | Retries | Status | Error
sphinx-doc__sphinx-9320 | 3 | has-conv | runtime_failure_count=3
Instances with runtime failures: 4
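The error report lists error types with a count and a percentage, sorted by count. A minimal sketch of how such a table could be built in Python, assuming a flat list of error labels (one per occurrence) and assuming the percentage is taken relative to the total number of conversations. Both assumptions and the function name are hypothetical; the report does not state its denominator.

```python
from collections import Counter


def error_occurrence_table(error_labels: list[str], total_conversations: int):
    """Hypothetical helper: return (rows, unique_types).

    Each row is (count, percent, error), sorted by descending count.
    Percent is relative to total_conversations (an assumption).
    """
    counts = Counter(error_labels)
    rows = [
        (count, 100.0 * count / total_conversations, error)
        for error, count in counts.most_common()  # most_common sorts by count
    ]
    return rows, len(counts)


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    rows, unique = error_occurrence_table(["timeout", "timeout", "runtime"], 10)
    for count, pct, err in rows:
        print(f"{count} | {pct:.1f} | {err}")
    print("Unique error types:", unique)
```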
Evaluation Results
Model: MiniMax-M2.7
Benchmark: swe-bench
Agent Version: v1.14.0
Results
Report Summary
Additional Metadata
49920
This PR was automatically created by the evaluation pipeline.