Add gaia results for MiniMax-M2.7 by all-hands-bot · Pull Request #723 · OpenHands/openhands-index-results

all-hands-bot · 2026-03-25T21:41:44Z

Evaluation Results

Model: MiniMax-M2.7
Benchmark: gaia
Agent Version: v1.14.0

Results

Accuracy: 0.0%
Total Cost: $0.00
Average Instance Cost: $0.00
Total Duration: 0s (0.0m)
Average Instance Runtime: 0s

Report Summary

Total instances: 165
Submitted instances: 165
Resolved instances: 0
Unresolved instances: 0
Empty patch instances: 0
Error instances: 165

Additional Metadata

completed_instances: 0
incomplete_instances: 165

This PR was automatically created by the evaluation pipeline.
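The summary above reports 165 error instances out of 165 submitted and 0 resolved, which is why accuracy, cost and duration all come out as zero. The sketch below shows how the headline numbers follow from the counts. It assumes accuracy is resolved instances divided by total instances. The `ReportSummary` class and its field names are hypothetical and are not taken from the evaluation pipeline.

```python
from dataclasses import dataclass


@dataclass
class ReportSummary:
    """Hypothetical container for the 'Report Summary' counts of one run."""
    total: int
    submitted: int
    resolved: int
    unresolved: int
    empty_patch: int
    error: int

    def accuracy(self) -> float:
        """Assumed definition: resolved / total (0.0 when there are no instances)."""
        return self.resolved / self.total if self.total else 0.0

    def all_errored(self) -> bool:
        """True when every submitted instance ended in an error."""
        return self.submitted > 0 and self.error == self.submitted


# Counts copied from the report above.
summary = ReportSummary(total=165, submitted=165, resolved=0,
                        unresolved=0, empty_patch=0, error=165)
print(f"Accuracy: {summary.accuracy():.1%}")  # Accuracy: 0.0%
print(summary.all_errored())  # True
```

A check like `all_errored()` would flag this run as one where no instance completed, matching `completed_instances: 0` above.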

github-actions · 2026-03-25T21:41:59Z

📊 Progress Report

============================================================
OpenHands Index Results - Progress Report
============================================================

Target: Complete all model × benchmark pairs
  22 models × 5 benchmarks = 110 pairs
  (each pair requires all 3 metrics: score, cost_per_instance, average_runtime)

Incomplete Pairs (10):
  GPT-5.4:
    - swt-bench (all metrics)
  Qwen3.5-Flash:
    - swe-bench (all metrics)
    - swe-bench-multimodal (all metrics)
    - swt-bench (all metrics)
    - gaia (all metrics)
  Qwen3-Coder-Next:
    - swt-bench (all metrics)
  Minimax-2.7:
    - swe-bench (all metrics)
    - swe-bench-multimodal (all metrics)
    - swt-bench (all metrics)
    - commit0 (all metrics)

============================================================
OVERALL PROGRESS: ⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬜ 90.91%
  Complete: 100 / 110 pairs
============================================================
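The overall percentage follows directly from the pair counts: 22 × 5 = 110 pairs, minus the 10 incomplete pairs listed, leaves 100 complete, and 100 / 110 ≈ 90.91%. The sketch below reproduces that arithmetic. It assumes that every pair not listed as incomplete has all three metrics. The `progress` function is hypothetical and is not the repository's reporting script.

```python
# Incomplete pairs as listed in the progress report above (model -> benchmarks).
incomplete = {
    "GPT-5.4": ["swt-bench"],
    "Qwen3.5-Flash": ["swe-bench", "swe-bench-multimodal", "swt-bench", "gaia"],
    "Qwen3-Coder-Next": ["swt-bench"],
    "Minimax-2.7": ["swe-bench", "swe-bench-multimodal", "swt-bench", "commit0"],
}


def progress(n_models: int, n_benchmarks: int,
             incomplete: dict[str, list[str]]) -> tuple[int, int, float]:
    """Return (complete pairs, total pairs, percent complete)."""
    total = n_models * n_benchmarks
    missing = sum(len(benchmarks) for benchmarks in incomplete.values())
    complete = total - missing
    return complete, total, 100 * complete / total


complete, total, pct = progress(22, 5, incomplete)
print(f"Complete: {complete} / {total} pairs ({pct:.2f}%)")
# Complete: 100 / 110 pairs (90.91%)
```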

❌ Schema Validation

============================================================
Schema Validation Report
============================================================

Results directory: /home/runner/work/openhands-index-results/openhands-index-results/results
Files validated: 46
  Passed: 45
  Failed: 1

Errors:
  - /home/runner/work/openhands-index-results/openhands-index-results/results/MiniMax-M2.7/scores.json: Entry 0:
  • Field 'cost_per_instance': Input should be greater than 0 (got: 0.0)
  • Field 'average_runtime': Input should be greater than 0 (got: 0.0)

============================================================
VALIDATION FAILED
============================================================

This report measures progress towards the 3D array goal (benchmarks × models × metrics) as described in #2.
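The validation failure comes from a rule that `cost_per_instance` and `average_runtime` must be strictly greater than 0. Because this run recorded $0.00 and 0s, both fields fail. The error wording resembles Pydantic v2's `gt` constraint message, but the actual validator is not shown in this PR. The sketch below is a plain-Python stand-in. The `validate_entry` helper, the `POSITIVE_FIELDS` tuple, and the "missing" message are hypothetical. The sample entry contains only the two flagged values, not the full contents of `scores.json`.

```python
# Fields the validation report says must be strictly positive.
POSITIVE_FIELDS = ("cost_per_instance", "average_runtime")


def validate_entry(entry: dict) -> list[str]:
    """Return one error message per positive-only field that is missing or not > 0."""
    errors = []
    for field in POSITIVE_FIELDS:
        value = entry.get(field)
        if value is None:
            errors.append(f"Field '{field}': missing")
        elif not value > 0:  # also rejects NaN, since NaN > 0 is False
            errors.append(f"Field '{field}': Input should be greater than 0 (got: {value})")
    return errors


# Hypothetical entry holding only the two values the report flagged.
for message in validate_entry({"cost_per_instance": 0.0, "average_runtime": 0.0}):
    print(message)
```

Running it prints the same two messages as the report above. An entry with positive cost and runtime produces no errors.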

Add gaia results for MiniMax-M2.7 · e99e070

all-hands-bot requested a review from juanmichelini on March 25, 2026 at 21:41

Update metadata.json · 952c42c


all-hands-bot wants to merge 2 commits into main from eval/MiniMax-M2.7/gaia-20260325-214141


3 participants
