Pull requests: mlfoundations/evalchemy
Fix LiveCodeBench crashes with type safety and list conversion
#137 opened Jul 1, 2025 by dkimds
Add debug mode support to 6 benchmarks (AIME24, AIME25, AIW, AMC23, HMMT, MATH500)
#135 opened Jul 1, 2025 by dkimds
Optimize Evaluation Workflow for Better Batching and Model Reuse For benchmarks with n_repeat > 1
#125 opened May 27, 2025 by ihebchaa
Fix: truncate model identifier in case model name is too long
#120 opened May 2, 2025 by younesbelkada
Support for Big Bench Extra Hard (General-purpose reasoning eval)
#92 opened Mar 8, 2025 by Hritikbansal