
fix(evals): update tau2 airline task data from tau3 #2377

Open
Maahir Sachdev (maahir30) wants to merge 2 commits into main from fix-tau-airline

Conversation

@maahir30
Contributor

Summary

  • Updates the vendored airline tasks.json to the tau3 version, which fixes bugs in the original tau2 task definitions
  • Updates source attribution in docstrings and LICENSE

Why

The τ³-bench release notes document 27 airline task fixes: incorrect expected actions, wrong argument values, ambiguous user instructions, and impossible constraints. All 15 of our evaluated tasks were affected. This PR updates tasks.json to the corrected upstream version so that the eval grades against correct ground truth.

No code changes were needed. db.json and policy.md are unchanged between τ² and τ³, and the evaluation code is compatible with the new task data.
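To spot-check which task definitions changed, the old and new tasks.json files can be compared directly. The following is a minimal sketch, not part of this PR: it assumes tasks.json is a JSON list of task objects, each with an `id` field, and the helper names `load_tasks` and `changed_task_ids` are hypothetical.

```python
import json
from pathlib import Path


def load_tasks(path):
    """Load a tasks file and index its tasks by id.

    Assumption: the file is a JSON list of objects that each carry an "id" key.
    """
    tasks = json.loads(Path(path).read_text(encoding="utf-8"))
    return {str(task["id"]): task for task in tasks}


def changed_task_ids(old_tasks, new_tasks, evaluated_ids=None):
    """Return the sorted ids whose definitions differ between two versions.

    Ids present in only one version count as changed. If evaluated_ids is
    given, the result is restricted to that subset.
    """
    ids = set(old_tasks) | set(new_tasks)
    if evaluated_ids is not None:
        ids &= {str(i) for i in evaluated_ids}
    return sorted(i for i in ids if old_tasks.get(i) != new_tasks.get(i))
```

For example, `changed_task_ids(load_tasks("old/tasks.json"), load_tasks("new/tasks.json"))` lists every task whose definition differs; both paths are placeholders for wherever the two versions are checked out. Ids are compared as strings, so the result is sorted lexicographically.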

Source

sierra-research/tau2-bench dev/tau3 branch, specifically the CHANGELOG "Airline task fixes" section.

@github-actions (bot) added labels on Apr 1, 2026: evals; fix (a bug fix, PATCH); internal (user is a member of the `langchain-ai` GitHub organization); size: M (200-499 LOC)

