
    Repositories list

    • Terminal-Bench-Science: Evaluating AI Agents on Complex Real-World Scientific Workflows in the Terminal
      Python
      Apache License 2.0
      73143249 · Updated Jun 17, 2026
    • harbor

      Public
      Harbor is a framework for running agent evaluations and creating and using RL environments.
      Python
      Apache License 2.0
      1.2k2.5k138288 · Updated Jun 17, 2026
    • Measuring agents' ability to get work done on a computer
      Python
      2892413112 · Updated Jun 16, 2026
    • Python
      Apache License 2.0
      12600 · Updated Jun 16, 2026
    • TypeScript
      14720 · Updated Jun 16, 2026
    • Shell
      3800 · Updated Jun 16, 2026
    • docs

      Public
      MDX
      MIT License
      0000 · Updated Jun 3, 2026
    • benchmark-template

      Public template
      Harbor Benchmark Template
      Python
      101277 · Updated May 30, 2026
    • A curated list of awesome Harbor ecosystem projects
      24201 · Updated May 29, 2026
    • 11433620 · Updated May 16, 2026
    • skills

      Public
      Public agent skills catalog for Harbor
      Apache License 2.0
      19011 · Updated May 12, 2026
    • Terminal-Bench 2.1
      Shell
      Apache License 2.0
      62117 · Updated May 5, 2026
    • Shell
      Apache License 2.0
      872881719 · Updated Apr 30, 2026
    • Realistic examples of building evals and optimizing agents with Harbor
      Python
      Apache License 2.0
      1010401 · Updated Apr 23, 2026
    • MDX
      11304 · Updated Mar 31, 2026
    • A benchmark for LLMs on complicated tasks in the terminal
      Python
      Apache License 2.0
      5432.4k112190 · Updated Jan 22, 2026