Hi, thanks for releasing LocAgent — great work!
A few quick questions:
- Could you clarify how to use the fields in Loc-Bench_V1 when developing or evaluating new agents? A short description of each field and the expected label format would help a lot.
- For evaluation, is there a canonical JSONL format for the results? The evaluation/run_evaluation.ipynb notebook is helpful, but a concrete example or format spec would be great, especially for batch evaluation runs.
- Any plans for a leaderboard or a standard evaluation protocol for Loc-Bench?
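To make the results-format question concrete, here is roughly the kind of record we are producing right now. Every field name below (`instance_id`, `found_files`, etc.) is our guess at a plausible schema, not taken from the repo — happy to rename fields to whatever the evaluation script expects:

```python
import json

# Hypothetical prediction record; all keys are guesses, not from LocAgent.
record = {
    "instance_id": "example__repo-123",          # benchmark instance id (guess)
    "found_files": ["src/module.py"],            # predicted file-level locations
    "found_entities": ["src/module.py:MyClass.method"],  # finer-grained predictions
}

# One JSON object per line -> JSONL
line = json.dumps(record)
print(line)
```

Is this close to what evaluation/run_evaluation.ipynb consumes, or should predictions be nested differently (e.g. ranked lists with scores)?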
Just want to make sure we're aligned with how the benchmark is intended to be used. Appreciate it!