feature request: Better option for obtaining Agent Evaluation results in a JSON file #1036

Open
@Neutrollized


Is your feature request related to a problem? Please describe.
I'm unable to obtain Agent Evaluation (adk eval) results in JSON format/file.

Describe the solution you'd like
An additional option to pass to adk eval, such as --results_output_file FILENAME.json, so that the results are written out in JSON format.

Describe alternatives you've considered

Alternative 1

The best option currently is to run adk eval with the --print_detailed_results option and redirect the output to a file, i.e.:

adk eval \
    --config_file_path eval/data/test_config.json \
    --print_detailed_results \
    my_agent \
    eval/data/my_test.evalset.json

However, this output also contains a summary of the test results that is not JSON, so I would have to delete the first dozen or so lines of the output to get what I want.
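As a stopgap for the manual cleanup described above, here is a minimal sketch that pulls JSON values out of mixed text output. It assumes the detailed results appear as well-formed JSON objects or arrays somewhere after the plain-text summary; the exact layout of the adk eval output is not verified here, and extract_json_values is a hypothetical helper name, not part of ADK.

```python
import json


def extract_json_values(text: str) -> list:
    """Return every top-level JSON object/array found in mixed text.

    Assumption: the non-JSON summary lines do not themselves contain
    text that happens to parse as a JSON object or array.
    """
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # Find the next character that could start a JSON object or array.
        candidates = [i for i in (text.find("{", pos), text.find("[", pos)) if i != -1]
        if not candidates:
            break
        start = min(candidates)
        try:
            # raw_decode parses one JSON value and reports where it ended.
            value, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            # Not valid JSON at this position; move on to the next candidate.
            pos = start + 1
            continue
        values.append(value)
        pos = end
    return values
```

To use it, redirect the adk eval output to a file, read the file into a string, pass it to extract_json_values, and write the returned list back out with json.dump. A brace or bracket in the summary text that does not begin valid JSON is simply skipped.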

Alternative 2

Another alternative (which is terrible in its current form) is to run the eval set from adk web, but there are a lot of issues with this method:

  1. The Eval Set file needs to be in the same directory as your agent module
  2. There's no way to pass in a test_config.json with criteria config, so the tool trajectory score is fixed at 1.0 and there is no option to specify a response matching score
  3. Output goes to the .adk/eval_history/ folder in the agent module directory as a .json file, but every double quote in it is escaped with a backslash (i.e. \"), so it's not easy to read. I had to use a separate Python script to strip out the escape backslashes and run json.loads(CLEANED_DATA) on the result before it was usable.
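For the escaped output in item 3, backslash-escaped quotes throughout a .json file usually mean the content is a JSON string that itself contains JSON (double-encoded). If that assumption holds for the eval history files, which I have not confirmed, decoding twice avoids stripping backslashes by hand. A rough sketch, with load_double_encoded as a hypothetical helper name:

```python
import json


def load_double_encoded(text: str):
    """Decode JSON repeatedly until the result is no longer a JSON string.

    Assumption: the file holds a JSON string whose content is itself
    JSON, which is what uniformly escaped double quotes suggest.
    """
    data = json.loads(text)
    while isinstance(data, str):
        try:
            data = json.loads(data)
        except json.JSONDecodeError:
            # The string is ordinary text rather than encoded JSON; stop here.
            break
    return data
```

Unlike a blanket search-and-replace on backslashes, this keeps any escapes that legitimately belong inside string values, such as a quoted phrase in an agent response.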

Additional context
I wrote a Medium article about Agent Evaluation with ADK and this was a question that came up, so I thought I'd turn it into a GitHub issue 😆
