MetricMate is a full-stack application for generating evaluation criteria with the help of Large Language Models (LLMs).
It provides a Python backend for API handling and model interaction, and a JavaScript/React frontend for an interactive user interface.
MetricMate helps users define and refine evaluation criteria for model assessments by leveraging OpenAI’s language models.
It is designed for researchers, evaluators, and AI practitioners who want to:
- Automate criteria creation for experiments
- Standardize evaluation frameworks
- Quickly adapt metrics to different tasks
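To make the idea concrete, here is a minimal sketch of the kind of criteria-generation request such a tool might send to an OpenAI chat model. It is illustrative only: the prompt wording, the model name, and the `build_criteria_request` helper are assumptions, not MetricMate's actual implementation.

```python
import json

def build_criteria_request(task_description: str, n_criteria: int = 5) -> dict:
    """Build a Chat Completions request body asking an LLM to propose
    evaluation criteria for the given task (illustrative sketch only)."""
    system = (
        "You are an evaluation expert. Propose clear, measurable criteria "
        "for judging model outputs on the user's task."
    )
    user = (
        f"Task: {task_description}\n"
        f"List {n_criteria} evaluation criteria, one per line, "
        "each with a short definition."
    )
    return {
        "model": "gpt-4o-mini",  # assumed model; MetricMate's choice may differ
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }

if __name__ == "__main__":
    # Inspect the request body that would be sent to the API
    print(json.dumps(build_criteria_request("Summarize scientific abstracts"),
                     indent=2))
```

The returned dictionary matches the shape expected by OpenAI's Chat Completions endpoint, so it could be passed to any client library; the actual backend may structure its prompts quite differently.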
## Installation

### Backend

1. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

2. Configure the API key by updating `backend/config.ini` with your personal OpenAI API key:

   ```ini
   api_key = YOUR_OPENAI_API_KEY
   ```

3. Run the backend:

   ```bash
   python main.py
   ```

### Frontend

1. Install dependencies:

   ```bash
   npm install
   ```

2. Start the frontend:

   ```bash
   npm start
   ```

## Citation

If you use our tools/code for your work, please cite the following paper:
```bibtex
@inproceedings{gebreegziabher2025metricmate,
  title={MetricMate: An Interactive Tool for Generating Evaluation Criteria for LLM-as-a-Judge Workflow},
  author={Gebreegziabher, Simret Araya and Chiang, Charles and Wang, Zichu and Ashktorab, Zahra and Brachman, Michelle and Geyer, Werner and Li, Toby Jia-Jun and G{\'o}mez-Zar{\'a}, Diego},
  booktitle={Proceedings of the 4th Annual Symposium on Human-Computer Interaction for Work},
  pages={1--18},
  year={2025}
}
```
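For reference, a key stored in `backend/config.ini` as shown in the setup steps can be loaded with Python's standard `configparser` module. This is a minimal sketch, not the repository's actual loading code, and the `[settings]` section name is an assumption; match it to the header used in the real file.

```python
import configparser

def load_api_key(path: str = "backend/config.ini",
                 section: str = "settings") -> str:
    """Read the api_key value from an INI-style config file.
    The section name is hypothetical; adjust to the actual file."""
    parser = configparser.ConfigParser()
    with open(path) as fh:
        parser.read_file(fh)
    return parser[section]["api_key"]
```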