SCARF (System for Comprehensive Assessment of RAG Frameworks) is a modular, flexible evaluation framework for the systematic benchmarking of Retrieval-Augmented Generation (RAG) applications. It provides an end-to-end, black-box evaluation methodology, enabling straightforward comparison of diverse RAG frameworks in real-world deployment scenarios.
- Holistic RAG Evaluation: Assess factual accuracy, contextual relevance, and response coherence.
- Modular & Flexible: Supports multiple deployment configurations and evaluation setups.
- Automated Benchmarking: Run repeatable, side-by-side comparisons of different RAG frameworks.
- Detailed Performance Reports: Generate insights into RAG framework efficiency and effectiveness.
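To give a feel for the kind of black-box scoring such an evaluation involves, here is a minimal token-overlap F1 between a generated answer and a reference. This is a deliberately simple illustrative stand-in, not SCARF's actual metric (the function name `overlap_f1` is ours); see the technical report for the metrics SCARF implements.

```python
from collections import Counter

def overlap_f1(answer: str, reference: str) -> float:
    """F1 over shared tokens between a generated answer and a reference.

    Illustrative only: a crude proxy for answer/reference agreement,
    NOT the metric SCARF uses.
    """
    a = Counter(answer.lower().split())
    r = Counter(reference.lower().split())
    shared = sum((a & r).values())  # multiset intersection of tokens
    if shared == 0:
        return 0.0
    precision = shared / sum(a.values())
    recall = shared / sum(r.values())
    return 2 * precision * recall / (precision + recall)

print(overlap_f1("Paris is the capital of France",
                 "The capital of France is Paris"))  # → 1.0
```

A real evaluator would combine several such signals (and typically an LLM judge) per query, then aggregate across a benchmark dataset.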
Prerequisites:
- Python 3.8+
- pip
- Docker (optional, for containerized deployment of RAG frameworks)
- Clone the repository and navigate to the project directory:

  ```bash
  git clone https://github.com/your-repo/scarf.git && cd scarf
  ```

- (Optional) Set up the RAG framework components locally for testing. Example Dockerfiles for each component are provided in the corresponding subfolders.
- Navigate to the SCARF framework-test folder:

  ```bash
  cd frameworks-test/eus/
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Configure SCARF for your needs through `config.json`.
- Start SCARF:

  ```bash
  python test_rag_frameworks.py
  ```
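For orientation, a configuration might look like the sketch below. The key names (`frameworks`, `endpoint`, `dataset`, `metrics`) are illustrative placeholders, not SCARF's documented schema; consult the `config.json` shipped with the repository for the actual fields.

```json
{
  "frameworks": [
    {"name": "example-framework", "endpoint": "http://localhost:8000/query"}
  ],
  "dataset": "data/questions.jsonl",
  "metrics": ["factual_accuracy", "contextual_relevance", "coherence"]
}
```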
Contributions are welcome! Please submit issues or pull requests.
For questions or support, reach out via GitHub Issues.
Mattia Rengo, Senad Beadini, Domenico Alfano, Roberto Abbruzzese @ Eustema SpA, Italy
If you use SCARF in your research or applications, please cite our technical report:
@techreport{SCARF,
  title         = {A System for Comprehensive Assessment of RAG Frameworks},
  author        = {Mattia Rengo and Senad Beadini and Domenico Alfano and Roberto Abbruzzese},
  institution   = {Eustema SpA},
  month         = apr,
  year          = {2025},
  eprint        = {2504.07803},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL},
  url           = {https://arxiv.org/abs/2504.07803},
  doi           = {10.48550/arXiv.2504.07803},
  note          = {Technical Report}
}