Hello, and thank you for your great work on this project!
I am currently trying to reproduce the CHAIR evaluation metrics using MiniGPT-4 with the Vicuna-7B backbone. However, I am observing a significant gap between the reported metrics and the results I obtained from my local setup.
Expected vs. reproduced metrics:

| Metric | Expected | My reproduction |
| ------ | -------- | --------------- |
| CHAIRs | 28.6     | 36.0            |
| CHAIRi | 8.5      | 14.5            |
| F1     | 71.5     | 62.2            |
My environment:
- GPU: NVIDIA RTX 5090
- Model: MiniGPT-4 (Vicuna-7B)
- OS: Linux (Ubuntu)
- CUDA version: 13.0
Steps I followed:
- Set up the MiniGPT-4 model locally following the official instructions.
- Ran the CHAIR evaluation pipeline (my understanding of the metric definitions is sketched after this list).
- Compared the resulting metrics with the expected values above.
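In case it helps to rule out a scoring mismatch, this is my understanding of the two metrics as a minimal sketch. The evaluation script in the repo is the authoritative implementation; the MSCOCO synonym mapping is omitted here, and all names are illustrative:

```python
# Minimal sketch of the CHAIR metrics (Rohrbach et al., 2018), assuming
# captions have already been parsed into sets of mentioned MSCOCO object
# words. Real implementations also map synonyms onto the 80 categories.

def chair_metrics(mentioned_objects, gt_objects):
    """mentioned_objects: list of sets of objects mentioned per caption.
    gt_objects: list of sets of ground-truth objects per image."""
    hallucinated_captions = 0   # captions with >= 1 hallucinated object
    hallucinated_mentions = 0   # hallucinated object mentions overall
    total_mentions = 0          # all object mentions overall

    for mentioned, gt in zip(mentioned_objects, gt_objects):
        hallucinated = mentioned - gt  # mentioned objects not in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1

    chair_s = hallucinated_captions / len(mentioned_objects)  # per caption
    chair_i = hallucinated_mentions / max(total_mentions, 1)  # per mention
    return chair_s, chair_i
```

If our pipelines agree on these definitions, the gap presumably comes from the generation side (checkpoint, prompt template, or sampling settings) rather than from scoring.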
The difference is substantial: CHAIRs is up ~7.4 points, CHAIRi is up 6.0 points (8.5 → 14.5), and F1 is down ~9.3 points. I would greatly appreciate any guidance on what might be causing this discrepancy. Specifically, I am wondering:
- Is there a particular model checkpoint or configuration file that should be used?
- Could there be any known compatibility issues with newer GPU architectures (e.g., RTX 5090 / Blackwell) that might affect inference behavior? (A quick check I ran is sketched below.)
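Regarding the second point, below is the quick check I ran to see whether my PyTorch build ships compiled kernels for the 5090's compute capability. This is a minimal sketch using standard torch.cuda introspection; my assumption is that a missing arch would force a PTX JIT fallback, which could plausibly change inference behavior on a very new GPU:

```python
import torch

# Check whether this PyTorch build includes compiled kernels for the
# local GPU. Blackwell consumer cards (e.g., RTX 5090) report compute
# capability 12.0, i.e. sm_120, as far as I know. If that arch is
# missing from the build's arch list, PyTorch may fall back to PTX JIT
# or fail outright, which could affect inference on very new GPUs.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"
built_archs = torch.cuda.get_arch_list()

print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Compute capability: {device_arch}")
print(f"Built-in arch list: {built_archs}")
print(f"Supported by this build: {device_arch in built_archs}")
```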