Hello, and thank you for your great work on this project!
I am currently trying to reproduce the CHAIR evaluation metrics using MiniGPT-4 with the Vicuna-7B backbone. However, I am observing a significant gap between the reported metrics and the results I obtained from my local setup.
Expected vs. reproduced metrics:

| Metric | Expected | My reproduction |
| ------ | -------- | --------------- |
| CHAIRs | 28.6     | 36.0            |
| CHAIRi | 8.5      | 14.5            |
| F1     | 71.5     | 62.2            |
My environment:
- GPU: NVIDIA RTX 5090
- Model: MiniGPT-4 (Vicuna-7B)
- OS: Linux (Ubuntu)
- CUDA version: 13.0
Steps I followed:
- Set up the MiniGPT-4 model locally following the official instructions.
- Ran the CHAIR evaluation pipeline (my understanding of the metric definitions is sketched after this list).
- Compared the resulting metrics with the expected values above.
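In case it helps to rule out a scoring mismatch, this is my understanding of the two metrics as a minimal sketch. The evaluation script in the repo is the authoritative implementation; the MSCOCO synonym mapping is omitted here, and all names are illustrative:

```python
# Minimal sketch of the CHAIR metrics (Rohrbach et al., 2018), assuming
# captions have already been parsed into sets of mentioned MSCOCO object
# words. Real implementations also map synonyms onto the 80 categories.

def chair_metrics(mentioned_objects, gt_objects):
    """mentioned_objects: list of sets of objects mentioned per caption.
    gt_objects: list of sets of ground-truth objects per image."""
    hallucinated_captions = 0   # captions with >= 1 hallucinated object
    hallucinated_mentions = 0   # hallucinated object mentions overall
    total_mentions = 0          # all object mentions overall

    for mentioned, gt in zip(mentioned_objects, gt_objects):
        hallucinated = mentioned - gt  # mentioned objects not in the image
        total_mentions += len(mentioned)
        hallucinated_mentions += len(hallucinated)
        if hallucinated:
            hallucinated_captions += 1

    chair_s = hallucinated_captions / len(mentioned_objects)  # per caption
    chair_i = hallucinated_mentions / max(total_mentions, 1)  # per mention
    return chair_s, chair_i
```

If our pipelines agree on these definitions, the gap presumably comes from the generation side (checkpoint, prompt template, or sampling settings) rather than from scoring.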
The difference is substantial: CHAIRs is up ~7.4 points, CHAIRi is up 6.0 points (8.5 → 14.5), and F1 is down ~9.3 points. I would greatly appreciate any guidance on what might be causing this discrepancy. Specifically, I am wondering:
- Is there a particular model checkpoint or configuration file that should be used?
- Could there be any known compatibility issues with newer GPU architectures (e.g., RTX 5090 / Blackwell) that might affect inference behavior? (A quick check I ran is sketched below.)
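Regarding the second point, below is the quick check I ran to see whether my PyTorch build ships compiled kernels for the 5090's compute capability. This is a minimal sketch using standard torch.cuda introspection; my assumption is that a missing arch would force a PTX JIT fallback, which could plausibly change inference behavior on a very new GPU:

```python
import torch

# Check whether this PyTorch build includes compiled kernels for the
# local GPU. Blackwell consumer cards (e.g., RTX 5090) report compute
# capability 12.0, i.e. sm_120, as far as I know. If that arch is
# missing from the build's arch list, PyTorch may fall back to PTX JIT
# or fail outright, which could affect inference on very new GPUs.
major, minor = torch.cuda.get_device_capability(0)
device_arch = f"sm_{major}{minor}"
built_archs = torch.cuda.get_arch_list()

print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"Compute capability: {device_arch}")
print(f"Built-in arch list: {built_archs}")
print(f"Supported by this build: {device_arch in built_archs}")
```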