Significant discrepancy in CHAIR metrics when reproducing results with MiniGPT-4 (Vicuna-7B) #561

@Glesep

Description

Hello, and thank you for your great work on this project!

I am currently trying to reproduce the CHAIR evaluation metrics using MiniGPT-4 with the Vicuna-7B backbone. However, I am observing a significant gap between the reported metrics and the results I obtained from my local setup.

Expected metrics:

  • CHAIRs = 28.6
  • CHAIRi = 8.5
  • F1 = 71.5

My reproduced metrics:

  • CHAIRs = 36.0
  • CHAIRi = 14.5
  • F1 = 62.2

My environment:

  • GPU: NVIDIA RTX 5090
  • Model: MiniGPT-4 (Vicuna-7B)
  • OS: Linux (Ubuntu)
  • CUDA version: 13.0

Steps I followed:

  1. Set up the MiniGPT-4 model locally following the official instructions.
  2. Ran the CHAIR evaluation pipeline.
  3. Compared the resulting metrics with the expected values above.
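For reference, here is the CHAIR definition I am using when comparing runs. This is a minimal sketch with my own function and variable names; the official evaluation script additionally handles MSCOCO synonym mapping and caption parsing, which this omits:

```python
def chair_metrics(predictions, ground_truths):
    """Compute (CHAIRs, CHAIRi) from per-image object mentions.

    predictions: list of sets of objects mentioned in each generated caption
    ground_truths: list of sets of objects actually present in each image
    """
    hallucinated_captions = 0  # captions containing >= 1 hallucinated object
    hallucinated_mentions = 0  # hallucinated object mentions across all captions
    total_mentions = 0         # all object mentions across all captions
    for mentioned, present in zip(predictions, ground_truths):
        hallucinated = mentioned - present
        if hallucinated:
            hallucinated_captions += 1
        hallucinated_mentions += len(hallucinated)
        total_mentions += len(mentioned)
    chair_s = hallucinated_captions / len(predictions)
    chair_i = hallucinated_mentions / total_mentions
    return chair_s, chair_i

# Toy example: the second caption hallucinates a "dog"
preds = [{"person", "bench"}, {"car", "dog"}]
gts = [{"person", "bench", "tree"}, {"car"}]
print(chair_metrics(preds, gts))  # (0.5, 0.25)
```

If my understanding of these definitions is wrong, that alone could explain part of the gap, so please correct me if so.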

The difference is quite substantial — CHAIRs increased by ~7.4 points, CHAIRi nearly doubled, and F1 dropped by ~9.3 points. I would greatly appreciate any guidance on what might be causing this discrepancy. Specifically, I am wondering:

  1. Is there a particular model checkpoint or configuration file that should be used?
  2. Could there be any known compatibility issues with newer GPU architectures (e.g., RTX 5090 / Blackwell) that might affect inference behavior?
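To help separate these two possibilities, I have been fingerprinting the generated captions across identical runs. This is my own diagnostic helper, not part of the repo: if two runs with the same seed produce different fingerprints on the RTX 5090, nondeterministic kernels on the new architecture are at least part of the story; if the fingerprints match but the metrics still deviate from the paper, a checkpoint or config mismatch seems more likely:

```python
import hashlib
import json

def caption_fingerprint(captions: dict) -> str:
    """Hash an {image_id: caption} dict so two runs can be diffed quickly.

    sort_keys makes the hash independent of dict insertion order, so only
    the actual caption text matters.
    """
    blob = json.dumps(captions, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:16]

# Two hypothetical runs over the same two images
run_a = {"1": "a person on a bench", "2": "a car"}
run_b = {"1": "a person on a bench", "2": "a car and a dog"}
print(caption_fingerprint(run_a) == caption_fingerprint(run_b))  # False
```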
