
devansh-lodha/vlms-are-blind-analysis


vlms-are-blind-analysis

This repository evaluates several open-source vision language models (VLMs) on the first task of the BlindTest benchmark, introduced in the paper "Vision language models are blind". The task is to count the number of intersections between two lines. The models are evaluated with zero-shot prompting on the test data provided by the benchmark.
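To make the ground truth concrete, here is a minimal sketch of counting intersections between two piecewise-linear lines. It assumes each line is given as an ordered list of (x, y) vertices. The helper names (`count_intersections`, `segments_intersect`) are hypothetical and do not come from this repository or from the benchmark code. The sketch counts only proper crossings, so two lines that merely touch at a shared point are not counted.

```python
from itertools import product

# Assumption: a line is an ordered list of (x, y) vertices (a polyline).
Point = tuple[float, float]


def _orient(a: Point, b: Point, c: Point) -> float:
    """Twice the signed area of triangle abc: >0 left turn, <0 right turn, 0 collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segments p1p2 and q1q2 properly cross (touching or collinear cases return False)."""
    d1 = _orient(q1, q2, p1)
    d2 = _orient(q1, q2, p2)
    d3 = _orient(p1, p2, q1)
    d4 = _orient(p1, p2, q2)
    # Each segment's endpoints must lie strictly on opposite sides of the other segment.
    return d1 * d2 < 0 and d3 * d4 < 0


def count_intersections(line_a: list[Point], line_b: list[Point]) -> int:
    """Count proper crossings between two polylines by testing every pair of segments."""
    segs_a = list(zip(line_a, line_a[1:]))
    segs_b = list(zip(line_b, line_b[1:]))
    return sum(segments_intersect(*sa, *sb) for sa, sb in product(segs_a, segs_b))


if __name__ == "__main__":
    peak = [(0, 0), (1, 2), (2, 0)]
    flat = [(0, 1), (1, 1), (2, 1)]
    print(count_intersections(peak, flat))  # → 2
```

Testing every segment pair is quadratic, which is irrelevant for lines with only a few vertices.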

Results
The results are shown in four figures (Image 1 to Image 4).

Instructions to set up the project locally

Creating a virtual environment:

python3 -m venv env

Activating the environment on Windows:

env\Scripts\activate

Activating the environment on macOS/Linux:

source env/bin/activate

Installing dependencies:

pip3 install -r requirements.txt

Data

The data is generated by ./line_intersection_data.ipynb and stored in the ./my2DlinePlots folder (not pushed to GitHub). It is exactly the same data as used in the paper.

Evaluation

The evaluation code is set up in ./evaluation.ipynb.

The prompts are the same as those used in the paper:

  1. How many times do the blue and red lines touch each other? Answer with a number in curly brackets, e.g., {5}.
  2. Count the intersection points where the blue and red lines meet. Put your answer in curly brackets, e.g., {2}.
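Both prompts ask for the answer in curly brackets, so the count has to be parsed out of each model's free-text response. Here is a minimal sketch, assuming a regular expression is enough. `extract_count` is a hypothetical helper, not the notebook's actual parser. It takes the last `{n}` in the response, so a reply that echoes the prompt's example before giving its own answer is still read correctly.

```python
import re
from typing import Optional

# Matches an integer wrapped in curly brackets, e.g. "{5}" or "{ 2 }".
ANSWER_RE = re.compile(r"\{\s*(\d+)\s*\}")


def extract_count(response: str) -> Optional[int]:
    """Return the integer in the last {...} of the response, or None if there is none."""
    matches = ANSWER_RE.findall(response)  # list of captured digit strings
    return int(matches[-1]) if matches else None


if __name__ == "__main__":
    print(extract_count("There are {2} intersections."))           # → 2
    print(extract_count("Like the example {5}, my answer is {0}."))  # → 0
    print(extract_count("I count 3 crossings."))                   # → None
```

Responses without a bracketed number return `None` and can simply be scored as wrong.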

The get_model_predictions() function implements the inference code for all models. A loop then loads each model and its processor and evaluates the model on the test data. The predictions are stored in a pandas DataFrame. All of the code is adapted from each model's documentation on Hugging Face.
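The evaluation loop roughly follows the pattern below. This is a sketch, not the notebook's code. `load_predictor` is a hypothetical callable that loads a model and its processor once and returns a function mapping `(image_path, prompt)` to the raw text answer. It stands in for `get_model_predictions()`, whose real signature may differ.

```python
from typing import Callable

import pandas as pd

# Hypothetical: given a model id, load the model + processor once and return a
# function (image_path, prompt) -> raw text answer. This signature is an assumption.
LoadPredictor = Callable[[str], Callable[[str, str], str]]


def add_model_predictions(
    df: pd.DataFrame, model_ids: list[str], load_predictor: LoadPredictor
) -> pd.DataFrame:
    """Return a copy of df with one column per model id holding its raw predictions."""
    out = df.copy()
    for model_id in model_ids:
        predict = load_predictor(model_id)  # load weights once per model, not per row
        out[model_id] = [
            predict(image_path, prompt)
            for image_path, prompt in zip(out["image_path"], out["prompt"])
        ]
    return out


if __name__ == "__main__":
    # Usage with a dummy loader whose "model" always answers "{1}".
    test_df = pd.DataFrame({"image_path": ["a.png", "b.png"], "prompt": ["p1", "p2"]})
    dummy_loader = lambda model_id: (lambda image_path, prompt: "{1}")
    print(add_model_predictions(test_df, ["dummy/model"], dummy_loader))
```

Loading each model once, outside the per-row loop, avoids reloading weights for every test row. With several 7B to 11B models, you would probably also want to free GPU memory before loading the next one.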

The resulting pandas DataFrame has 3600 rows and the following columns:

  • filename
  • gt
  • linewidth
  • resolution
  • distances
  • image_path
  • prompt
  • Qwen/Qwen2-VL-7B-Instruct
  • meta-llama/Llama-3.2-11B-Vision-Instruct
  • llava-hf/llava-v1.6-mistral-7b-hf
  • OpenGVLab/InternVL2_5-8B-MPO
  • microsoft/Phi-3.5-vision-instruct

Each model column stores that model's predictions.
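Once the model columns are filled, per-model accuracy can be computed by parsing each prediction and comparing it to `gt`. Here is a minimal sketch, assuming `gt` holds integers and the predictions follow the curly-bracket answer format. `accuracy_by_model` and `parse_count` are hypothetical helpers, and unparseable or missing answers are counted as wrong.

```python
import re
from typing import Optional

import pandas as pd

# Assumption: answers look like "{n}", as requested by both prompts.
ANSWER_RE = re.compile(r"\{\s*(\d+)\s*\}")


def parse_count(response: object) -> Optional[int]:
    """Last {n} in a text response, or None (also for missing/NaN predictions)."""
    if not isinstance(response, str):
        return None
    matches = ANSWER_RE.findall(response)
    return int(matches[-1]) if matches else None


def accuracy_by_model(df: pd.DataFrame, model_cols: list[str]) -> pd.Series:
    """Share of rows whose parsed prediction equals gt; unparseable answers count as wrong."""
    scores = {}
    for col in model_cols:
        parsed = df[col].map(parse_count)
        # None/NaN never equals an integer gt, so those rows count as incorrect.
        scores[col] = float((parsed == df["gt"]).mean())
    return pd.Series(scores, name="accuracy")
```

Applying it to subsets of the DataFrame filtered by `linewidth` or `resolution` gives a per-setting breakdown.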

Where available, the instruction-tuned ("Instruct") variant of each model is used, since these variants are optimized for VQA (visual question answering).
