
[AAAI2024] BOK-VQA

BOK-VQA : Bilingual Outside Knowledge-based Visual Question Answering via Graph Representation Pretraining

Paper link: https://arxiv.org/abs/2401.06443

BOK-VQA Dataset

The BOK-VQA dataset comprises 17,836 samples and 282,533 knowledge triples. Each sample consists of an image, a question, an answer, and $k$ external knowledge IDs that are necessary to answer the question.

We assembled 282,533 knowledge triples comprising 1,579 objects and 42 relations from English ConceptNet and DBpedia. The selection criteria for the objects and relations were principally based on the 500 objects and 10 relations used in the FVQA dataset. In addition, considering usage frequency, we incorporated 1,079 objects derived from ImageNet and added 32 more relations.

  • Dataset Sample

(Figure: BOK-VQA data sample)

GEL-VQA Model architecture

(GEL-VQA: Graph-Embedded Learning-based Visual Question Answering.) In knowledge-based VQA, it is unrealistic to assume that the external knowledge relevant to a given image and question is available in advance. Consequently, we propose the GEL-VQA model, which employs a multitask learning approach to predict knowledge triples and uses the predicted triples as external knowledge.

(Figure: GEL-VQA model architecture)
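
To make the multitask idea concrete, below is a minimal PyTorch-style sketch of the training objective: the fused image-question feature drives both triple (head, relation, tail) prediction and answer classification, and the pretrained KGE vectors of the predicted triple are fed back as external knowledge. This is an illustrative sketch only, not the code in this repository; the class name, dimensions, and equal loss weighting are assumptions.

import torch
import torch.nn as nn

class GELVQASketch(nn.Module):
    """Minimal sketch of the GEL-VQA multitask idea; names and dimensions are assumptions."""

    def __init__(self, fusion_dim=768, n_answers=1000,
                 n_entities=1579, n_relations=42, kge_dim=200):
        super().__init__()
        # Pretrained KGE lookup tables (in practice, loaded from the KGE training step).
        self.entity_emb = nn.Embedding(n_entities, kge_dim)
        self.relation_emb = nn.Embedding(n_relations, kge_dim)
        # Triple-prediction heads over the fused image-question representation.
        self.head_clf = nn.Linear(fusion_dim, n_entities)
        self.rel_clf = nn.Linear(fusion_dim, n_relations)
        self.tail_clf = nn.Linear(fusion_dim, n_entities)
        # Answer classifier conditioned on the predicted triple's embeddings.
        self.answer_clf = nn.Linear(fusion_dim + 3 * kge_dim, n_answers)

    def forward(self, fused):
        # fused: (batch, fusion_dim) joint image-question feature from a multimodal encoder.
        head_logits = self.head_clf(fused)
        rel_logits = self.rel_clf(fused)
        tail_logits = self.tail_clf(fused)
        # Use the predicted triple's KGE vectors as external knowledge for answering.
        knowledge = torch.cat([self.entity_emb(head_logits.argmax(-1)),
                               self.relation_emb(rel_logits.argmax(-1)),
                               self.entity_emb(tail_logits.argmax(-1))], dim=-1)
        answer_logits = self.answer_clf(torch.cat([fused, knowledge], dim=-1))
        return answer_logits, head_logits, rel_logits, tail_logits

# Multitask objective: answer classification plus head/relation/tail prediction.
ce = nn.CrossEntropyLoss()
def gel_vqa_loss(outputs, answer, head, rel, tail):
    answer_logits, head_logits, rel_logits, tail_logits = outputs
    return (ce(answer_logits, answer) + ce(head_logits, head)
            + ce(rel_logits, rel) + ce(tail_logits, tail))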

Experimental results

(Figure: experiment results)

Training & Test Code

1. Environment setup

1.1 Clone this repository

git clone https://github.com/mjkmain/BOK-VQA.git
cd BOK-VQA

1.2 Install packages

python3 -m venv [env_name]
source [env_name]/bin/activate
pip install -e .

1.3 Download Images

You can find our dataset at AI-Hub

After the download is complete, place the image directory inside the data directory.

Your directory structure will then look like the following:

┗━━━ data
      ┣━━━ image
      ┃     ┣━━━ 121100220220707140119.jpg
      ┃     ┣━━━ 121100220220707140304.jpg
      ┃     ┣━━━ 121100520220830104341.jpg
      ┃     ┗━━━ ...
      ┣━━━ all_triple.csv
      ┣━━━ BOKVQA_data_en.csv
      ┣━━━ BOKVQA_data_ko.csv
      ┣━━━ BOKVQA_data_test_en.csv
      ┗━━━ BOKVQA_data_test_ko.csv

Also, you can find the preprocessed CSV data in the data directory.

  • all_triple.csv: The entire knowledge base consisting of 282,533 triples.
  • BOKVQA_data_en.csv: English BOKVQA data for training.
  • BOKVQA_data_test_en.csv: English BOKVQA data for testing.
  • BOKVQA_data_ko.csv: Korean BOKVQA data for training.
  • BOKVQA_data_test_ko.csv: Korean BOKVQA data for testing.
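
As a quick sanity check, the CSVs listed above can be inspected with pandas. The snippet below is a minimal sketch that assumes the data directory layout shown earlier; it simply prints the row count and columns of each file.

import pandas as pd

# Inspect the preprocessed BOK-VQA CSVs (paths follow the layout shown above).
for name in ["all_triple.csv", "BOKVQA_data_en.csv", "BOKVQA_data_test_en.csv",
             "BOKVQA_data_ko.csv", "BOKVQA_data_test_ko.csv"]:
    df = pd.read_csv(f"data/{name}")
    print(f"{name}: {len(df):,} rows, columns = {list(df.columns)}")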

2. Train KGE

First, you need to train the KGE before training the VQA model. In the KGE-train directory, run:

python kge_convkb_train.py 
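
The script name suggests a ConvKB-style knowledge graph embedding. As background, ConvKB scores a triple by stacking the head, relation, and tail vectors into a d-by-3 matrix and sliding 1-by-3 convolutional filters over it. The sketch below illustrates only that scoring idea; it is not the repository's implementation, and the dimensions and filter count are assumptions.

import torch
import torch.nn as nn

class ConvKBScoreSketch(nn.Module):
    """Background sketch of a ConvKB-style triple scorer; dimensions are assumptions."""

    def __init__(self, emb_dim=200, n_filters=64):
        super().__init__()
        # Each 1x3 filter sees [h, r, t] at one embedding dimension per step.
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))
        self.fc = nn.Linear(emb_dim * n_filters, 1)

    def forward(self, h, r, t):
        # h, r, t: (batch, emb_dim) embeddings; stack into a (batch, 1, emb_dim, 3) matrix.
        x = torch.stack([h, r, t], dim=-1).unsqueeze(1)
        x = torch.relu(self.conv(x))              # (batch, n_filters, emb_dim, 1)
        return self.fc(x.flatten(start_dim=1))    # one plausibility score per triple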

When training finishes, you'll find the saved files in the kge_save directory.

You need to change the KGE_DIR and DATA_DIR paths in util_functions.py and the IMAGE_DIR path in vqa_datasets.py.
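
For reference, those edits might look like the following sketch; the absolute paths are placeholders for your own machine, and the exact location of the variables inside the two files may differ.

# util_functions.py (sketch): point these at your local checkout.
KGE_DIR = "/path/to/BOK-VQA/KGE-train/kge_save"   # output directory of kge_convkb_train.py
DATA_DIR = "/path/to/BOK-VQA/data"                # directory containing the CSV files

# vqa_datasets.py (sketch): directory containing the downloaded images.
IMAGE_DIR = "/path/to/BOK-VQA/data/image"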

3. Train the VQA model

In the train directory:

  • To train the GEL-VQA model, use the following command:
python train_GEL-VQA.py --lang ['ko', 'en', 'bi'] 

arguments

  • --lang: Selects the language for training.
    • ko: Korean
    • en: English
    • bi: Bilingual (both English and Korean)

Make sure to replace [ko, en, bi] with your choice of language. For example, if you wish to train on English data, your command would be:

python train_GEL-VQA.py --lang en 

After training, you can find the saved VQA model file in the saved_model directory.

4. Test

In the test directory:

  • To test the GEL-VQA model, use the following command:
python test_GEL-VQA.py --file_name [FILENAME] --lang ['ko', 'en', 'bi']

The file_name argument follows this format:

[model_name]_[lang]_[accuracy].pt
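
Because the accuracy is embedded in the filename, a small helper can pick the best checkpoint automatically. The sketch below is hypothetical; it only assumes the [model_name]_[lang]_[accuracy].pt pattern above and the saved_model directory from the training step.

from pathlib import Path

def best_checkpoint(save_dir="saved_model", lang="en"):
    """Return the highest-accuracy checkpoint for a language ([model_name]_[lang]_[accuracy].pt)."""
    candidates = []
    for path in Path(save_dir).glob(f"*_{lang}_*.pt"):
        accuracy = float(path.stem.rsplit("_", 1)[-1])  # accuracy is the last underscore-separated field
        candidates.append((accuracy, path))
    return max(candidates, key=lambda c: c[0])[1] if candidates else None

print(best_checkpoint(lang="en"))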

Contact

[email protected]

[email protected]
