BOK-VQA : Bilingual Outside Knowledge-based Visual Question Answering via Graph Representation Pretraining
Paper Link : https://arxiv.org/abs/2401.06443
The BOK-VQA dataset comprises 17,836 samples and 282,533 knowledge triples. Each sample consists of an image, a question, an answer, and the knowledge triples related to the question.
We assembled the 282,533 knowledge triples, covering 1,579 objects and 42 relations, from English ConceptNet and DBpedia. The selection criteria for the objects and relations were principally based on the 500 objects and 10 relations used in the FVQA dataset. In addition, considering usage frequency, we incorporated 1,079 objects derived from ImageNet and supplemented these with 32 additional relations.
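For reference, each knowledge entry is a (head, relation, tail) triple. The triples below are hypothetical illustrations in ConceptNet style, not actual entries from all_triple.csv:

```python
# Hypothetical examples of (head, relation, tail) knowledge triples;
# the actual entries are drawn from English ConceptNet and DBpedia.
triples = [
    ("cat", "IsA", "animal"),
    ("bicycle", "UsedFor", "transport"),
    ("piano", "AtLocation", "concert_hall"),
]
```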
- Dataset Sample
(GEL-VQA: Graph-Embedded Learning-based Visual Question Answering.) In the context of VQA that uses external knowledge, it is unrealistic to assume that one already possesses the external knowledge pertaining to the given images and questions. Consequently, we propose the GEL-VQA model, which employs a multitask learning approach to perform triple prediction and uses the predicted triples as external knowledge.
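The sketch below illustrates this multitask idea in PyTorch: auxiliary heads predict the knowledge triple from the fused image-question features, and the predicted triple's pretrained graph embeddings are concatenated back in for answer classification. It is a minimal illustration under assumed module names and dimensions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class GELVQASketch(nn.Module):
    """Minimal sketch of multitask triple prediction + answer prediction."""

    def __init__(self, fusion_dim, kge_dim, n_entities, n_relations,
                 n_answers, kge_weights):
        super().__init__()
        # Heads for the auxiliary triple-prediction task.
        self.head_clf = nn.Linear(fusion_dim, n_entities)
        self.rel_clf = nn.Linear(fusion_dim, n_relations)
        self.tail_clf = nn.Linear(fusion_dim, n_entities)
        # Graph embeddings pretrained with the KGE model, kept frozen.
        self.ent_emb = nn.Embedding.from_pretrained(kge_weights["entity"], freeze=True)
        self.rel_emb = nn.Embedding.from_pretrained(kge_weights["relation"], freeze=True)
        # Answer classifier conditioned on fused features + predicted triple.
        self.answer_clf = nn.Linear(fusion_dim + 3 * kge_dim, n_answers)

    def forward(self, fused):  # fused: (B, fusion_dim) image-question features
        h_logits = self.head_clf(fused)
        r_logits = self.rel_clf(fused)
        t_logits = self.tail_clf(fused)
        # Look up embeddings of the *predicted* triple as external knowledge.
        h = self.ent_emb(h_logits.argmax(-1))
        r = self.rel_emb(r_logits.argmax(-1))
        t = self.ent_emb(t_logits.argmax(-1))
        a_logits = self.answer_clf(torch.cat([fused, h, r, t], dim=-1))
        return h_logits, r_logits, t_logits, a_logits

# Multitask objective: sum of cross-entropy losses over the answer and
# the three triple components, e.g.
# loss = ce(a_logits, ans) + ce(h_logits, h_gt) + ce(r_logits, r_gt) + ce(t_logits, t_gt)
```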
git clone https://github.com/mjkmain/BOK-VQA.git
cd BOK-VQA
python3 -m venv [env_name]
source [env_name]/bin/activate
pip install -e .

You can find our dataset at AI-Hub.
After the download is complete, place the image directory inside the data directory.
Your directory structure will then look like the following:
┗━━━ data
      ┣━━━ image
      ┃     ┣━━━ 121100220220707140119.jpg
      ┃     ┣━━━ 121100220220707140304.jpg
      ┃     ┣━━━ 121100520220830104341.jpg
      ┃     ┗━━━ ...
      ┣━━━ all_triple.csv
      ┣━━━ BOKVQA_data_en.csv
      ┣━━━ BOKVQA_data_ko.csv
      ┣━━━ BOKVQA_data_test_en.csv
      ┗━━━ BOKVQA_data_test_ko.csv
Also, you can find the preprocessed CSV data in the data directory.
- all_triple.csv : The entire knowledge base consisting of 282,533 triples.
- BOKVQA_data_en.csv: English BOKVQA data for training.
- BOKVQA_data_test_en.csv: English BOKVQA data for testing.
- BOKVQA_data_ko.csv: Korean BOKVQA data for training.
- BOKVQA_data_test_ko.csv: Korean BOKVQA data for testing.
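As a quick sanity check after downloading, you can load the CSVs with pandas. The paths follow the directory layout above; the column names are not documented here, so inspect them on your copy of the data:

```python
import pandas as pd

# Knowledge base: 282,533 (head, relation, tail) triples.
triples = pd.read_csv("data/all_triple.csv")

# English train/test splits (swap "en" for "ko" for Korean).
train_en = pd.read_csv("data/BOKVQA_data_en.csv")
test_en = pd.read_csv("data/BOKVQA_data_test_en.csv")

print(len(triples), len(train_en), len(test_en))
print(train_en.columns.tolist())  # check the actual column names
```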
First, you need to train the KGE model before training the VQA model. In the KGE-train directory, run:

python kge_convkb_train.py

When training is complete, you'll find the saved files in the kge_save directory.
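The script name suggests a ConvKB-style knowledge graph embedding. Purely for intuition, a minimal ConvKB-like triple scorer might look like the sketch below; this is an assumption about the architecture, not the repository's actual implementation, and all dimensions are placeholders:

```python
import torch
import torch.nn as nn

class ConvKBSketch(nn.Module):
    """Score a (head, relation, tail) triple, ConvKB-style."""

    def __init__(self, n_entities, n_relations, dim=100, n_filters=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        # Each 1x3 filter spans one row of the (dim x 3) [h; r; t] matrix.
        self.conv = nn.Conv2d(1, n_filters, kernel_size=(1, 3))
        self.fc = nn.Linear(dim * n_filters, 1)

    def forward(self, h, r, t):  # LongTensors of shape (B,)
        # Stack the three embeddings into a (B, 1, dim, 3) "image".
        x = torch.stack([self.ent(h), self.rel(r), self.ent(t)], dim=-1)
        x = torch.relu(self.conv(x.unsqueeze(1)))  # (B, n_filters, dim, 1)
        return self.fc(x.flatten(1))               # plausibility score per triple
```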
You need to change the KGE_DIR and DATA_DIR paths in util_functions.py and the IMAGE_DIR path in vqa_datasets.py.
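For example, the edited constants might look like this (the values are placeholders; substitute your local setup):

```python
# In util_functions.py:
KGE_DIR = "/path/to/BOK-VQA/KGE-train/kge_save"  # saved KGE files
DATA_DIR = "/path/to/BOK-VQA/data"               # CSV files

# In vqa_datasets.py:
IMAGE_DIR = "/path/to/BOK-VQA/data/image"        # downloaded images
```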
In the train directory:
- To train the GEL-VQA model, use the following command:
python train_GEL-VQA.py --lang [ko|en|bi]

--lang: Selects the language for training.
- ko: Korean
- en: English
- bi: Bilingual (both English and Korean)
Make sure to replace [ko, en, bi] with your choice of language. For example, if you wish to train on English data, your command would be:
python train_GEL-VQA.py --lang en

After training, you can find the saved VQA model file in the saved_model directory.
In the test directory:
- To test the GEL-VQA model, use the following command:
python test_GEL-VQA.py --file_name [FILENAME] --lang [ko|en|bi]

The file_name is organized as follows:
[model_name]_[lang]_[accuracy].pt
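For instance, a checkpoint might be named GEL-VQA_en_65.42.pt (the accuracy value here is made up). A hypothetical helper to unpack this convention:

```python
def parse_checkpoint_name(file_name: str):
    """Split '[model_name]_[lang]_[accuracy].pt' into its parts."""
    stem = file_name.removesuffix(".pt")
    # rsplit keeps underscores inside the model name intact.
    model_name, lang, accuracy = stem.rsplit("_", 2)
    return model_name, lang, float(accuracy)

print(parse_checkpoint_name("GEL-VQA_en_65.42.pt"))  # ('GEL-VQA', 'en', 65.42)
```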


