PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.
It is built on top of SCAN and Awesome_Matching.
We have released two versions of SGRAF: branch main for Python 2.7 and branch python3.6 for Python 3.6.
If you run into any problems, please contact me at [email protected] ([email protected] is deprecated).
The framework of SGRAF:
The updated results (better than those reported in the original paper):

| Dataset | Module | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Flickr30K | SAF | 75.6 | 92.7 | 96.9 | 56.5 | 82.0 | 88.4 |
| Flickr30K | SGR | 76.6 | 93.7 | 96.6 | 56.1 | 80.9 | 87.0 |
| Flickr30K | SGRAF | 78.4 | 94.6 | 97.5 | 58.2 | 83.0 | 89.1 |
| MSCOCO1k | SAF | 78.0 | 95.9 | 98.5 | 62.2 | 89.5 | 95.4 |
| MSCOCO1k | SGR | 77.3 | 96.0 | 98.6 | 62.1 | 89.6 | 95.3 |
| MSCOCO1k | SGRAF | 79.2 | 96.5 | 98.6 | 63.5 | 90.2 | 95.8 |
| MSCOCO5k | SAF | 55.5 | 83.8 | 91.8 | 40.1 | 69.7 | 80.4 |
| MSCOCO5k | SGR | 57.3 | 83.2 | 90.6 | 40.5 | 69.6 | 80.3 |
| MSCOCO5k | SGRAF | 58.8 | 84.8 | 92.1 | 41.6 | 70.9 | 81.5 |
We recommend the following dependencies for branch main:
- Python 2.7
- PyTorch (>=0.4.1)
- NumPy (>=1.12.1)
- TensorBoard
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt

We follow SCAN to obtain image features and vocabularies, which can be downloaded from:
https://www.kaggle.com/datasets/kuanghueilee/scan-features

Another download link is available below:
https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC

The pretrained models are only for branch python3.6 (Python 3.6), not for branch main (Python 2.7).
Modify model_path, data_path, and vocab_path in the evaluation.py file. Then run evaluation.py:
python evaluation.py

Note that fold5=True is only for evaluation on MSCOCO 1K (results averaged over 5 folds), while fold5=False is for MSCOCO 5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
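To make the R@K numbers and the fold averaging concrete, here is a minimal sketch in plain Python. It is not the code in evaluation.py: all function names are hypothetical, and it assumes the SCAN-style layout in which each image has a fixed number of captions (5 in the real data) and caption j describes image j // caps_per_image. The toy example uses 2 captions per image to stay small.

```python
def recall_at_k(ranks, ks=(1, 5, 10)):
    """Percentage of queries whose ground-truth rank (0-based) is below K."""
    n = len(ranks)
    return {k: 100.0 * sum(r < k for r in ranks) / n for k in ks}


def image_to_text_ranks(sims, caps_per_image=5):
    """Sentence retrieval: best rank of any ground-truth caption per image.

    sims[i][j] is the similarity between image i and caption j.
    Assumption: caption j describes image j // caps_per_image.
    """
    ranks = []
    for i, row in enumerate(sims):
        # Caption indices sorted from most to least similar.
        order = sorted(range(len(row)), key=lambda j: -row[j])
        gt = set(range(i * caps_per_image, (i + 1) * caps_per_image))
        ranks.append(min(pos for pos, j in enumerate(order) if j in gt))
    return ranks


def text_to_image_ranks(sims, caps_per_image=5):
    """Image retrieval: rank of the ground-truth image for each caption."""
    ranks = []
    for j in range(len(sims[0])):
        column = [row[j] for row in sims]
        order = sorted(range(len(column)), key=lambda i: -column[i])
        ranks.append(order.index(j // caps_per_image))
    return ranks


def average_folds(fold_results):
    """Average per-fold R@K dictionaries, as when results are averaged over folds."""
    ks = fold_results[0].keys()
    return {k: sum(r[k] for r in fold_results) / len(fold_results) for k in ks}


# Toy similarity matrix: 2 images x 4 captions, 2 captions per image.
sims = [
    [0.9, 0.2, 0.8, 0.1],  # image 0 (ground-truth captions 0 and 1)
    [0.3, 0.4, 0.2, 0.7],  # image 1 (ground-truth captions 2 and 3)
]
i2t_ranks = image_to_text_ranks(sims, caps_per_image=2)
t2i_ranks = text_to_image_ranks(sims, caps_per_image=2)
print("sentence retrieval:", recall_at_k(i2t_ranks, ks=(1, 2)))
print("image retrieval:", recall_at_k(t2i_ranks, ks=(1, 2)))
```

Under these assumptions, an averaged MSCOCO 1K result would come from calling `average_folds` on the five per-fold dictionaries, whereas the 5K and Flickr30K settings would use a single similarity matrix over the whole test set.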
Modify the data_path, vocab_path, model_name, logger_name in the opts.py file. Then run train.py:
For MSCOCO:
(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF

For Flickr30K:
(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF

If SGRAF is useful for your research, please cite the following paper:
@inproceedings{Diao2021SGRAF,
title={Similarity reasoning and filtration for image-text matching},
author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
booktitle={Proceedings of the AAAI conference on artificial intelligence},
volume={35},
number={2},
pages={1218--1226},
year={2021}
}
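
As a note on the training commands listed earlier, the sketch below shows one way the four flags could be parsed and how --num_epochs and --lr_update might interact. It is an illustration only, not the contents of opts.py or train.py: the `--learning_rate` flag, its value, and the step decay by a factor of 0.1 every `lr_update` epochs are assumptions, and `build_parser` and `learning_rate_at` are hypothetical helpers.

```python
import argparse


def build_parser():
    """Illustrative subset of options; defaults here are placeholders."""
    parser = argparse.ArgumentParser(description="Illustrative training options")
    parser.add_argument("--data_name", default="f30k_precomp",
                        choices=["coco_precomp", "f30k_precomp"])
    parser.add_argument("--module_name", default="SGR", choices=["SGR", "SAF"])
    parser.add_argument("--num_epochs", type=int, default=40)
    parser.add_argument("--lr_update", type=int, default=30)
    # Hypothetical flag and value, used only to make the schedule concrete.
    parser.add_argument("--learning_rate", type=float, default=2e-4)
    return parser


def learning_rate_at(epoch, base_lr, lr_update, decay=0.1):
    """Assumed step schedule: multiply by `decay` every `lr_update` epochs."""
    return base_lr * (decay ** (epoch // lr_update))


# Parse the MSCOCO SGR command line shown in this README.
args = build_parser().parse_args(
    "--data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR".split()
)
schedule = [
    learning_rate_at(epoch, args.learning_rate, args.lr_update)
    for epoch in range(args.num_epochs)
]
print(args.data_name, args.module_name, schedule[0], schedule[-1])
```

If the schedule does work this way, the MSCOCO setting (20 epochs, update at 10) would train the second half of the run at the reduced rate, and the Flickr30K settings would do the same for their last 10 epochs.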
