Official implementation of the IJCAI 2023 paper "Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering".
paper | arxiv | project page
Use Python >= 3.8.5. Conda is recommended: https://docs.anaconda.com/anaconda/install/linux/
Use PyTorch 1.9.0 with CUDA 11.1.
To set up the environment:
conda env create -n retvqa --file retvqa.yml
conda activate retvqa
Images: Visual Genome (Krishna et al.)
RetVQA: here
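Once downloaded, the annotations can be inspected directly. Below is a minimal, hypothetical sketch of loading them; the file name and field names (`question`, `image_pool`, `relevant`, `answer`) are assumptions for illustration, not the released schema, so check the downloaded files for the actual layout:

```python
import json

# Hypothetical file name and schema -- verify against the actual release.
with open("retvqa_annotations.json") as f:
    data = json.load(f)

# Each RetVQA question is asked over a pool of images, only some of which
# are relevant to answering it.
qid, sample = next(iter(data.items()))
print("question:", sample["question"])          # assumed field
print("image pool:", sample["image_pool"])      # assumed field: Visual Genome image ids
print("relevant images:", sample["relevant"])   # assumed field
print("answer:", sample["answer"])              # assumed field
```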
Image feature extraction: inside feature_extraction/, run
python retqa_proposal.py
Refer to this repo for a more detailed setup of the Faster R-CNN feature extractor.
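Feature extractors of this kind typically write per-image region features and bounding boxes to disk. As a rough sketch of reading them back, assuming an HDF5 file keyed by image id (the output path and dataset names here are assumptions; check what retqa_proposal.py actually writes):

```python
import h5py

# Assumed output path and layout -- retqa_proposal.py may differ.
with h5py.File("features/retvqa_features.h5", "r") as f:
    image_id = next(iter(f.keys()))
    feats = f[image_id]["features"][()]  # e.g. (num_boxes, 2048) pooled Faster R-CNN features
    boxes = f[image_id]["boxes"][()]     # e.g. (num_boxes, 4) region coordinates
    print(image_id, feats.shape, boxes.shape)
```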
Refer to COFAR. Its ITM-only variant serves as the relevance encoder when pre-trained on COCO and fine-tuned on RetVQA.
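To give a sense of how the relevance encoder fits into the pipeline: it scores each (question, image) pair, and the top-scoring images from the pool are passed on to the answering model. A toy sketch of that selection step, assuming a `score_itm(question, image)` function wrapping the ITM head (hypothetical; the actual interface lives in the COFAR code):

```python
def retrieve_relevant(question, image_pool, score_itm, top_k=2):
    """Rank pool images by image-text matching (ITM) relevance score
    and keep the top-k as the retrieved set for answering."""
    scored = [(score_itm(question, img), img) for img in image_pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [img for _, img in scored[:top_k]]
```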
Training: bash retvqa_VLBart.sh <no. of GPUs>
Testing (with ground-truth relevant images): bash retvqa_VLBart_test.sh <no. of GPUs>
Testing (with retrieved images): bash retvqa_retrieved_VLBart_test.sh <no. of GPUs>
Download our RetVQA fine-tuned checkpoint from here.
To evaluate generated answers:
python eval_retvqa.py --gt_file <ground truth answers file path> --results_file <path to the generated answers>
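eval_retvqa.py is the authoritative scorer. For intuition only, here is a stripped-down sketch of an exact-match accuracy computation, assuming both files are JSON dicts mapping question ids to answer strings (the file format is an assumption):

```python
import json
import sys

# Assumed format: {"<question_id>": "<answer string>", ...} in both files;
# check eval_retvqa.py for the real expected layout.
with open(sys.argv[1]) as f:
    gt = json.load(f)          # ground-truth answers
with open(sys.argv[2]) as f:
    results = json.load(f)     # generated answers

def normalize(ans):
    return ans.strip().lower()

correct = sum(normalize(results.get(qid, "")) == normalize(ans)
              for qid, ans in gt.items())
print(f"accuracy: {correct / len(gt):.4f}")
```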
This code and data are released under the MIT license.
If you find this data/code/paper useful for your research, please consider citing:
@inproceedings{retvqa,
  author    = {Abhirama Subramanyam Penamakuri and Manish Gupta and Mithun Das Gupta and Anand Mishra},
  title     = {Answer Mining from a Pool of Images: Towards Retrieval-Based Visual Question Answering},
  booktitle = {IJCAI},
  publisher = {ijcai.org},
  year      = {2023},
  url       = {https://doi.org/10.24963/ijcai.2023/146},
  doi       = {10.24963/ijcai.2023/146},
}
- We used the codebase and pre-trained models of VLBart.
- Abhirama S. Penamakuri is supported by the Prime Minister's Research Fellowship (PMRF), Ministry of Education, Government of India.
- We thank Microsoft for supporting this work through the Microsoft Academic Partnership Grant (MAPG) 2021.