This is the code for the PNP-VQA paper, integrated into LAVIS.
We include an interactive demo Colab notebook that shows the PNP-VQA inference workflow:
- Image-question matching: compute the relevancy scores of the image patches with respect to the question.
- Image captioning: generate question-guided captions based on the relevancy scores.
- Question answering: answer the question using the captions.
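The first two steps hinge on sampling image patches in proportion to their question relevancy, so that captions focus on question-related regions. Below is a minimal NumPy sketch of that sampling idea only; it is not the LAVIS implementation (where relevancy comes from GradCAM on the image-question matching model), and the function and parameter names are illustrative assumptions.

```python
import numpy as np

def sample_relevant_patches(relevancy, k, rng=None):
    """Sample k distinct patch indices with probability proportional to relevancy.

    relevancy: 1-D array of non-negative per-patch relevancy scores.
    k: number of patches to sample (illustrative; the paper samples many more).
    """
    rng = np.random.default_rng(rng)
    probs = relevancy / relevancy.sum()  # normalize scores into a distribution
    return rng.choice(len(relevancy), size=k, replace=False, p=probs)

# Toy example: 16 patches, where the question makes patches 3 and 7 most relevant.
relevancy = np.ones(16)
relevancy[[3, 7]] = 10.0
patches = sample_relevant_patches(relevancy, k=4, rng=0)
print(sorted(patches.tolist()))
```

Highly relevant patches are drawn far more often, so repeated sampling yields diverse captions that still concentrate on the question-related regions.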
| Model | VQAv2 val (Paper) | VQAv2 val (LAVIS) | VQAv2 test (Paper) | VQAv2 test (LAVIS) | OK-VQA test (Paper) | OK-VQA test (LAVIS) | GQA test-dev (Paper) | GQA test-dev (LAVIS) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| PNP-VQA<sub>base</sub> | 54.3 | 54.2 | 55.2 | 55.3 | 23.0 | 23.3 | 34.6 | 34.9 |
| PNP-VQA<sub>large</sub> | 57.5 | 57.5 | 58.8 | 58.9 | 27.1 | 27.1 | 38.4 | 38.4 |
| PNP-VQA<sub>3B</sub> | 62.1 | 62.1 | 63.5 | 63.5 | 34.1 | 34.0 | 42.3 | 42.3 |
To reproduce these evaluation results for the different PNP-VQA model sizes, follow the steps below:
```bash
cd LAVIS

# VQAv2 val
bash run_scripts/pnp-vqa/eval/eval_vqav2.sh             ## 54.2
bash run_scripts/pnp-vqa/eval/eval_vqav2_large.sh       ## 57.5
bash run_scripts/pnp-vqa/eval/eval_vqav2_3b.sh          ## 62.1

# VQAv2 test
bash run_scripts/pnp-vqa/eval/eval_vqav2_test.sh        ## 55.3
bash run_scripts/pnp-vqa/eval/eval_vqav2_test_large.sh  ## 58.9
bash run_scripts/pnp-vqa/eval/eval_vqav2_test_3b.sh     ## 63.5

# OK-VQA test
bash run_scripts/pnp-vqa/eval/eval_okvqa.sh             ## 23.3
bash run_scripts/pnp-vqa/eval/eval_okvqa_large.sh       ## 27.1
bash run_scripts/pnp-vqa/eval/eval_okvqa_3b.sh          ## 34.0

# GQA test-dev
bash run_scripts/pnp-vqa/eval/eval_gqa.sh               ## 34.9
bash run_scripts/pnp-vqa/eval/eval_gqa_large.sh         ## 38.4
bash run_scripts/pnp-vqa/eval/eval_gqa_3b.sh            ## 42.3
```
If you find this code useful for your research, please consider citing:
```bibtex
@article{tiong2022plug,
  title={Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training},
  author={Tiong, Anthony Meng Huat and Li, Junnan and Li, Boyang and Savarese, Silvio and Hoi, Steven CH},
  journal={arXiv preprint arXiv:2210.08773},
  year={2022}
}
```