A PyTorch reimplementation of "Bilinear Attention Network", "Intra- and Inter-modality Attention", "Learning Conditioned Graph Structures", "Learning to Count Objects", and "Bottom-Up Top-Down" for Visual Question Answering 2.0


Several Recent Approaches (2018) on VQA v2

The project is based on Cyanogenoid/vqa-counting. Most current VQA 2.0 projects are based on https://github.com/hengyuan-hu/bottom-up-attention-vqa, but I personally prefer Cyanogenoid's framework because it is very clean and clear. So I reimplemented several recent approaches (those listed above) in this framework.

One benefit of this framework is that you can easily add the counting module to your own model, which has proven effective in improving accuracy on counting questions without harming the rest of your model's performance.

Dependencies

  • Python 3.6
  • torch > 0.4
  • torchvision 0.2
  • h5py 2.7
  • tqdm 4.19

Prepare dataset (following Cyanogenoid/vqa-counting)

  • In the data directory, execute ./download.sh to download VQA v2.
    • For experimenting, using 36 fixed proposals is faster, at the expense of a bit of accuracy. Uncomment the relevant lines in download.sh and change the paths in config.py accordingly. Don't forget to set output_size in there to 36 to actually get the speed-up.
  • Prepare the data by running
python preprocess-images.py
python preprocess-vocab.py

This creates an h5py database (95 GiB) containing the object proposal features and a vocabulary for questions and answers at the locations specified in config.py.
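For intuition, the vocabulary step boils down to mapping each frequent question/answer token to an integer index. A minimal, self-contained sketch of that idea (this is illustrative, not the repo's actual preprocess-vocab.py; the function name and min_count threshold are assumptions):

```python
from collections import Counter

def build_vocab(sentences, min_count=1):
    """Map each sufficiently frequent token to an integer index (sketch only)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    vocab = {"<unk>": 0}  # index 0 reserved for unknown/padding tokens
    for tok, c in counts.most_common():
        if c >= min_count:
            vocab[tok] = len(vocab)
    return vocab
```

The real script additionally serializes the mappings to the paths configured in config.py.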

How to Train

All the models are named XXX_model.py, and most of the parameters are in config.py. To change the model, simply change model_type in config.py, then train your model with:

python train.py [optional-name]
  • To evaluate accuracy (VQA accuracy and balanced pair accuracy) in various categories, you can run
python eval-acc.py <path to .pth log> [<more paths to .pth logs> ...]

Currently the framework only supports testing on the validation set. Training on the full train+val split to get test-dev/test-std results will be supported later.

Model Details

Note that I didn't implement the tf-idf embedding of the BAN model (though the current model achieves competitive, almost identical performance even without tf-idf); only GloVe embedding is provided. As for Intra- and Inter-modality Attention, although I implemented all the details given in the paper, it still does not reach the results the paper reports, even after I discussed with the author and made some modifications.
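Since only GloVe embedding is provided, word-embedding initialization amounts to copying pretrained vectors into the embedding matrix for in-vocabulary words. A minimal sketch, assuming the standard GloVe text format (a word followed by its floats on one line); the function name and shapes are illustrative, not the repo's code:

```python
def load_glove_matrix(glove_lines, vocab, dim):
    """Fill an embedding matrix from GloVe-style lines; OOV rows stay zero."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for line in glove_lines:
        parts = line.rstrip().split(" ")
        word = parts[0]
        if word in vocab:
            matrix[vocab[word]] = [float(x) for x in parts[1:1 + dim]]
    return matrix
```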

To Train Counting Model

Set following parameters in config.py:

model_type = 'counting'

To Train Bottom-up Top-down

model_type = 'baseline' 

To Train Bilinear attention network

model_type = 'ban' 

Note that BAN is very memory-consuming, so please ensure you have enough GPUs and run main.py with CUDA_VISIBLE_DEVICES=0,1,2,3
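As a concrete launch-command fragment (the GPU indices depend on your machine):

```shell
# restrict training to four visible GPUs; adjust indices to your setup
CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py
```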

To Train Intra- and Inter-modality Attention

model_type = 'inter_intra' 

You may also need to change the learning-rate decay strategy via gradual_warmup_steps and lr_decay_epochs in config.py.
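To illustrate what these two knobs control, here is a self-contained sketch of a gradual-warmup-then-step-decay schedule. The multipliers, epochs, and decay rate below are made-up illustrations, not the values in config.py:

```python
def lr_at_epoch(epoch, base_lr=1e-3,
                warmup_multipliers=(0.5, 1.0),  # illustrative warmup ramp
                decay_epochs=(10, 12, 14),      # illustrative decay points
                decay_rate=0.25):               # illustrative decay factor
    """Return the learning rate at a given epoch (sketch, not the repo's code)."""
    if epoch < len(warmup_multipliers):
        return base_lr * warmup_multipliers[epoch]
    lr = base_lr
    for e in decay_epochs:
        if epoch >= e:
            lr *= decay_rate
    return lr
```

Lengthening the warmup ramp or moving the decay points later is the kind of adjustment this model may need.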

To Train Learning Conditioned Graph Structures

model_type = 'graph' 

Note that this method seems less competitive than the others.
