Attention on Attention for Image Captioning

This repository includes the implementation for Attention on Attention for Image Captioning.

Requirements

Python 3.6
Java 1.8.0
PyTorch 1.0
cider (already added as a submodule)
coco-caption (already added as a submodule)
tensorboardX

Training AoANet

Prepare data

See details in data/README.md.

(Note: set word_count_threshold in scripts/prepro_labels.py to 4 to generate a vocabulary of size 10,369.)
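To illustrate what a word-count threshold does to the vocabulary, here is a minimal sketch. It is not the code in scripts/prepro_labels.py: the function name build_vocab, the toy captions, and the rule that a word is kept only when its count is strictly greater than the threshold are assumptions made for illustration.

```python
from collections import Counter

def build_vocab(captions, word_count_threshold=4):
    """Keep words seen more than `word_count_threshold` times (assumed rule).

    All remaining words are represented by a single 'UNK' token.
    """
    counts = Counter(word for caption in captions for word in caption)
    vocab = sorted(w for w, n in counts.items() if n > word_count_threshold)
    if len(vocab) < len(counts):
        # At least one rare word was dropped, so add the placeholder token.
        vocab.append("UNK")
    return vocab

# Toy data: "a", "dog", "runs" are frequent; "zebra", "sleeps" are rare.
captions = [["a", "dog", "runs"]] * 5 + [["a", "zebra", "sleeps"]]
vocab = build_vocab(captions, word_count_threshold=4)
print(vocab)
```

A higher threshold shrinks the vocabulary and sends more words to UNK; a lower one does the opposite.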

You should also preprocess the dataset to build the n-gram cache used to compute the CIDEr score during SCST training:

$ python scripts/prepro_ngrams.py --input_json data/dataset_coco.json --dict_json data/cocotalk.json --output_pkl data/coco-train --split train
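As a rough picture of what such a cache holds, the sketch below counts n-gram document frequencies over reference captions, which is the statistic CIDEr uses for its TF-IDF weighting. It is an illustrative assumption, not the contents of scripts/prepro_ngrams.py: the helper names ngrams and document_frequencies and the toy captions are hypothetical, and the real script's output format is not reproduced here.

```python
from collections import defaultdict

def ngrams(tokens, max_n=4):
    """Return the set of all n-grams (as tuples) of length 1..max_n."""
    return {
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }

def document_frequencies(refs_per_image, max_n=4):
    """For each n-gram, count the images whose references contain it.

    One "document" is the full set of reference captions of one image.
    """
    df = defaultdict(int)
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen |= ngrams(ref, max_n)
        for gram in seen:
            df[gram] += 1
    return dict(df)

# Toy data: two images, with two and one reference captions respectively.
refs_per_image = [
    [["a", "dog", "runs"], ["the", "dog", "runs"]],
    [["a", "cat", "sleeps"]],
]
df = document_frequencies(refs_per_image)
print(df[("a",)], df[("dog", "runs")])
```

Computing these counts once ahead of time avoids rescanning all training captions every time a sampled caption is scored.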

Start training

$ CUDA_VISIBLE_DEVICES=0 sh train.sh

See opts.py for the options. (You can download the pretrained models from here.)

Evaluation

$ CUDA_VISIBLE_DEVICES=0 python eval.py --model log/log_aoanet_rl/model.pth --infos_path log/log_aoanet_rl/infos_aoanet.pkl  --dump_images 0 --dump_json 1 --num_images -1 --language_eval 1 --beam_size 2 --batch_size 100 --split test

Performance

After training under XE (cross-entropy) loss for 25 epochs, you should get scores close to the following:

{'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025, 'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374, 'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054, 'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}

(Note: you can increase --max_epochs in train.sh to train the model for more epochs and improve the scores.)

After training under SCST loss for another 15 epochs, you should get:

{'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842, 'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477, 'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199, 'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}
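To see how much each metric changes between the two stages, the following short sketch subtracts the XE scores from the SCST scores. The numbers are copied from the two dictionaries above; the variable names xe, scst, and gains are illustrative only.

```python
# Scores copied from the XE and SCST result dictionaries above.
xe = {'Bleu_1': 0.7729384559899702, 'Bleu_2': 0.6163398035383025,
      'Bleu_3': 0.4790123137715982, 'Bleu_4': 0.36944349063530374,
      'METEOR': 0.2848188431924821, 'ROUGE_L': 0.5729849683867054,
      'CIDEr': 1.1842173801790759, 'SPICE': 0.21650786258302354}
scst = {'Bleu_1': 0.8054903453672397, 'Bleu_2': 0.6523038976984842,
        'Bleu_3': 0.5096621263772566, 'Bleu_4': 0.39140307771618477,
        'METEOR': 0.29011216375635934, 'ROUGE_L': 0.5890369750273199,
        'CIDEr': 1.2892294296245852, 'SPICE': 0.22680092759866174}

# Absolute change per metric when moving from XE to SCST training.
gains = {metric: scst[metric] - xe[metric] for metric in xe}

for metric, gain in gains.items():
    print(f"{metric:8s} {xe[metric]:.4f} -> {scst[metric]:.4f} ({gain:+.4f})")
```

Every metric improves after the SCST stage, with the largest absolute gain on CIDEr.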

Reference

If you find this repo helpful, please consider citing:

@inproceedings{huang2019attention,
  title={Attention on Attention for Image Captioning},
  author={Huang, Lun and Wang, Wenmin and Chen, Jie and Wei, Xiao-Yong},
  booktitle={International Conference on Computer Vision},
  year={2019}
}

Acknowledgements

This repository is based on self-critical.pytorch, and you may refer to it for more details about the code.
