Source code for "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", runnable on both GPU and CPU.
A joint collaboration between the Université de Montréal and the University of Toronto.
This code is written in Python. To use it, you will need:
- Python 2.7
- A relatively recent version of NumPy
- scikit-learn
- scikit-image (skimage)
- argparse
In addition, this code is built using the powerful Theano library. If you encounter problems specific to Theano, please use a commit from around February 2015 and notify the authors.
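A quick way to verify the environment is an import check like the following (a hypothetical snippet, not part of this repository; the module names are assumed from the dependency list above):

    # Hypothetical dependency check; not part of the repository.
    # Verifies the libraries listed above are importable and prints their versions.
    import argparse  # ships with Python 2.7's standard library
    import numpy
    import sklearn
    import skimage
    import theano

    for name, module in [("numpy", numpy), ("scikit-learn", sklearn),
                         ("scikit-image", skimage), ("theano", theano)]:
        print("%s %s" % (name, module.__version__))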
To use the evaluation script (metrics.py), see the coco-caption repository for the requirements.
If you use this code as part of any published research, please acknowledge the following paper (it encourages researchers who publish their code!):
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention."
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan
Salakhutdinov, Richard Zemel, Yoshua Bengio. To appear in ICML (2015).
@article{Xu2015show,
    title={Show, Attend and Tell: Neural Image Caption Generation with Visual Attention},
    author={Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Cho, Kyunghyun and Courville, Aaron and Salakhutdinov, Ruslan and Zemel, Richard and Bengio, Yoshua},
    journal={arXiv preprint arXiv:1502.03044},
    year={2015}
}
The code is released under a revised (3-clause) BSD License.