Source code for "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention", runnable on both CPU and GPU. A joint collaboration between the Université de Montréal and the University of Toronto.
This code is written in Python. To use it you will need:
- Python 2.7
- A relatively recent version of NumPy
- scikit-learn
- argparse
In addition, this code is built using Theano. If you encounter problems specific to Theano, please use a commit from around February 2015 and notify the authors.
To use the evaluation script, see coco-caption for the requirements.
The code is released under the revised (3-clause) BSD License. If you use this code as part of any published research, please acknowledge the following paper (it encourages researchers who publish their code!):
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention."
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan
Salakhutdinov, Richard Zemel, Yoshua Bengio. To appear in ICML (2015).
@article{Xu2015show,
title={Show, Attend and Tell: Neural Image Caption Generation with Visual Attention},
author={Xu, Kelvin and Ba, Jimmy and Kiros, Ryan and Courville, Aaron and Salakhutdinov, Ruslan and Zemel, Richard and Bengio, Yoshua},
journal={arXiv preprint arXiv:1502.03044},
year={2015}
}
- Install the above dependencies
- Clone the repo:
  $ git clone
- Install Theano using your favourite method
- TODO: rest of this