Second lab of the Scalable Machine Learning course of the EIT Digital Data Science master's programme at KTH
This assignment aims to describe the content of an image by using CNNs and RNNs to build an Image Caption Generator. The model is based on the paper [4] and is implemented using TensorFlow and Keras. The dataset used is Flickr 8K, consisting of 8,000 images, each paired with five different captions that provide clear descriptions. The implementation was done in Python in a Jupyter notebook, where every step is carefully documented.
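As an illustration of how the image-caption pairs can be assembled, the sketch below parses the `Flickr8k.token.txt` annotation file, where each line has the form `<image>.jpg#<caption index>\t<caption>`. The file path and the lowercasing step are assumptions for the example, not necessarily what the notebook does:

```python
from collections import defaultdict

# Map each image file name to its five reference captions
# (file path is a placeholder; adjust to where the dataset is stored)
captions = defaultdict(list)
with open("Flickr8k.token.txt") as f:
    for line in f:
        image_id, caption = line.strip().split("\t")
        image_name = image_id.split("#")[0]  # drop the "#0".."#4" caption index
        captions[image_name].append(caption.lower())

print(len(captions))  # ~8,000 images, five captions each
```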
The model architecture consists of a CNN, which extracts features from and encodes the input image, and a Recurrent Neural Network (RNN) based on Long Short-Term Memory (LSTM) layers. The most significant difference from other models is that the image embedding is provided as the first input to the RNN, and only once.
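A minimal sketch of this idea in Keras could look as follows. It assumes pre-extracted 2048-dimensional CNN features (e.g. from InceptionV3) and hypothetical vocabulary, caption-length, and embedding sizes; the actual values live in the notebook:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

vocab_size = 8000   # assumption: size of the caption vocabulary
max_len = 34        # assumption: longest caption length in tokens
embed_dim = 256     # assumption: shared size of image and word embeddings

# Image branch: project the pre-extracted CNN features into the embedding space
image_features = layers.Input(shape=(2048,), name="image_features")
img_embedding = layers.Dense(embed_dim, activation="relu")(image_features)
# Treat the image embedding as the first "token" of the sequence
img_embedding = layers.Reshape((1, embed_dim))(img_embedding)

# Text branch: the caption generated so far, as integer token ids
caption_in = layers.Input(shape=(max_len,), name="caption_tokens")
word_embeddings = layers.Embedding(vocab_size, embed_dim)(caption_in)

# Prepend the image embedding once, then run the LSTM over the whole sequence
sequence = layers.Concatenate(axis=1)([img_embedding, word_embeddings])
lstm_out = layers.LSTM(256)(sequence)
next_word = layers.Dense(vocab_size, activation="softmax")(lstm_out)

model = Model(inputs=[image_features, caption_in], outputs=next_word)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")
```

Injecting the image only once at the start, instead of merging it with the text at every step, lets the LSTM carry the visual information forward through its hidden state rather than re-reading it at each word.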
The following picture shows an example of a caption generated by the implemented model:
Despite the good result in the previous example, the Bilingual Evaluation Understudy (BLEU) score for n-grams with n greater than 2 is not very high. As future work, it would be interesting to study architectural changes in where the image is fed into the network.
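For reference, BLEU-1 through BLEU-4 can be computed with NLTK's `corpus_bleu`; the tokenised captions below are illustrative examples, not the actual evaluation data:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# For each image: the list of reference captions and the generated hypothesis
references = [[["a", "dog", "runs", "on", "the", "grass"],
               ["a", "brown", "dog", "is", "running", "outside"]]]
hypotheses = [["a", "dog", "is", "running", "on", "grass"]]

# BLEU-n weights only the n-grams up to order n; smoothing avoids
# zero scores when no higher-order n-gram matches exist
smooth = SmoothingFunction().method1
for n in range(1, 5):
    weights = tuple(1.0 / n for _ in range(n)) + (0.0,) * (4 - n)
    score = corpus_bleu(references, hypotheses,
                        weights=weights, smoothing_function=smooth)
    print(f"BLEU-{n}: {score:.3f}")
```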
- Serghei Socolovschi [email protected]
- Angel Igareta [email protected]