Image Captioning

Project Overview

This project combines a deep convolutional network for image classification with a recurrent network for sequence modeling, creating a single network that generates descriptions of images using the COCO (Common Objects in Context) dataset.

COCO is a large image dataset designed for object detection, segmentation, person keypoint detection, stuff segmentation, and caption generation. GPU-accelerated computing (CUDA) is necessary for this project.

Instructions

Clone this repo: https://github.com/cocodataset/cocoapi

git clone https://github.com/cocodataset/cocoapi.git

Set up the COCO API (also described in the cocoapi README):

cd cocoapi/PythonAPI  
make  
cd ..

Download some specific data from here: https://cocodataset.org/#download (described below)

Under Annotations, download:
- 2017 Train/Val annotations [241MB] (extract captions_train2017.json and captions_val2017.json, and place at locations cocoapi/annotations/captions_train2017.json and cocoapi/annotations/captions_val2017.json, respectively)
- 2017 Testing Image info [1MB] (extract image_info_test2017.json and place at location cocoapi/annotations/image_info_test2017.json)
Under Images, download:
- 2017 Train images [118K/18GB] (extract the train2017 folder and place at location cocoapi/images/train2017/)
- 2017 Val images [5K/1GB] (extract the val2017 folder and place at location cocoapi/images/val2017/)
- 2017 Test images [41K/6GB] (extract the test2017 folder and place at location cocoapi/images/test2017/)
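
Once everything is extracted, it can help to confirm that the files ended up where the steps above say they should. The following is a minimal sketch, not part of the repository: the helper name `missing_coco_paths` is made up for illustration, and the paths are copied from the list above, relative to the `cocoapi` folder.

```python
from pathlib import Path

# Paths taken from the download instructions, relative to the cocoapi folder.
EXPECTED_PATHS = [
    "annotations/captions_train2017.json",
    "annotations/captions_val2017.json",
    "annotations/image_info_test2017.json",
    "images/train2017",
    "images/val2017",
    "images/test2017",
]


def missing_coco_paths(cocoapi_root):
    """Return the expected files/folders that do not exist under cocoapi_root."""
    root = Path(cocoapi_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Assumes the script is run from the directory that contains cocoapi/.
    missing = missing_coco_paths("cocoapi")
    if missing:
        print("Missing:", *missing, sep="\n  ")
    else:
        print("All expected COCO files and folders are in place.")
```

An empty result means the layout matches the instructions; it does not check that the archives were extracted completely.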

Project Structure

The project is structured as a series of Jupyter notebooks that are designed to be completed in sequential order:

Notebook 0 : Microsoft Common Objects in COntext (MS COCO) dataset;

Notebook 1 : Load and pre-process data from the COCO dataset;

Notebook 2 : Training the CNN-RNN Model;

Notebook 3 : Load trained model and generate predictions.
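
To give a feel for the data the notebooks work with, here is a rough sketch of reading a COCO caption annotation file using only the standard library. It is not the notebooks' own code: the helper name `captions_by_image` is hypothetical, and the sketch relies only on the `images` and `annotations` lists found in the COCO caption JSON files.

```python
import json
from collections import defaultdict


def captions_by_image(annotation_file):
    """Map each image file name to the list of its reference captions."""
    with open(annotation_file, encoding="utf-8") as f:
        data = json.load(f)

    # "images" entries carry the id -> file_name mapping.
    file_names = {img["id"]: img["file_name"] for img in data["images"]}

    # "annotations" entries carry one caption each, keyed by image_id.
    grouped = defaultdict(list)
    for ann in data["annotations"]:
        grouped[file_names[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)
```

For example, `captions_by_image("cocoapi/annotations/captions_val2017.json")` would return a dictionary keyed by image file name, assuming the annotations were placed as described in the Instructions section.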

Installation

$ git clone https://github.com/nalbert9/Image-Captioning.git
$ cd Image-Captioning
$ pip3 install -r requirements.txt

Inference

The following captions were generated after training the model for 3 epochs:

- a person riding a surf board on a wave
- a group of people riding motorcycles down a street
- a young boy brushing his teeth with a toothbrush
- a vase with a flower on a table
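
Captions like these are typically produced by turning the decoder's predicted word indices back into text. The sketch below illustrates that last step only, under stated assumptions: the function name `ids_to_caption`, the `idx2word` dictionary, and the `<start>` / `<end>` marker tokens are all hypothetical and may not match what `3_Inference.ipynb` actually does.

```python
def ids_to_caption(token_ids, idx2word, start_word="<start>", end_word="<end>"):
    """Convert predicted word indices into a caption string.

    Assumes idx2word maps every predicted index to a word, and that the
    sequence may contain start/end marker tokens (hypothetical names).
    """
    words = []
    for idx in token_ids:
        word = idx2word[idx]
        if word == start_word:
            continue  # skip the start marker
        if word == end_word:
            break  # ignore everything after the end marker
        words.append(word)
    return " ".join(words)
```

With a toy vocabulary such as `{0: "<start>", 1: "<end>", 2: "a", 3: "vase"}`, the index sequence `[0, 2, 3, 1]` would be rendered as `a vase`.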

References

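- Microsoft COCO
- Show and Tell: A Neural Image Caption Generator, arXiv:1411.4555v2 [cs.CV], 20 Apr 2015
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention, arXiv:1502.03044v3 [cs.LG], 19 Apr 2016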

Licence

This project is licensed under the terms of the

