Caption-Based Image Retrieval Model #5019

dirkgr · 2021-02-24T23:05:18Z

We want to implement the Caption-Based Image Retrieval task from https://api.semanticscholar.org/CorpusID:199453025.

The COCO and Flickr30k datasets contain a large number of images with captions. The task here is to train a model that picks the right image given a caption. The model chooses among four images: the one the caption actually describes, plus three other images drawn at random from the dataset.

You will have to write Steps that produce a DatasetDict for Flickr30k and COCO, including code that can produce the negative examples. Each instance will consist of a caption with four images. You will also need to write a model that can solve this task. The underlying component for the model will be VilBERT, and the VQA model is probably a good place to steal some code to get started.
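To make the negative-example part concrete, here is a rough sketch of how instances could be built. It is not tied to any existing AllenNLP API: the function name `make_retrieval_instances`, the `captions` mapping from image id to caption strings, and the output dictionary keys are all hypothetical and only meant to illustrate the sampling logic.

```python
import random
from typing import Dict, List, Sequence


def make_retrieval_instances(
    captions: Dict[str, Sequence[str]],  # hypothetical input: image id -> its captions
    num_candidates: int = 4,
    seed: int = 0,
) -> List[dict]:
    """Build one instance per caption: the true image plus random negatives."""
    rng = random.Random(seed)  # fixed seed so the negatives are reproducible
    image_ids = sorted(captions)
    if len(image_ids) < num_candidates:
        raise ValueError("need at least `num_candidates` distinct images")

    instances = []
    for image_id in image_ids:
        # Every other image is a possible negative. Rebuilding this list per
        # image is quadratic, which is fine for a sketch but not for full COCO.
        others = [other for other in image_ids if other != image_id]
        for caption in captions[image_id]:
            # Sample distinct negatives, then shuffle so the true image
            # does not always sit at the same position.
            candidates = rng.sample(others, num_candidates - 1) + [image_id]
            rng.shuffle(candidates)
            instances.append(
                {
                    "caption": caption,
                    "images": candidates,
                    "label": candidates.index(image_id),  # index of the true image
                }
            )
    return instances
```

Each returned dictionary holds a caption, four image ids, and the index of the correct one. A real Step would also need to load the image features and wrap the result in a DatasetDict, which this sketch leaves out.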

dirkgr added the Contributions welcome, Models (Issues related to the allennlp-models repo), GSoC, and hard (Difficult tasks) labels Feb 24, 2021

dirkgr added this to Not Started in Google Summer of Code Feb 26, 2021

dirkgr removed the GSoC label Mar 9, 2021
