This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
We want to implement the Caption-Based Image Retrieval task from https://api.semanticscholar.org/CorpusID:199453025.
The COCO and Flickr30k datasets contain a large number of images with image captions. The task here is to train a model to pick the right image given the caption. The model chooses among four candidate images: the one that actually belongs to the caption, plus three other images drawn at random from the dataset.
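One way to frame "pick the right image" is as a four-way classification over per-image scores. The sketch below assumes the model emits one caption-image alignment score per candidate and is trained with softmax cross-entropy; that training objective is an assumption here, not something this issue specifies. The function names `choice_loss` and `predict` are hypothetical.

```python
import math
from typing import Sequence


def choice_loss(scores: Sequence[float], label: int) -> float:
    """Softmax cross-entropy over candidate scores (assumed objective).

    `scores` holds one alignment score per candidate image and `label`
    is the index of the image that really belongs to the caption.
    """
    # Subtract the max before exponentiating for numerical stability.
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[label]


def predict(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate image."""
    return max(range(len(scores)), key=lambda i: scores[i])
```

With four candidates, a model that scores all images equally has a loss of log(4), which makes a handy sanity check for the start of training.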
You will have to write `Step`s that produce a `DatasetDict` for Flickr30k and COCO, including code that can produce the negative examples. Each instance will consist of a caption with four images. You will also need to write a model that can solve this task. The underlying component for the model will be VilBERT, and the VQA model is probably a good place to steal some code to get started.
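The negative-example generation could look roughly like the following. This is a minimal plain-Python sketch, not code from this repository: it assumes the captions are already loaded as `(caption, image_id)` pairs, and the function name `make_instances` and the instance keys `caption`, `images`, and `label` are hypothetical. Wrapping it in a `Step` and packing the result into a `DatasetDict` is left out.

```python
import random
from typing import Dict, List, Sequence, Tuple, Union

Instance = Dict[str, Union[str, List[str], int]]


def make_instances(
    pairs: Sequence[Tuple[str, str]],
    num_candidates: int = 4,
    seed: int = 0,
) -> List[Instance]:
    """Pair every caption with its own image plus random negatives.

    `pairs` is a sequence of (caption, image_id) tuples. Each returned
    instance has `num_candidates` distinct images in shuffled order and
    a `label` giving the position of the correct one.
    """
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    image_ids = sorted({image_id for _, image_id in pairs})
    if len(image_ids) < num_candidates:
        raise ValueError("not enough distinct images to sample negatives")

    instances: List[Instance] = []
    for caption, positive in pairs:
        # Draw num_candidates distinct images, drop the positive if it
        # was drawn, and keep the first num_candidates - 1 as negatives.
        drawn = rng.sample(image_ids, num_candidates)
        negatives = [i for i in drawn if i != positive][: num_candidates - 1]
        candidates = negatives + [positive]
        rng.shuffle(candidates)  # do not leak the answer via position
        instances.append(
            {
                "caption": caption,
                "images": candidates,
                "label": candidates.index(positive),
            }
        )
    return instances
```

Sampling by image ID rather than by caption matters for datasets where one image has several captions: it guarantees that none of the three negatives is the correct image under a different caption.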