Train a region detector on the features from Visual Genome #5003

dirkgr · 2021-02-19T03:01:09Z

This is a project in computer vision, rather than natural language processing. It is here because we have found this RegionEmbedder to be important for downstream tasks that combine vision and language features.

In AllenNLP, RegionDetectors take an image and predict "regions of interest". Each region is represented by some coordinates and a vector expressing the contents of the region.

Visual Genome is a dataset containing millions of such regions. This task is about training a new region detector on the Visual Genome dataset.

Most of the meat of the model will not be implemented from scratch. Rather, we will use the components that torchvision gives us. Most of the work will be in writing a dataset reader that can read the Visual Genome features, and in writing a model that is basically an adapter between the AllenNLP formats and the torchvision formats.
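As a rough illustration of the adapter work, here is a minimal sketch of one step a dataset reader might need: turning region annotations into the `[x1, y1, x2, y2]` box format that torchvision's detection models expect in their targets. The function name `regions_to_boxes` is hypothetical, and the field names `x`, `y`, `width`, and `height` are an assumption about how the Visual Genome region annotations are laid out; check them against the files you actually download. The sketch uses plain Python lists so that it runs without torch; a real reader would wrap the result in a tensor.

```python
from typing import Any, Dict, List


def regions_to_boxes(
    regions: List[Dict[str, Any]], image_width: int, image_height: int
) -> List[List[float]]:
    """Convert (x, y, width, height) regions to [x1, y1, x2, y2] boxes.

    Assumes each region is a dict with "x", "y", "width", and "height" keys
    (an assumption about the annotation format, not a confirmed schema).
    """
    boxes = []
    for region in regions:
        # Clip the box to the image so coordinates stay in bounds.
        x1 = max(0, region["x"])
        y1 = max(0, region["y"])
        x2 = min(image_width, region["x"] + region["width"])
        y2 = min(image_height, region["y"] + region["height"])
        # Skip degenerate boxes; torchvision's detection models reject
        # target boxes with non-positive width or height.
        if x2 > x1 and y2 > y1:
            boxes.append([float(x1), float(y1), float(x2), float(y2)])
    return boxes


if __name__ == "__main__":
    example_regions = [
        {"x": 10, "y": 20, "width": 30, "height": 40},
        {"x": 90, "y": 90, "width": 50, "height": 50},  # clipped to the image
        {"x": 200, "y": 0, "width": 5, "height": 5},  # outside the image, dropped
    ]
    print(regions_to_boxes(example_regions, image_width=100, image_height=100))
```

In a real reader, the returned list would typically become a float tensor of shape `[N, 4]` under the `"boxes"` key of the per-image target dict, alongside whatever labels the training setup uses.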

This project has many moving parts, and will likely be a bit on the difficult side.


dirkgr added the Contributions welcome, Models, GSoC, and hard labels Feb 19, 2021

dirkgr mentioned this issue Feb 24, 2021

Visual Genome Question Answering #5018

Open

dirkgr removed the GSoC label Mar 9, 2021
