This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Train a region detector on the features from Visual Genome #5003

Open
dirkgr opened this issue Feb 19, 2021 · 0 comments
Labels
Contributions welcome hard Difficult tasks Models Issues related to the allennlp-models repo

Comments

@dirkgr
Member

dirkgr commented Feb 19, 2021

This is a project in computer vision, rather than natural language processing. It is here because we have found this RegionEmbedder to be important for down-stream tasks that combine vision and language features.

In AllenNLP, RegionDetectors take an image and predict "regions of interest". Each region is represented by some coordinates and a vector expressing the contents of the region.
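To make the shape of that output concrete, here is a minimal sketch in plain Python. The `Region` container and `toy_region_detector` function are hypothetical names made up for illustration, not AllenNLP classes, and the `[x1, y1, x2, y2]` pixel layout for the coordinates is an assumption. A real detector would predict the boxes and compute the feature vectors with a neural network.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """One region of interest (hypothetical container, not an AllenNLP class)."""

    box: List[float]       # coordinates, assumed here to be [x1, y1, x2, y2] in pixels
    features: List[float]  # vector expressing the contents of the region


def toy_region_detector(width: int, height: int, feature_dim: int = 4) -> List[Region]:
    """Stand-in detector: returns the whole image and its left half as regions.

    The features are all zeros; only the structure of the output is meaningful.
    """
    boxes = [
        [0.0, 0.0, float(width), float(height)],
        [0.0, 0.0, width / 2.0, float(height)],
    ]
    return [Region(box=box, features=[0.0] * feature_dim) for box in boxes]


regions = toy_region_detector(640, 480)
for region in regions:
    print(region.box, len(region.features))
```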

Visual Genome is a dataset containing millions of such regions. This task is about training a new region detector on the Visual Genome dataset.

Most of the meat of the model will not be implemented from scratch. Rather, we will use the components that torchvision gives us. Most of the work will be in writing a dataset reader that can read the Visual Genome features, and writing a model that is basically an adapter between the AllenNLP formats and the torchvision formats.
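As a rough illustration of the adapter work, here is one small piece a dataset reader might need. It is a sketch under assumptions: it supposes each Visual Genome region record is a dict with `x`, `y`, `width`, and `height` keys in pixels, and converts those to the `[x1, y1, x2, y2]` box layout that torchvision's detection models expect. The function name `regions_to_boxes` is hypothetical, and the code uses plain lists rather than tensors so it runs without any dependencies.

```python
from typing import Dict, List


def regions_to_boxes(
    regions: List[Dict[str, float]], image_width: int, image_height: int
) -> List[List[float]]:
    """Convert (x, y, width, height) records to [x1, y1, x2, y2] boxes.

    Boxes are clipped to the image bounds, and boxes that end up with
    zero or negative area are dropped. The input field names are assumed.
    """
    boxes = []
    for region in regions:
        x1 = max(0.0, float(region["x"]))
        y1 = max(0.0, float(region["y"]))
        x2 = min(float(image_width), float(region["x"]) + float(region["width"]))
        y2 = min(float(image_height), float(region["y"]) + float(region["height"]))
        if x2 > x1 and y2 > y1:
            boxes.append([x1, y1, x2, y2])
    return boxes


example_regions = [
    {"x": 10, "y": 20, "width": 100, "height": 50},
    {"x": 600, "y": 400, "width": 100, "height": 100},  # extends past the image, gets clipped
    {"x": 5, "y": 5, "width": 0, "height": 10},         # zero width, gets dropped
]
boxes = regions_to_boxes(example_regions, image_width=640, image_height=480)
print(boxes)
```

In a real reader the resulting list would be turned into a float tensor of shape `(N, 4)` before being handed to the model.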

This project has many moving parts, and will likely be a bit on the difficult side.

@dirkgr dirkgr added Contributions welcome Models Issues related to the allennlp-models repo GSoC hard Difficult tasks labels Feb 19, 2021
@dirkgr dirkgr removed the GSoC label Mar 9, 2021