This repository has been archived by the owner on Dec 16, 2022. It is now read-only.
This is a project in computer vision, rather than natural language processing. It is here because we have found this RegionEmbedder to be important for downstream tasks that combine vision and language features.
In AllenNLP, RegionDetectors take an image and predict "regions of interest". Each region is represented by some coordinates and a vector expressing the contents of the region.
Visual Genome is a dataset containing millions of such regions. This task is about training a new region detector on the Visual Genome dataset.
Most of the meat of the model will not be implemented from scratch. Rather, we will use the components that torchvision gives us. Most of the work will be in writing a dataset reader that can read the Visual Genome features, and writing a model that is basically an adapter between the AllenNLP formats and the torchvision formats.
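To give a feel for the adapter work, here is a minimal sketch of one small piece: turning Visual Genome style region annotations into the kind of target that torchvision detection models expect. It assumes each region is a dict with x, y, width and height keys (top-left corner plus size), and it relies on torchvision wanting boxes as [x1, y1, x2, y2] with positive width and height. The function names, the single foreground label, and the use of plain lists instead of tensors are all placeholders for illustration, not part of AllenNLP or torchvision.

```python
from typing import Dict, List, Tuple


def region_to_xyxy(region: Dict[str, float]) -> Tuple[float, float, float, float]:
    """Convert an (x, y, width, height) region to (x1, y1, x2, y2) corners."""
    x1 = float(region["x"])
    y1 = float(region["y"])
    return (x1, y1, x1 + float(region["width"]), y1 + float(region["height"]))


def regions_to_target(
    regions: List[Dict[str, float]], image_width: float, image_height: float
) -> Dict[str, list]:
    """Build a torchvision-style target dict from a list of regions.

    Boxes are clipped to the image and degenerate boxes are dropped.
    A real reader would wrap the lists in tensors before passing them on.
    """
    boxes: List[List[float]] = []
    for region in regions:
        x1, y1, x2, y2 = region_to_xyxy(region)
        # Clip the corners so the box lies inside the image.
        x1 = min(max(x1, 0.0), float(image_width))
        x2 = min(max(x2, 0.0), float(image_width))
        y1 = min(max(y1, 0.0), float(image_height))
        y2 = min(max(y2, 0.0), float(image_height))
        # Skip boxes that have no area after clipping.
        if x2 > x1 and y2 > y1:
            boxes.append([x1, y1, x2, y2])
    # Assumption: every region is a single "foreground" class with label 1.
    labels = [1] * len(boxes)
    return {"boxes": boxes, "labels": labels}
```

The dataset reader would call something like regions_to_target once per image, and the model would convert the lists to tensors before handing them to the torchvision components.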
This project has many moving parts, and will likely be a bit on the difficult side.