
From https://arxiv.org/pdf/1811.00491.pdf.

Natural Language for Visual Reasoning for Real (NLVR2)

Description

(from https://lil.nlp.cornell.edu/nlvr/)

NLVR2 contains 107,292 examples of human-written English sentences grounded in pairs of photographs. NLVR2 retains the linguistic diversity of NLVR, while including much more visually complex images.

We only publicly release the sentence annotations and original image URLs, and scripts that download the images from the URLs. If you would like direct access to the images, please fill out this Google Form. This form asks for your basic information and asks you to agree to our Terms of Service.

Task

(from https://lil.nlp.cornell.edu/nlvr/) The Natural Language for Visual Reasoning (NLVR) task is to determine whether a sentence is true about a visual input. The data was collected through crowdsourcing, and solving the task requires reasoning about sets of objects, comparisons, and spatial relations. The task comprises two corpora: NLVR, with synthetically generated images, and NLVR2, which uses natural photographs.
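In NLVR2, each item pairs one sentence with two photographs and a True/False label, which makes the task a binary classification problem. The Python sketch below illustrates that shape. The class, field, and function names (`NLVR2Example`, `Predictor`, `judge`) are hypothetical and do not follow the official annotation schema.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NLVR2Example:
    # Hypothetical field names; the released annotation files may use different keys.
    sentence: str      # human-written English statement
    left_image: str    # path or URL of the left photograph
    right_image: str   # path or URL of the right photograph
    label: bool        # True if the sentence is true of the image pair


# A model maps (sentence, left image, right image) to a True/False judgment.
Predictor = Callable[[str, str, str], bool]


def judge(model: Predictor, ex: NLVR2Example) -> bool:
    """Return True if the model's judgment matches the gold label."""
    return model(ex.sentence, ex.left_image, ex.right_image) == ex.label
```

For example, a constant "always True" predictor is scored correct on an example labeled True and incorrect on one labeled False.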

Metrics

Accuracy.
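Accuracy here is the fraction of examples whose True/False judgment matches the gold label. The leaderboard reports it as a percentage. The minimal Python sketch below assumes that predictions and gold labels are already aligned lists of booleans. The function name `accuracy` is illustrative only.

```python
from typing import Sequence


def accuracy(predictions: Sequence[bool], labels: Sequence[bool]) -> float:
    """Fraction of examples whose predicted True/False matches the gold label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    # Each match contributes 1 to the sum (bools are ints in Python).
    correct = sum(p == g for p, g in zip(predictions, labels))
    return correct / len(labels)
```

To compare a result with the leaderboard, multiply by 100. For instance, two correct judgments out of four give 0.5, or 50.0.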

Leaderboard

(Ranked by accuracy on dev.)

| Rank | Model | Dev (%) | Test (%) | Resources |
|---|---|---|---|---|
| 1 | VLMo | 88.6 | 89.5 | paper |
| 2 | CoCa | 86.1 | 87.0 | paper |
| 3 | SimVLM | 84.5 | 85.2 | paper |
| 4 | X-VLM | 84.4 | 84.8 | paper, code |
| 5 | VinVL | 82.7 | 84.0 | paper, code |
| 6 | ALBEF | 82.6 | 83.1 | paper, code, blog |
| 7 | BLIP | 82.2 | 82.2 | paper, code, demo, blog |
| 8 | OSCAR | 78.1 | 78.4 | paper, code |
| 9 | UNITER | 77.2 | 77.9 | paper, code |
| 10 | SOHO | 76.4 | 77.3 | paper, code |

Downloading

Auto-downloading is not supported for this dataset. Please refer to https://lil.nlp.cornell.edu/nlvr/ and fill in the Google form to download the original images.
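If you work from the released sentence annotations and image URLs rather than the Google Form, fetching the images mainly involves reading each annotation record and downloading its two URLs. The Python sketch below assumes JSON-lines annotation files whose records carry `identifier`, `left_url`, and `right_url` keys. Check those keys against the files you actually obtain. The helper names and the output file-naming scheme are hypothetical.

```python
import json
import os
import urllib.error
import urllib.request
from typing import Iterator, Tuple


def image_urls(jsonl_text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, left_url, right_url) from JSON-lines annotation text.

    The key names are an assumption about the annotation format.
    """
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["identifier"], record["left_url"], record["right_url"]


def download_pairs(jsonl_path: str, out_dir: str) -> None:
    """Fetch each left/right image into out_dir (hypothetical helper, no retries)."""
    os.makedirs(out_dir, exist_ok=True)
    with open(jsonl_path, encoding="utf-8") as f:
        text = f.read()
    for ident, left, right in image_urls(text):
        for side, url in (("left", left), ("right", right)):
            # Hypothetical naming scheme: <identifier>-<side>.jpg
            dest = os.path.join(out_dir, f"{ident}-{side}.jpg")
            if os.path.exists(dest):
                continue  # already downloaded
            try:
                urllib.request.urlretrieve(url, dest)
            except (urllib.error.URLError, ValueError) as err:
                # Report unreachable or malformed URLs and keep going.
                print(f"skipping {url}: {err}")
```

This sketch skips files that already exist, so an interrupted run can be restarted. It reports failed URLs instead of aborting.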

References

Suhr, Alane, Stephanie Zhou, Ally Zhang, Iris Zhang, Huajun Bai, and Yoav Artzi. "A corpus for reasoning about natural language grounded in photographs." arXiv preprint arXiv:1811.00491 (2018).