imageinwords/datasets at main · google/imageinwords

IIW-Benchmark: Eval Datasets

We release a subset of human- and model-annotated IIW images and descriptions, as well as human side-by-side (SxS) results on Human-Authored and Model-Human sourced pairs of descriptions. The model-generated descriptions may contain hallucinations, information recall losses, or non-human-like writing style artifacts. By releasing this subset along with the human SxS judgements, we encourage the development of new metrics and evaluation systems that detect these issues in an automated, scalable manner. The release also promotes fair comparison across methods in future work. The set is released under a CC-BY-4.0 license.

Human Annotated

We provide human-authored annotations from the IIW framework.

IIW-400, a new eval dataset of 400 random images sampled from DOCCI-AAR, with full IIW Task-1 and Task-2 annotations along with 100 human SxS results each on the GPT-4V and IIW PaLI-5B models (including model predictions).
DCI-Test, 112 images re-annotated with the IIW framework along with human SxS results comparing them to DCI’s original human annotations.
DOCCI-Test, 100 random images re-annotated with the IIW framework along with human SxS results comparing them to DOCCI’s human annotations.

Model Annotated

We release 2.4k random images annotated by the IIW PaLI-5B model (Tab. 4), comprising the IIW-400 set (as used for the human SxS evaluation), 1k samples from the LocNar Eval set (from the OpenImages subset), and 1k samples from the Crossmodal-3600 images.
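As a quick sanity check on the composition described above, the following minimal sketch tallies the three subsets and confirms they add up to the stated 2.4k images. The dictionary keys and variable names are illustrative labels chosen for this example, not identifiers defined by the release.

```python
# Subset sizes as stated in the paragraph above.
# The keys are illustrative labels (an assumption), not an official schema.
model_annotated_subsets = {
    "IIW-400": 400,            # also used for the human SxS evaluation
    "LocNar Eval": 1000,       # sampled from the OpenImages subset
    "Crossmodal-3600": 1000,
}

# Sum the per-subset counts to get the size of the model-annotated release.
total_images = sum(model_annotated_subsets.values())
print(total_images)  # 2400
```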
