The Visual and Embodied Concepts (VEC) evaluation benchmark, introduced at the EMNLP 2023 Main Conference.
For the shape, material, and color tasks, each instance contains an object and two candidate options for the given property, e.g., a chair is made of wood rather than jade.
For the mass, temperature, hardness, size, and height tasks, each instance consists of two objects and a relation label, e.g., a red lego brick is more light-weight than a hammer.
You can easily load the dataset via the Hugging Face Datasets API:
from datasets import load_dataset
data = {}
for task in ['color', 'size', 'shape', 'height', 'material', 'mass', 'temperature', 'hardness']:
data[task] = load_dataset("tobiaslee/VEC", task)
print(data['material']['test'][0])
# first test instance of the material task
# {'obj': 'chair', 'positive': 'wood', 'negative': 'jade', 'relation': 'material'}
# meaning: the chair is made of wood (not jade)
print(data['mass']['test'][0])
# {'obj1': 'red lego brick', 'obj2': 'hammer', 'relation': 'mass', 'label': 0}
# meaning: the red lego brick is more light-weight than the hammer
# label=0 indicates `<` (obj1 is less than obj2), while label=1 indicates `>`
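A common way to use such minimal pairs is zero-shot scoring: have a language model score both candidate statements and check whether the correct one receives the higher likelihood. The snippet below is a minimal sketch of this idea using GPT-2 through `transformers`, together with the hypothetical `relation_candidates` helper sketched earlier; the paper's actual evaluation protocol may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal zero-shot scoring sketch (our own example; the paper's
# evaluation setup may differ). gpt2 is used purely for illustration.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()


@torch.no_grad()
def avg_log_likelihood(sentence: str) -> float:
    # Higher is better: negate the mean token-level cross-entropy loss.
    inputs = tokenizer(sentence, return_tensors="pt")
    outputs = model(**inputs, labels=inputs["input_ids"])
    return -outputs.loss.item()


# Score one relational instance with the helper defined above.
correct, wrong = relation_candidates(data["mass"]["test"][0])
print(avg_log_likelihood(correct) > avg_log_likelihood(wrong))
```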
If you find this benchmark useful, please kindly cite our paper:
@article{li2023vec,
  title={Can Language Models Understand Physical Concepts?},
  author={Li, Lei and Xu, Jingjing and Dong, Qingxiu and Zheng, Ce and Liu, Qi and Kong, Lingpeng and Sun, Xu},
  journal={arXiv preprint arXiv:2305.14057},
  year={2023}
}