This is the official project website for the paper CELLO: Causal Evaluation of Large Vision-Language Models.
For images: Download from Visual Genome.
For questions, answers, etc.: CELLO Dataset
{
  "img_id": 2362590,
  "question": "If the beach were to suddenly erode, would the women still be able to ride the horses securely?",
  "graph_type": "confounding",
  "task_type": "sufficient_cause",
  "graph": {
    "nodes": [
      [3749557, {
        "obj_name": "beach",
        "obj_synsets": ["beach.n.01"],
        "boxes": [4, 194, 493, 174],
        "position": "right_down"
      }],
      [2037998, {
        "obj_name": "horse",
        "obj_synsets": ["horse.n.01"],
        "boxes": [
          [101, 170, 107, 110]
        ],
        "position": "left_down"
      }],
      [1946903, {
        "obj_name": "women",
        "obj_synsets": ["woman.n.01"],
        "boxes": [
          [290, 126, 70, 113]
        ],
        "position": "right_upper"
      }]
    ],
    "edges": [
      [3749557, 2037998, {
        "relation": "ON"
      }],
      [3749557, 1946903, {
        "relation": "ON"
      }],
      [2037998, 1946903, {
        "relation": "riding"
      }]
    ]
  },
  "objs": [3749557, 2037998, 1946903],
  "options": ["Yes", "No"],
  "answer_index": 1,
  "data_id": 5
}
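A record in this format can be read with the Python standard library alone. The following is a minimal sketch, not part of the CELLO codebase: the helper names `normalize_boxes` and `summarize` are hypothetical, and it assumes that `answer_index` is a zero-based index into `options` (consistent with the example, where index 1 selects "No") and that each edge is stored as `[source_id, target_id, attributes]`. It also accepts `boxes` in either the flat or the nested form, since both appear in the example record.

```python
import json

# Trimmed copy of the example record (only the fields used below).
RECORD_JSON = """
{
  "img_id": 2362590,
  "graph": {
    "nodes": [
      [3749557, {"obj_name": "beach", "boxes": [4, 194, 493, 174]}],
      [2037998, {"obj_name": "horse", "boxes": [[101, 170, 107, 110]]}],
      [1946903, {"obj_name": "women", "boxes": [[290, 126, 70, 113]]}]
    ],
    "edges": [
      [3749557, 2037998, {"relation": "ON"}],
      [3749557, 1946903, {"relation": "ON"}],
      [2037998, 1946903, {"relation": "riding"}]
    ]
  },
  "options": ["Yes", "No"],
  "answer_index": 1
}
"""


def normalize_boxes(boxes):
    """Return a list of four-number boxes for both flat and nested input."""
    if boxes and not isinstance(boxes[0], list):
        return [boxes]  # a single flat box such as [4, 194, 493, 174]
    return boxes


def summarize(record):
    """Resolve object ids to names and look up the gold answer string."""
    # Each node is an [id, attributes] pair.
    nodes = {node_id: attrs for node_id, attrs in record["graph"]["nodes"]}
    # Assumption: edges are [source_id, target_id, attributes].
    edges = [
        (nodes[src]["obj_name"], nodes[dst]["obj_name"], attrs["relation"])
        for src, dst, attrs in record["graph"]["edges"]
    ]
    boxes = {
        attrs["obj_name"]: normalize_boxes(attrs["boxes"])
        for attrs in nodes.values()
    }
    # Assumption: answer_index is zero-based.
    answer = record["options"][record["answer_index"]]
    return {"edges": edges, "boxes": boxes, "answer": answer}


record = json.loads(RECORD_JSON)
summary = summarize(record)
```

Here `summary["edges"]` holds (source name, target name, relation) triples in the order they are stored, which makes the causal graph easier to inspect than raw object ids.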
To evaluate a model on CELLO, run:
python evaluate_cello.py --model [MODEL_NAME]
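The internals of `evaluate_cello.py` are not shown here. As a purely illustrative sketch of how multiple-choice predictions could be scored against the `answer_index` field, the hypothetical function below computes accuracy per `task_type`. The function name, the shape of `predictions` (a mapping from `data_id` to a predicted option index), and the toy records are all assumptions made for this example, not the script's actual behavior.

```python
from collections import defaultdict


def accuracy_by_task(records, predictions):
    """Compute per-task accuracy.

    records: iterable of dicts with "data_id", "task_type", "answer_index".
    predictions: dict mapping data_id to a predicted option index
                 (hypothetical format; missing ids count as wrong).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        task = rec["task_type"]
        total[task] += 1
        if predictions.get(rec["data_id"]) == rec["answer_index"]:
            correct[task] += 1
    return {task: correct[task] / total[task] for task in total}


# Toy records for illustration only; they are not taken from the dataset.
toy_records = [
    {"data_id": 5, "task_type": "sufficient_cause", "answer_index": 1},
    {"data_id": 6, "task_type": "sufficient_cause", "answer_index": 0},
    {"data_id": 7, "task_type": "toy_task", "answer_index": 0},
]
toy_predictions = {5: 1, 6: 1, 7: 0}

scores = accuracy_by_task(toy_records, toy_predictions)
```

Grouping by `task_type` keeps results for different kinds of causal questions separate rather than folding them into a single number.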
Please cite our paper if this repository inspires your work.
@article{chen2024cello,
title={CELLO: Causal Evaluation of Large Vision-Language Models},
author={Chen, Meiqi and Peng, Bo and Zhang, Yan and Lu, Chaochao},
journal={arXiv preprint arXiv:2406.19131},
year={2024}
}