This is the official implementation of the following papers:
- The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
- The All-Seeing Project V2: Towards General Relation Comprehension of the Open World
The name "All-Seeing" is derived from "The All-Seeing Eye", which means having complete knowledge, awareness, or insight into all aspects of existence. The logo is the Millennium Puzzle, an artifact from the manga "Yu-Gi-Oh!".
- July 01, 2024: All-Seeing Project v2 is accepted by ECCV 2024! Note that the model and data have already been released on Hugging Face.
- Feb 28, 2024: All-Seeing Project v2 is out! Our ASMv2 achieves state-of-the-art performance across a variety of image-level and region-level tasks! See here for more details.
- Feb 21, 2024: ASM, AS-Core, AS-10M, and AS-100M are released!
- Jan 16, 2024: All-Seeing Project is accepted by ICLR 2024!
- Aug 29, 2023: All-Seeing Model Demo is available on OpenXLab now!
- Release the ASMv2 model.
- Release the AS-V2 dataset.
- Release the ASM model.
- Release the full version of AS-1B.
- Release AS-Core, which is the human-verified subset of AS-1B.
- Release AS-100M, which is the 100M subset of AS-1B.
- Release AS-10M, which is the 10M subset of AS-1B.
- Online demo, including dataset browser and ASM online demo.
All-Seeing 1B (AS-1B) dataset: we propose a new large-scale dataset (AS-1B) for open-world panoptic visual recognition and understanding, using an economical semi-automatic data engine that combines the power of off-the-shelf vision/language models and human feedback.
All-Seeing Model (ASM): we develop a unified vision-language foundation model (ASM) for open-world panoptic visual recognition and understanding. Aligning with LLMs, our ASM supports versatile image-text retrieval and generation tasks, demonstrating impressive zero-shot capability.
All-Seeing Dataset V2 (AS-V2) dataset: we propose a novel task, termed Relation Conversation (ReC), which unifies the formulation of text generation, object localization, and relation comprehension. Based on the unified formulation, we construct the AS-V2 dataset, which consists of 127K high-quality relation conversation samples, to unlock the ReC capability for Multi-modal Large Language Models (MLLMs).
All-Seeing Model v2 (ASMv2): we develop ASMv2, which integrates the Relation Conversation ability while maintaining powerful general capabilities. It is endowed with grounding and referring capabilities, exhibiting state-of-the-art performance on region-level tasks. Furthermore, this model can be naturally adapted to the Scene Graph Generation task in an open-ended manner.
Circular-based Relation Probing Evaluation (CRPE) benchmark: we construct the CRPE benchmark, which is the first benchmark that covers all elements of the relation triplets (subject, predicate, object), providing a systematic platform for the evaluation of relation comprehension ability.
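To make the idea of covering every element of a relation triplet concrete, here is a minimal sketch. It is not the CRPE data format or evaluation code: `RelationTriplet`, `probe_targets`, and the `"?"` placeholder are hypothetical names chosen for illustration. It only shows how one triplet can yield one probe per element (subject, predicate, object).

```python
from dataclasses import dataclass

# Hypothetical container for illustration; not the format used by CRPE or AS-V2.
@dataclass(frozen=True)
class RelationTriplet:
    subject: str
    predicate: str
    object: str


def probe_targets(triplet):
    """Return one probe per triplet element.

    Each probe is (element_name, masked_triplet, answer), where the probed
    element is replaced by "?" in masked_triplet.
    """
    fields = ("subject", "predicate", "object")
    probes = []
    for name in fields:
        masked = {f: getattr(triplet, f) for f in fields}
        answer = masked[name]
        masked[name] = "?"  # hide the element being probed
        probes.append((name, masked, answer))
    return probes


if __name__ == "__main__":
    example = RelationTriplet("person", "riding", "horse")
    for name, masked, answer in probe_targets(example):
        print(name, masked, answer)
```

A benchmark that probes only the object, for example, would use just the third of these probes; the sketch shows all three being derived from the same triplet.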
This project is released under the Apache 2.0 license.
If you find this project useful in your research, please consider citing:
@article{wang2023allseeing,
title={The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World},
author={Wang, Weiyun and Shi, Min and Li, Qingyun and Wang, Wenhai and Huang, Zhenhang and Xing, Linjie and Chen, Zhe and Li, Hao and Zhu, Xizhou and Cao, Zhiguo and others},
journal={arXiv preprint arXiv:2308.01907},
year={2023}
}
@article{wang2024allseeing_v2,
title={The All-Seeing Project V2: Towards General Relation Comprehension of the Open World},
author={Wang, Weiyun and Ren, Yiming and Luo, Haowen and Li, Tiantong and Yan, Chenxiang and Chen, Zhe and Wang, Wenhai and Li, Qingyun and Lu, Lewei and Zhu, Xizhou and others},
journal={arXiv preprint arXiv:2402.19474},
year={2024}
}