
Welcome to Griffon

This is the official repo of the Griffon series (v1 & v2). Griffon is the first high-resolution (over 1K) LVLM capable of localizing anything you are interested in, whether you describe it in free-form text or specify it with a visual region. The latest version supports visual-language co-referring: you can refer to a target with an image crop, a textual description, or both. Griffon achieves excellent performance in REC, object detection, object counting, visual/phrase grounding, and REG.


Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

📕Paper 🌀Usage 🤗Model

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

📕Paper

Griffon-G: More General, More Tasks, and Better Performance!

Coming in a few days!

News

  • 2024.07.01 🔥Griffon has been accepted to ECCV 2024.
  • 2024.03.15 🔥The Griffon v2 paper has been released on 📕arXiv.
  • 2024.03.11 🔥We are excited to announce the arrival of Griffon v2. Griffon v2 brings fine-grained perception to new heights with high-resolution, expert-level detection and counting, and supports visual-language co-referring. Take a look at our demo first. Paper, code, demos, and models will be released soon.
  • 2023.12.13 🔥The Language-prompted Localization Dataset is ready for release in 🤗HuggingFace, pending final approval.
  • 2023.12.06 🔥Released the inference code and model in 🤗HuggingFace.
  • 2023.11.29 🔥The Griffon paper has been released on 📕arXiv.

What can Griffon do now?

Griffon v2 can now perform localization with free-form text inputs and with visual target inputs given as locally cropped images, supporting the tasks shown below. More quantitative evaluation results can be found in our paper.
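For a rough sense of the text-referring workflow, here is a minimal inference sketch. It assumes the released checkpoint exposes a LLaVA-style interface loadable with HuggingFace transformers; the model id, prompt template, and image path below are placeholders, not the repo's confirmed API, so consult the released inference code in 🤗HuggingFace for the authoritative entry point.

```python
# A minimal sketch, NOT the repo's confirmed API: assumes a LLaVA-style
# checkpoint usable with HuggingFace transformers. The model id and the
# prompt template are placeholders -- see the released inference code.
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "JefferyZhan/Griffon"  # placeholder model id (assumption)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("street.jpg")  # any local image
# Free-form text referring: ask the model to localize a described target.
prompt = "USER: <image>\nLocate every traffic light in the image. ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)

# The model answers in text, spelling out category labels together with
# bounding-box coordinates for each localized target.
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

For visual co-referring, the same call would additionally pass the locally cropped image of the target as the visual prompt; the exact argument name depends on the released processor and is not shown here.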

Acknowledgement

  • LLaVA provides the base code and pre-trained models.
  • Shikra provides insight into how to organize datasets and some base processed annotations.
  • Llama provides the large language model.
  • volgachen provides the basic environment setting config.

Citation

If you find Griffon useful for your research and applications, please cite using this BibTeX:

@misc{zhan2023griffon,
      title={Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models}, 
      author={Yufei Zhan and Yousong Zhu and Zhiyang Chen and Fan Yang and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2311.14552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{zhan2024griffon,
      title={Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring}, 
      author={Yufei Zhan and Yousong Zhu and Hongyin Zhao and Fan Yang and Ming Tang and Jinqiao Wang},
      year={2024},
      eprint={2403.09333},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

License

Code License Data License

The data and checkpoints are licensed for research use only. They are additionally restricted to uses that follow the license agreements of LLaVA, LLaMA, and GPT-4. The dataset is licensed under CC BY-NC 4.0 (allowing only non-commercial use), and models trained on the dataset must not be used outside of research purposes.
