[2023/12/29]🔥 We released the training code and Osprey-724K dataset.
[2023/12/18]🔥 We released the code, the Osprey-7b model and the online demo for Osprey.
Osprey is a mask-text instruction tuning approach that extends MLLMs by incorporating pixel-wise mask regions into language instructions, enabling fine-grained visual understanding. Given an input mask region, Osprey generates semantic descriptions, including both a short description and a detailed description.
Osprey can seamlessly integrate with SAM in point-prompt, box-prompt and segment-everything modes to generate the semantics associated with specific parts or objects.
Click 👇 to try our demo online.
username: osprey
password: osprey
Demo modes: Point | Box | Everything
💻 Requirements: the demo needs about 17GB of GPU memory, i.e., Osprey (15GB) plus SAM (2GB).
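If you are unsure whether your GPU has enough free memory, a quick check (assuming an NVIDIA GPU with `nvidia-smi` on the PATH):

```bash
# Print per-GPU total and currently free memory; the demo needs roughly 17GB free.
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```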
- First install Gradio-Osprey-Demo.
- Install Segment Anything:
  pip install git+https://github.com/facebookresearch/segment-anything.git
- Download the ViT-B SAM model to checkpoints (a download sketch is given after this list).
- Run `app.py`:
  cd demo
  python app.py --model checkpoint/osprey_7b
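A minimal sketch for fetching the ViT-B SAM checkpoint mentioned above. The URL is the ViT-B checkpoint published by the Segment Anything repository; the target folder is an assumption, so adjust it to wherever the demo expects checkpoints to live:

```bash
# Download the ViT-B SAM checkpoint into a checkpoints folder (folder name assumed).
mkdir -p checkpoints
wget -P checkpoints https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
```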
- Clone this repository and navigate to the Osprey folder:
git clone https://github.com/CircleRadon/Osprey.git
cd Osprey
- Install packages
conda create -n osprey python=3.10 -y
conda activate osprey
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for training:
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
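An optional sanity check after installation; it only assumes the import names of the dependencies installed above (`torch`, `flash_attn`):

```bash
# Confirm PyTorch sees a GPU and flash-attn was built correctly.
python -c "import torch, flash_attn; print(torch.__version__, torch.cuda.is_available())"
```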
All datasets for training can be found in Dataset preparation.
Osprey-724K: 🤗Hugging Face
Osprey-724K is an instruction dataset with mask-text pairs, containing around 724K GPT-generated multimodal dialogues to encourage MLLMs toward fine-grained, pixel-level image understanding. It contains object-level, part-level and additional instruction samples for robustness and flexibility.
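One possible way to pull the dataset locally is with `huggingface-cli`; the repo id below is a placeholder, so substitute the dataset id from the Hugging Face link above:

```bash
# Download Osprey-724K to a local folder (replace <osprey-724k-repo-id> with the
# dataset id from the Hugging Face page linked above).
huggingface-cli download --repo-type dataset <osprey-724k-repo-id> --local-dir data/Osprey-724K
```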
- Stage1: Image-Text Alignment Pre-training
  - The pretrained projector weights for Convnext-large-CLIP can be found in projector weights.
- Stage2: Mask-Text Alignment Pre-training
  - Download vicuna-7b-v1.5.
  - Download the projector weights trained in stage1: projector weights.
  - Set `model_name_or_path` in `stage2.sh` to the path of `vicuna-7b-v1.5`.
  - Set `pretrain_mm_mlp_adapter` in `stage2.sh` to the path of `mm_projector`.
  - Set `vision_tower` in `stage2.sh` to the path of Convnext-large-CLIP-model.
  - Run `sh scripts/stage2.sh` (a pre-flight sketch is given after the stage list).
- Stage3: End-to-End Fine-tuning
  - Set `model_name_or_path` in `stage3.sh` to the path of the stage2 checkpoint.
  - Set `vision_tower` in `stage3.sh` to the path of Convnext-large-CLIP-model.
  - Run `sh scripts/stage3.sh`.
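Before launching a stage, a minimal pre-flight sketch can help catch path typos; the variable names and paths below are placeholders, not the actual contents of stage2.sh or stage3.sh:

```bash
# Placeholder paths: point these at the local copies you wrote into stage2.sh.
VICUNA=/path/to/vicuna-7b-v1.5
PROJECTOR=/path/to/stage1/mm_projector
CLIP=/path/to/Convnext-large-CLIP-model

# Verify the paths exist, then launch training.
for p in "$VICUNA" "$PROJECTOR" "$CLIP"; do
  [ -e "$p" ] || { echo "missing: $p"; exit 1; }
done
sh scripts/stage2.sh   # or sh scripts/stage3.sh for stage3
```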
Then change the "mm_vision_tower" in config.json of the Osprey-7b model to the path of Convnext-large-CLIP-model (see the sketch below).
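A sketch of that config.json edit using jq; the checkpoint directory and the CLIP path are placeholders:

```bash
# Rewrite mm_vision_tower in the Osprey-7b config to point at the local CLIP backbone.
jq '.mm_vision_tower = "/path/to/Convnext-large-CLIP-model"' \
  /path/to/osprey_7b/config.json > /tmp/config.json \
  && mv /tmp/config.json /path/to/osprey_7b/config.json
```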
- Release the checkpoints, inference code and demo.
- Release the dataset and training scripts.
- Release the evaluation code.
- Release the code for data generation pipeline.
- LLaVA-v1.5: the codebase we built upon.
- SAM: the demo uses segmentation results from SAM as the input to Osprey.
@misc{Osprey,
title={Osprey: Pixel Understanding with Visual Instruction Tuning},
  author={Yuqian Yuan and Wentong Li and Jian Liu and Dongqi Tang and Xinjie Luo and Chi Qin and Lei Zhang and Jianke Zhu},
year={2023},
eprint={2312.10032},
archivePrefix={arXiv},
primaryClass={cs.CV}
}