LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3


Mohamed bin Zayed University of AI


πŸ“’ Latest Updates

  • Apr-26-24: Phi-3-V and LLaMA-3-V released. Excited to release the new integration of LLaVA with the Phi-3 Mini Instruct and LLaMA-3 Instruct models! πŸ”₯πŸ”₯πŸ”₯

πŸ’¬ Introduction

This repository enhances the capabilities of the LLaVA 1.5 model by incorporating the latest LLMs released this weekπŸ”₯: Phi-3 Mini Instruct (3.8B) and LLaMA-3 Instruct (8B).

πŸ† Results: Phi-3-V and LLaVA-3-V

Comparison on benchmarks for instruction-following LMMs and academic-task-oriented datasets:

| Model | MMMU | POPE | MME | MMBench-en | MMBench-cn | SEED-all | SEED-img | SEED-vid | LLaVA-Wild | GQA | Science-QA | Average |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LLaVA-v1.5-7B | 35.4 | 85.8 | 1510.7 | 64.3 | 58.3 | 58.6 | 66.1 | 37.3 | 65.4 | 62.0 | 66.8 | 58.9 |
| LLaVA-v1.5-13B | 36.4 | 85.9 | 1531.3 | 67.7 | 63.6 | 61.6 | 68.2 | 42.7 | 72.5 | 63.3 | 71.6 | 62.3 |
| Phi-3-V-mini-3.8B | 37.8 | 85.6 | 1470.1 | 68.2 | 68.1 | 62.8 | 67.7 | 44.5 | 70.9 | 61.7 | 80.7 | 63.2 |

🌟 LLaMA-3-V-8B results and models - coming soon!

*Average computed excluding MME

πŸ€– Model-Zoo

The following table provides an overview of the available models in our zoo. Each entry links to its Hugging Face page.

| Model Name | Hugging Face Link | Summary |
|---|---|---|
| LLaVA-Phi-3-mini-4k-instruct-pretrain | HF | Pretrained on LCS-558K. |
| LLaVA-Phi-3-mini-4k-instruct-lora | HF | LoRA weights fine-tuned on LLaVA-Instruct-665K. |
| LLaVA-Phi-3-mini-4k-instruct | HF | Merged weights in Hugging Face format. |
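The pretrain / lora / merged split mirrors the usual LoRA workflow: the adapter checkpoint stores a low-rank update that can later be folded into the base weights to produce the merged model. As a generic illustration of what that merge computes (a toy sketch with made-up shapes and alpha, not this repository's code):

```python
# Generic LoRA merge sketch: W' = W + (alpha / r) * B @ A,
# where A is (r x in) and B is (out x r). Illustrative only.

def matmul(X, Y):
    """Plain-Python matrix multiply."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def merge_lora(W, A, B, alpha):
    r = len(A)  # LoRA rank = number of rows of A
    scale = alpha / r
    BA = matmul(B, A)
    return [[w + scale * d for w, d in zip(wr, dr)] for wr, dr in zip(W, BA)]

# Tiny example: 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]            # r x in  = 1 x 2
B = [[0.5], [0.25]]         # out x r = 2 x 1
merged = merge_lora(W, A, B, alpha=1)
print(merged)  # [[1.5, 1.0], [0.25, 1.5]]
```

After merging, the adapter adds no inference-time cost, which is why the zoo also publishes the merged weights.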

Installation

git clone https://github.com/mbzuai-oryx/LLaVA-pp.git
cd LLaVA-pp
git submodule update --init --recursive

Package you need to update on top of the LLaVA environment:

pip install git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3
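The install line above pins transformers to an exact git commit rather than a released version. As a small illustration of how such a pip VCS requirement decomposes into repository URL and revision (the parsing helper is my own, not part of pip):

```python
def vcs_pin(requirement: str):
    """Split a 'git+URL@rev' pip requirement into (url, rev)."""
    url = requirement.removeprefix("git+")
    base, _, rev = url.rpartition("@")
    return (base, rev) if base else (url, "")

req = "git+https://github.com/huggingface/transformers@a98c41798cf6ed99e1ff17e3792d6e06a2ff2ff3"
url, rev = vcs_pin(req)
print(url)      # https://github.com/huggingface/transformers
print(rev[:7])  # a98c417
```

Pinning to a commit keeps the integration reproducible even if later transformers releases change the model APIs.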

πŸš€ Phi-3-V

To integrate Phi-3-V with LLaVA, follow these steps to update the codebase:

# Copy necessary files
cp Phi-3-V/train.py LLaVA/llava/train/train.py
cp Phi-3-V/llava_phi3.py LLaVA/llava/model/language_model/llava_phi3.py
cp Phi-3-V/builder.py LLaVA/llava/model/builder.py
cp Phi-3-V/model__init__.py LLaVA/llava/model/__init__.py
cp Phi-3-V/main__init__.py LLaVA/llava/__init__.py
cp Phi-3-V/conversation.py LLaVA/llava/conversation.py

# Training commands
cp scripts/Phi3-V_pretrain.sh LLaVA/Phi3-V_pretrain.sh
cp scripts/Phi3-V_finetune_lora.sh LLaVA/Phi3-V_finetune_lora.sh
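The file-copy steps above can be scripted with a small helper that fails fast if a source file is missing before anything in the LLaVA tree is overwritten (the paths are the same as in the cp commands; the helper itself is an illustrative sketch, not part of the repository):

```python
import shutil
from pathlib import Path

# Source -> destination overrides, as in the cp commands above.
OVERRIDES = {
    "Phi-3-V/train.py": "LLaVA/llava/train/train.py",
    "Phi-3-V/llava_phi3.py": "LLaVA/llava/model/language_model/llava_phi3.py",
    "Phi-3-V/builder.py": "LLaVA/llava/model/builder.py",
    "Phi-3-V/model__init__.py": "LLaVA/llava/model/__init__.py",
    "Phi-3-V/main__init__.py": "LLaVA/llava/__init__.py",
    "Phi-3-V/conversation.py": "LLaVA/llava/conversation.py",
}

def apply_overrides(root: Path, overrides: dict) -> list:
    """Copy each override into place, raising before any copy if a source is absent."""
    missing = [s for s in overrides if not (root / s).is_file()]
    if missing:
        raise FileNotFoundError(f"missing source files: {missing}")
    copied = []
    for src, dst in overrides.items():
        dst_path = root / dst
        dst_path.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(root / src, dst_path)
        copied.append(dst)
    return copied
```

Run it from the LLaVA-pp checkout root, e.g. `apply_overrides(Path("."), OVERRIDES)`.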

Train Phi-3-V

  1. Pre-train
cd LLaVA
bash Phi3-V_pretrain.sh
  2. Finetune
cd LLaVA
bash Phi3-V_finetune_lora.sh

πŸš€ LLaMA-3-V

To integrate LLaMA-3-V with LLaVA, follow these steps to update the codebase:

# Copy necessary files
cp LLaMA-3-V/train.py LLaVA/llava/train/train.py
cp LLaMA-3-V/conversation.py LLaVA/llava/conversation.py

# Training commands
cp scripts/LLaMA3-V_pretrain.sh LLaVA/LLaMA3-V_pretrain.sh
cp scripts/LLaMA3-V_finetune_lora.sh LLaVA/LLaMA3-V_finetune_lora.sh

Train LLaMA-3-V

  1. Pre-train
cd LLaVA
bash LLaMA3-V_pretrain.sh
  2. Finetune
cd LLaVA
bash LLaMA3-V_finetune_lora.sh

πŸ™ Acknowledgement

We are thankful to LLaVA and lmms-eval for releasing their models and code as open-source contributions.

