
Commit

Merge branch 'inference' of https://github.com/LLaVA-VL/LLaVA-NeXT into inference
ZhangYuanhan-AI committed May 30, 2024
2 parents 22103f3 + 19ddd2e commit 2cdee9f
Showing 6 changed files with 55 additions and 12 deletions.
41 changes: 37 additions & 4 deletions README.md
@@ -6,14 +6,13 @@
[![llava_next-blog](https://img.shields.io/badge/llava_next-blog-green)](https://llava-vl.github.io/blog/)
[![llava_next-demo](https://img.shields.io/badge/llava_next-image_demo-red)](https://llava-next.lmms-lab.com/)
[![llava_next-video_demo](https://img.shields.io/badge/llava_next-video_demo-red)](https://llavanext-video.lmms-lab.com/)
[![llava_next-image_checkpoints](https://img.shields.io/badge/llava_next-image_checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff)
[![llava_next-image_checkpoints](https://img.shields.io/badge/llava_next-image_checkpoints-blue)](https://huggingface.co/lmms-lab)
[![llava_next-video_checkpoints](https://img.shields.io/badge/llava_next-video_checkpoints-blue)](https://huggingface.co/collections/lmms-lab/llava-next-video-661e86f5e8dabc3ff793c944)

## Release
- [2024/05/10] 🔥 **LLaVA-NeXT** (Stronger) models are released, with support for stronger LLMs including Llama-3 (8B) and Qwen-1.5 (72B/110B). Check out [[blog](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/)] and [[checkpoints](https://huggingface.co/collections/lmms-lab/llava-next-6623288e2d61edba3ddbf5ff)] to see improved performance!
- [2024/05/10] 🔥 **LLaVA-NeXT** (Video) is released. The image-only-trained LLaVA-NeXT model is surprisingly strong on video tasks with zero-shot modality transfer. DPO training with AI feedback on videos can yield significant improvement. [[Blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)] and [[checkpoints](https://huggingface.co/collections/lmms-lab/llava-next-video-661e86f5e8dabc3ff793c944)]
- [2024/05/10] 🔥 **LLaVA-NeXT** (Stronger) models are released, with support for stronger LLMs including Llama-3 (8B) and Qwen-1.5 (72B/110B). Check out [[blog](https://llava-vl.github.io/blog/2024-05-10-llava-next-stronger-llms/)] and [[checkpoints](https://huggingface.co/lmms-lab)] to see improved performance!
- [2024/05/10] 🔥 **LLaVA-NeXT** (Video) is released. The image-only-trained LLaVA-NeXT model is surprisingly strong on video tasks with zero-shot modality transfer. DPO training with AI feedback on videos can yield significant improvement. [[Blog](https://llava-vl.github.io/blog/2024-04-30-llava-next-video/)], [[checkpoints](https://huggingface.co/collections/lmms-lab/llava-next-video-661e86f5e8dabc3ff793c944)] and [[sglang](https://github.com/sgl-project/sglang)]
- [2024/01/30] 🔥 **LLaVA-NeXT** is out! With additional scaling to LLaVA-1.5, LLaVA-NeXT-34B outperforms Gemini Pro on some benchmarks. It can now process 4x more pixels and perform more tasks/applications than before. Check out the [blog post](https://llava-vl.github.io/blog/2024-01-30-llava-next/), and explore the [demo](https://llava.hliu.cc/)! Models are available in [Model Zoo](https://github.com/haotian-liu/LLaVA/blob/main/docs/MODEL_ZOO.md). Training/eval data and scripts coming soon.

<details>
<summary>More</summary>

@@ -78,6 +77,40 @@ Please checkout the following page for more inference & evaluation details.
#### - LLaVA-NeXT: A Strong Zero-shot Video Understanding Model
- [LLaVA-NeXT-Video](./docs/LLaVA-NeXT-Video.md): for video inference and evaluation scripts.


## SGLang for Speeding Up Inference and Deployment

We use [SGLang](https://github.com/sgl-project/sglang) to speed up inference and deployment of LLaVA-NeXT. With SGLang, you can serve LLaVA-NeXT as a backend API service.

**Prepare Environment**:
Follow the installation instructions in the [sglang repository](https://github.com/sgl-project/sglang?tab=readme-ov-file#install).
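At the time of writing, the SGLang README suggests an install along these lines (treat this as a sketch and defer to the linked instructions if they have changed):

```sh
# Install SGLang with its serving extras (command from the SGLang README at the
# time of writing; see the install link above for the current command).
pip install "sglang[all]"
```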

### LLaVA-NeXT (Image)

Check out the HTTP POST/GET and SRT usage examples at [sglang/examples/usage/llava](https://github.com/sgl-project/sglang/blob/main/examples/usage/llava).
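As a rough sketch of that workflow, you launch an SRT server and then query it with SGLang's frontend language. The model path, port, and file names below are placeholders; follow the linked example for the exact launch flags (e.g. tokenizer path or chat template) for your model.

```sh
# Launch an SRT backend server (placeholder model path and port).
python -m sglang.launch_server --model-path YOUR_MODEL_PATH --port 30000
```

```python
# Query the running server with SGLang's frontend DSL; this mirrors the
# image-QA pattern from the SGLang quick start and is only a sketch.
import sglang as sgl

@sgl.function
def image_qa(s, image_path, question):
    s += sgl.user(sgl.image(image_path) + question)
    s += sgl.assistant(sgl.gen("answer", max_tokens=64))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))
state = image_qa.run(image_path="example.jpg", question="What is shown in this image?")
print(state["answer"])
```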

### LLaVA-NeXT (Video)

**Launch and Run on K Nodes** (a concrete two-node example follows these steps):
- Go to the sglang project directory:
```sh
cd PATH_TO/sglang
```
- First node:
```sh
bash examples/usage/llava_video/srt_example_llava_v.sh K 0 YOUR_VIDEO_PATH YOUR_MODEL_PATH FRAMES_PER_VIDEO
# e.g. bash examples/usage/llava_video/srt_example_llava_v.sh K 0 examples/usage/llava_video/videos/Q98Z4OTh8RwmDonc.mp4 lmms-lab/LLaVA-NeXT-Video-7B-DPO 16
```
- Second node:
```sh
bash examples/usage/llava_video/srt_example_llava_v.sh K 1 YOUR_VIDEO_PATH YOUR_MODEL_PATH FRAMES_PER_VIDEO
```
- The K-th (last) node:
```sh
bash examples/usage/llava_video/srt_example_llava_v.sh K K-1 YOUR_VIDEO_PATH YOUR_MODEL_PATH FRAMES_PER_VIDEO
```
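The positional arguments appear to be the total number of nodes K, this node's 0-indexed rank, the video path, the model path, and the number of frames sampled per video. For example, with two nodes and 16 frames per video (paths are placeholders):

```sh
# On node 0 of 2:
bash examples/usage/llava_video/srt_example_llava_v.sh 2 0 YOUR_VIDEO_PATH YOUR_MODEL_PATH 16
# On node 1 of 2:
bash examples/usage/llava_video/srt_example_llava_v.sh 2 1 YOUR_VIDEO_PATH YOUR_MODEL_PATH 16
```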


## Citation

If you find it useful for your research and applications, please cite related papers/blogs using this BibTeX:
1 change: 1 addition & 0 deletions docs/LLaVA-NeXT.md
@@ -1,6 +1,7 @@
# LLaVA-NeXT: Stronger LLMs Supercharge Multimodal Capabilities in the Wild

## Quick Start With HuggingFace
First, please install our repo with its code and environment: `pip install git+https://github.com/LLaVA-VL/LLaVA-NeXT.git`

Here is a quick inference example using [`llavanext-llama3-8B`](https://huggingface.co/lmms-lab/llama3-llava-next-8b). You will need to install [`flash-attn`](https://github.com/Dao-AILab/flash-attention) to use this code snippet. If you don't want to install it, you can set `attn_implementation=None` when calling `load_pretrained_model`.
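A minimal sketch of such a call is shown below, assuming the `load_pretrained_model`, `process_images`, and `tokenizer_image_token` helpers and the `llava_llama_3` conversation template from this repo; paths and generation arguments are illustrative.

```python
# Sketch of quick inference with lmms-lab/llama3-llava-next-8b; helper names
# come from this repo, but treat the exact arguments as illustrative.
import copy
import torch
from PIL import Image

from llava.model.builder import load_pretrained_model
from llava.mm_utils import process_images, tokenizer_image_token
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates

pretrained = "lmms-lab/llama3-llava-next-8b"
device = "cuda"

# Pass attn_implementation=None here if flash-attn is not installed.
tokenizer, model, image_processor, max_length = load_pretrained_model(
    pretrained, None, "llava_llama3", device_map="auto")
model.eval()

image = Image.open("path/to/image.png")  # placeholder image path
image_tensor = process_images([image], image_processor, model.config)
image_tensor = [t.to(dtype=torch.float16, device=device) for t in image_tensor]

# Build the prompt with the llama-3 chat template and the image placeholder token.
conv = copy.deepcopy(conv_templates["llava_llama_3"])
conv.append_message(conv.roles[0], DEFAULT_IMAGE_TOKEN + "\nWhat is shown in this image?")
conv.append_message(conv.roles[1], None)
prompt = conv.get_prompt()

input_ids = tokenizer_image_token(prompt, tokenizer, IMAGE_TOKEN_INDEX,
                                  return_tensors="pt").unsqueeze(0).to(device)

with torch.inference_mode():
    output_ids = model.generate(
        input_ids,
        images=image_tensor,
        image_sizes=[image.size],
        do_sample=False,
        max_new_tokens=256,
    )
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
```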
8 changes: 7 additions & 1 deletion llava/conversation.py
@@ -348,6 +348,12 @@ def dict(self):
sep2="</s>",
)

+try:
+    llama3_tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
+except Exception as e:
+    print("Error loading llama3 tokenizer")
+    print(e)

conv_llava_llama_3 = Conversation(
system="You are a helpful language and vision assistant. " "You are able to understand the visual content that the user provides, " "and assist the user with a variety of tasks using natural language.",
roles=("<|start_header_id|>user", "<|start_header_id|>assistant"),
@@ -356,7 +362,7 @@ def dict(self):
    offset=0,
    sep_style=SeparatorStyle.LLAMA_3,
    tokenizer_id="meta-llama/Meta-Llama-3-8B-Instruct",
-    tokenizer=AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct"),
+    tokenizer=llama3_tokenizer,
    stop_token_ids=[128009],
)

5 changes: 2 additions & 3 deletions llava/model/__init__.py
@@ -14,8 +14,7 @@
try:
    exec(f"from .language_model.{model_name} import {model_classes}")
except ImportError:
-    import traceback
-
-    traceback.print_exc()
+    # import traceback
+    # traceback.print_exc()
    print(f"Failed to import {model_name} from llava.language_model.{model_name}")
    pass
4 changes: 4 additions & 0 deletions llava/model/multimodal_encoder/clip_encoder.py
@@ -108,3 +108,7 @@ def num_patches(self):
if "cls_patch" in self.select_feature:
_num_patches += 1
return _num_patches

@property
def image_size(self):
return self.config.image_size
8 changes: 4 additions & 4 deletions llavavid/model/builder.py
@@ -46,7 +46,7 @@ def load_pretrained_model(model_path, model_base, model_name, load_8bit=False, l
tokenizer = AutoTokenizer.from_pretrained(model_base, use_fast=False)
print("Loading LLaVA from base model...")
if "mixtral" in model_name.lower():
-    model = LlavaMixtralForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, use_flash_attention_2=False, **kwargs)
+    model = LlavaMixtralForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
else:
    model = LlavaLlamaForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)
token_num, tokem_dim = model.lm_head.out_features, model.lm_head.in_features
@@ -105,7 +105,7 @@ def load_from_hf(repo_id, filename, subfolder=None):
    model = LlavaMptForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
elif "mixtral" in model_name.lower() and "vicuna" not in model_name.lower() and "mistral" not in model_name.lower():
    tokenizer = AutoTokenizer.from_pretrained(model_path)
-    model = LlavaMixtralForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_flash_attention_2=True, **kwargs)
+    model = LlavaMixtralForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, **kwargs)
elif "mistral" in model_name.lower() or "zephyr" in model_name.lower():
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    cfg_pretrained = AutoConfig.from_pretrained(model_path)
@@ -114,15 +114,15 @@ def load_from_hf(repo_id, filename, subfolder=None):
print(f"Overwriting config with {overwrite_config}")
for k, v in overwrite_config.items():
setattr(cfg_pretrained, k, v)
model = LlavaMistralForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_flash_attention_2=True, config=cfg_pretrained, **kwargs)
model = LlavaMistralForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, config=cfg_pretrained, **kwargs)
else:
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
cfg_pretrained = AutoConfig.from_pretrained(model_path)
if overwrite_config is not None:
print(f"Overwriting config with {overwrite_config}")
for k, v in overwrite_config.items():
setattr(cfg_pretrained, k, v)
model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_flash_attention_2=True, config=cfg_pretrained, **kwargs)
model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, config=cfg_pretrained, **kwargs)
else:
# Load language model
if model_base is not None:
