The training code and dataset will be released soon, and a more powerful model is on the way. We also provide an online demo.
- [2023.12.09] 🤗 A better model, V6.1, is available now on Hugging Face! Watch this repository for the latest updates.
- [2023.12.06] Gradio & CLI inference demos are available now.
- [2023.12.01] 🤗 The Hugging Face preview model is available now!
💡 I also have other video-language projects that may interest you ✨.
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu, Shihao Wang, Yuan Fang, Wangpeng An
```shell
git clone https://github.com/wanghao-cst/Omni-VideoAssistant
cd Omni-VideoAssistant
conda create -n omni python=3.10 -y
conda activate omni
pip install --upgrade pip
pip install -e .
```
Download the checkpoint for CLI inference only; the Gradio web UI will download it automatically: Omni Preview Model 6.1
```shell
CUDA_VISIBLE_DEVICES=0 python -m llava.serve.gradio_demo
```
```shell
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --image-file "llava/serve/examples/extreme_ironing.jpg" \
    --query "What is unusual about this image?"
```

```shell
CUDA_VISIBLE_DEVICES=0 python -m llava.eval.run_omni \
    --model-path "path to omni checkpoints" \
    --video-file "llava/serve/examples/0A8CF.mp4" \
    --query "Describe the activity in the video"
```
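For batch use, the two CLI invocations above can be assembled programmatically. The sketch below is a hypothetical convenience wrapper, not part of this repository: `build_omni_cmd` and its parameter names are assumptions, and only the `llava.eval.run_omni` flags shown above are used.

```python
import os
import subprocess

def build_omni_cmd(model_path, query, image_file=None, video_file=None):
    """Build the argv list for llava.eval.run_omni with an image or video input.

    Hypothetical helper: mirrors the CLI flags documented above.
    """
    cmd = ["python", "-m", "llava.eval.run_omni",
           "--model-path", model_path,
           "--query", query]
    if image_file:
        cmd += ["--image-file", image_file]
    if video_file:
        cmd += ["--video-file", video_file]
    return cmd

if __name__ == "__main__":
    cmd = build_omni_cmd(
        "path to omni checkpoints",
        "Describe the activity in the video",
        video_file="llava/serve/examples/0A8CF.mp4",
    )
    print(" ".join(cmd))
    # To actually run inference (requires the omni environment and a downloaded checkpoint):
    # subprocess.run(cmd, check=True, env={**os.environ, "CUDA_VISIBLE_DEVICES": "0"})
```

Running the wrapper under `CUDA_VISIBLE_DEVICES=0` keeps inference on a single GPU, as in the commands above.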
This work is based on MVCE for unlimited training data generation and on LLaVA for the pretrained model.