[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
SUPIR aims at developing Practical Algorithms for Photo-Realistic Image Restoration In the Wild
An efficient, flexible and full-featured toolkit for fine-tuning LLM (InternLM2, Llama3, Phi3, Qwen, Mistral, ...)
ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
A one-stop data processing system to make data higher-quality, juicier, and more digestible for (multimodal) LLMs! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
Paddle Multimodal Integration and eXploration, supporting mainstream multimodal tasks, including end-to-end large-scale multimodal pretrained models and a diffusion model toolbox, with high performance and flexibility.
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4V, Gemini, QwenVLPlus, 50+ HF models, and 20+ benchmarks.
Effective prompting for Large Multimodal Models like GPT-4 Vision, LLaVA or CogVLM. 🔥
RestAI is an open-source AIaaS (AI as a Service) platform built on top of LlamaIndex, Ollama, and HF Pipelines. It supports any public LLM supported by LlamaIndex and any local LLM supported by Ollama, with precise embeddings usage and tuning.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
A Framework of Small-scale Large Multimodal Models
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Tag manager and captioner for image datasets
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
MLX-VLM is a package for running Vision LLMs locally on your Mac using MLX.
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
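Several of the toolkits above serve LLaVA-family checkpoints through the Hugging Face transformers API. As a rough orientation only, here is a minimal sketch of querying a LLaVA 1.5 checkpoint locally; the model ID, prompt template, and image URL are illustrative assumptions, not taken from any specific repository in this list:

```python
# Minimal sketch: querying a LLaVA-style checkpoint via Hugging Face transformers.
# Model ID, prompt template, and image URL below are illustrative assumptions.
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # assumed community checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# LLaVA-1.5-style single-turn prompt; the <image> token marks where the image embedding is inserted.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
image = Image.open(requests.get("https://example.com/cat.png", stream=True).raw)  # placeholder URL

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device, torch.float16)
output_ids = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```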