Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 3,401 292 Updated Sep 14, 2024

Kwai-Kolors / Kolors

Kolors Team

Python 3,510 224 Updated Sep 4, 2024

huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.

Python 25,131 5,191 Updated Sep 14, 2024

THUDM / ChatGLM3

ChatGLM3 series: Open Bilingual Chat LLMs | Open-source bilingual chat language models

Python 13,325 1,546 Updated Jul 10, 2024

HJYao00 / DenseConnector

Dense Connector for MLLMs

Python 98 3 Updated Aug 19, 2024

ByZ0e / AI2Thor_keyboard_player

AI2-THOR Data Collection Tool Based On Keyboard Interaction

Python 54 10 Updated Jun 21, 2024

NVlabs / VILA

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,782 140 Updated Sep 10, 2024

WangWenhao0716 / PEICD

[IJCV 2024] The official implementation of "Pattern-Expandable Image Copy Detection"

Python 5 Updated Jul 13, 2024

ByZ0e / HSTT

Python 3 Updated Jul 16, 2024

Atten4Vis / LW-DETR

This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".

Python 202 11 Updated Jul 25, 2024

baaivision / DenseFusion

DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception

Python 103 1 Updated Aug 23, 2024

TencentQQGYLab / ELLA

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Python 1,047 54 Updated Jul 17, 2024

THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | Open-source multilingual multimodal chat models

Python 4,594 363 Updated Sep 11, 2024

THUDM / CogVLM

A state-of-the-art-level open visual language model | Multimodal pretrained model

Python 5,861 401 Updated May 29, 2024

yuweihao / MM-Vet

MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)

Python 252 10 Updated Aug 28, 2024

AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

Python 4,350 419 Updated Jul 30, 2024

Tencent / HunyuanDiT

Hunyuan-DiT : A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Python 3,278 281 Updated Aug 15, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching GPT-4o performance

Python 5,479 425 Updated Sep 10, 2024

LLaVA-VL / LLaVA-NeXT

Python 2,386 165 Updated Sep 14, 2024

FoundationVision / GLEE

[CVPR2024 Highlight]GLEE: General Object Foundation Model for Images and Videos at Scale

Python 1,025 82 Updated Aug 8, 2024

Traffic-X / ViT-CoMer

Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.

Python 191 12 Updated Jul 3, 2024

xinyu1205 / recognize-anything

Open-source and strong foundation image recognition models.

Jupyter Notebook 2,742 265 Updated Aug 1, 2024

jianzhnie / awesome-text-to-video

A Survey on Text-to-Video Generation/Synthesis.

565 74 Updated Jul 24, 2024

Yongxing Dai SikaStar

Lists (28)

3D

Agent

Base Model

CLIP

Detection

DG

Diffusion

Foundational-Models

Generative Model

GPT

job

LLM

Multi-modal

Multi-modal prompts

OOD

PEFT

Pretrain

Prompt

Segmentation

Self-supervised

Semi-supervised

T2I&T2V

Transfer Learning

Transformer

UDA

Life

Research Tools

Resources

Starred repositories

domain-adaptation

cosface

mutual-information