[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user-provided control signals and generates natural, harmonious results from multiple controls.

Python 91 1 Updated Jul 5, 2024

KwaiVGI / LivePortrait

Bring portraits to life!

Python 9,993 967 Updated Aug 14, 2024

ollama / ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Go 84,899 6,527 Updated Aug 14, 2024

OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal chat model with performance approaching GPT-4o.

Python 5,006 388 Updated Aug 14, 2024

FoundationVision / LlamaGen

Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation

Python 1,133 41 Updated Jul 14, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,651 104 Updated Aug 3, 2024

modelscope / DiffSynth-Studio

Enjoy the magic of Diffusion models!

Python 6,130 550 Updated Aug 14, 2024

krennic999 / STAR

STAR: Scale-wise Text-to-image generation via Auto-Regressive representations

103 1 Updated Jun 18, 2024

LLaVA-VL / LLaVA-NeXT

Python 1,836 111 Updated Aug 14, 2024

om-ai-lab / RS5M

RS5M: a large-scale vision language dataset for remote sensing

Python 187 7 Updated Jul 31, 2024

karpathy / LLM101n

LLM101n: Let's build a Storyteller

26,941 1,461 Updated Aug 1, 2024

lucidrains / titok-pytorch

Implementation of TiTok, proposed by Bytedance in "An Image is Worth 32 Tokens for Reconstruction and Generation"

Python 157 3 Updated Jun 20, 2024

Luo-Z13 / SkySenseGPT

A Fine-Grained Instruction Tuning Dataset and Model for Remote Sensing Vision-Language Understanding

Python 45 2 Updated Aug 3, 2024

fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation

Python 8,062 1,067 Updated Jul 4, 2024

zytx121 / Awesome-VLGFM

A Survey on Vision-Language Geo-Foundation Models (VLGFMs)

92 5 Updated Jul 19, 2024

OpenGVLab / OmniCorpus

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text

210 4 Updated Jun 16, 2024

bghira / SimpleTuner

A general fine-tuning kit geared toward diffusion models.

Python 1,108 84 Updated Aug 14, 2024

UCSC-VLAA / Recap-DataComp-1B

This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3?"

109 1 Updated Jun 13, 2024

Jack-bo1220 / Awesome-Remote-Sensing-Foundation-Models

733 74 Updated Aug 8, 2024

minar09 / awesome-virtual-try-on

A curated list of awesome research papers, projects, code, dataset, workshops etc. related to virtual try-on.

2,042 261 Updated Jul 27, 2024

ali-vilab / UniAnimate

Code for Paper "UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation".

Python 638 44 Updated Jul 23, 2024

allenai / objaverse-xl

🪐 Objaverse-XL is a Universe of 10M+ 3D Objects. Contains API Scripts for Downloading and Processing!

Python 681 40 Updated Jul 1, 2024

MingTao(陶明) tobran

Starred repositories

text-to-image