Stars
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
Open-Sora: Democratizing Efficient Video Production for All
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[CVPR 2024] Real-Time Open-Vocabulary Object Detection
Examples and tutorials on using SOTA computer vision models and techniques. Learn everything from old-school ResNet, through YOLO and object-detection transformers like DETR, to the latest models…
PyTorch implementation for SDEdit: Image Synthesis and Editing with Stochastic Differential Equations
Official PyTorch Implementation of "GAN-Supervised Dense Visual Alignment" (CVPR 2022 Oral, Best Paper Finalist)
Rembg is a tool to remove image backgrounds
EfficientViT is a new family of vision models for efficient high-resolution vision.
Foundational Models for State-of-the-Art Speech and Text Translation
State-of-the-art 2D and 3D Face Analysis Project
Official PyTorch implementation of StyleGAN3
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Taming Transformers for High-Resolution Image Synthesis
CoTracker is a model for tracking any point (pixel) on a video.
Generative Models by Stability AI
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation
Official implementation of the paper "AnyDoor: Zero-shot Object-level Image Customization"
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
A collection of papers on the topic of "Computer Vision in the Wild (CVinW)"
Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
A state-of-the-art open visual language model | Multimodal pre-trained model
An Open-source Toolkit for LLM Development
[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters