Stars
[ECCV 2024] Code for VFusion3D: Learning Scalable 3D Generative Models from Video Diffusion Models
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing
[arXiv 2024] Follow-Your-Emoji: This repo is the official implementation of "Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation"
Implementation of UltraPixel: Advancing Ultra-High-Resolution Image Synthesis to New Peaks
Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various r…
SEED-Story: Multimodal Long Story Generation with Large Language Model
Official PyTorch Implementation of MambaVision: A Hybrid Mamba-Transformer Vision Backbone
Code for FreeTraj, a tuning-free method for trajectory-controllable video generation
Official implementation of Image Conductor: Precision Control for Interactive Video Synthesis
Understand Human Behavior to Align True Needs
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Speeds up long-context LLM inference by computing attention approximately with dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user-provided control signals and generates natural, harmonious results from multiple controls.
Code release for "Segment Anything without Supervision"
AuraSR: GAN-based Super-Resolution for real-world
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Enjoy the magic of Diffusion models!
[CVPR 2024 Highlight] VGGSfM Visual Geometry Grounded Deep Structure From Motion
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Long Context Transfer from Language to Vision
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution