Stars
Stable-Hair: Real-World Hair Transfer via Diffusion Model
Agent driven automation starting with the web. Discord: https://discord.gg/wgNfmFuqJF
Code repo for "Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding"
OpenDiT: An Easy, Fast and Memory-Efficient System for DiT Training and Inference
VGGSfM: Visual Geometry Grounded Deep Structure From Motion
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
MMSci: A Multimodal Multi-Discipline Dataset for PhD-Level Scientific Comprehension
[ICCV 2023] A latent space for stochastic diffusion models
CVPR-24 | Official codebase for ZONE: Zero-shot InstructiON-guided Local Editing
👔IMAGDressing👔: Interactive Modular Apparel Generation for Virtual Dressing
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various r…
An unofficial implementation of both ViT-VQGAN and RQ-VAE in PyTorch
Ego4D Goal-Step: Toward Hierarchical Understanding of Procedural Activities (NeurIPS 2023)
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. An AI Foley master that adds vivid, synchronized sound effects to your silent videos 😝
Modern Stable Diffusion models family - Fluently
Long Context Transfer from Language to Vision
This repo contains the code for our paper An Image is Worth 32 Tokens for Reconstruction and Generation
Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning (https://arxiv.org/abs/2406.17770).
Enjoy the magic of Diffusion models!
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
The project page for "LOGIC-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning"
Official implementations for paper: Zero-shot Image Editing with Reference Imitation