[ECCV 2024] AnyControl, a multi-control image synthesis model that supports any combination of user provided control signals. An image generation model that accepts freely chosen user control signals and produces natural, harmonious results from multiple controls!
Code release for "Segment Anything without Supervision"
AuraSR: GAN-based Super-Resolution for real-world
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Enjoy the magic of Diffusion models!
[CVPR 2024 Highlight] VGGSfM Visual Geometry Grounded Deep Structure From Motion
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
Long Context Transfer from Language to Vision
[ICML 2024] EvTexture: Event-driven Texture Enhancement for Video Super-Resolution
Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
Official Implementation of CVPR24 highlight paper: Matching Anything by Segmenting Anything
A diffusers pipeline for zero shot stylised portrait creation
[ECCV2024] This is an official inference code of the paper "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Mu…
From anything to mesh like human artists. Official impl. of "MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers"
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
Code for "Real3D: Scaling Up Large Reconstruction Models with Real-World Images"
Official implementation of Physics3D: Learning Physical Properties of 3D Gaussians via Video Diffusion
Official code for "Neural Gaffer: Relighting Any Object via Diffusion"
This repository contains the code for SF-V: Single Forward Video Generation Model.
Official code for the CVPR 2024 Paper: Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Official codebase for Margin-aware Preference Optimization for Aligning Diffusion Models without Reference (MaPO).
Official implementations for paper: Zero-shot Image Editing with Reference Imitation
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
VideoTetris: Towards Compositional Text-To-Video Generation