Stars
Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything - Automatically Detect, Segment and Generate Anything
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
Collection of AWESOME vision-language models for vision tasks
A collection of resources and papers on Diffusion Models
ICLR2024 Spotlight: curation/training code, metadata, distribution and pre-trained models for MetaCLIP; CVPR 2024: MoDE: CLIP Data Experts via Clustering
EVA Series: Visual Representation Fantasies from BAAI
MiniCPM-2B: An end-side LLM outperforming Llama2-13B.
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Open-source and strong foundation image recognition models.
[ECCV2024] API code for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
🚀 AI voice cloning: clone your voice within 5 seconds and generate arbitrary speech content in real-time
Clone a voice in 5 seconds to generate arbitrary speech in real-time
Next generation face swapper and enhancer
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
Character Animation (AnimateAnyone, Face Reenactment)
Emote Portrait Alive: Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions
[CVPR2024] DisCo: Referring Human Dance Generation in Real World
Fine-Grained Open Domain Image Animation with Motion Guidance
VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models
[ECCV 2024] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[ICLR 22] Latent Image Animator: Learning to Animate Images via Latent Space Navigation
Animating Arbitrary Objects via Deep Motion Transfer
Code and documents of LongLoRA and LongAlpaca (ICLR 2024 Oral)
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming