Starred repositories of iamxiaoyubei
A prize-winning solution for the Deepfake Detection Challenge (DFDC)
DeepFaceLab is the leading software for creating deepfakes.
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
A comprehensive benchmark for deepfake detection
[CVPR 2022] Official code for "RegionCLIP: Region-based Language-Image Pretraining"
Official implementation of "Interpreting CLIP's Image Representation via Text-Based Decomposition"
This is the official repository for M2UGen
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"
AppAgent: Multimodal Agents as Smartphone Users, an LLM-based multimodal agent framework designed to operate smartphone apps.
PyTorch Implementation of "V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs"
Recent LLM-based CV and related works. Welcome to comment/contribute!
Implementation for "DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations" (NeurIPS 2022)
A Chinese version of CLIP that supports Chinese cross-modal retrieval and representation generation.
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
ImaginaryNet: Learning Object Detectors without Real Images and Annotations
Image to prompt with BLIP and CLIP
SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
Stable Diffusion web UI
A playbook for systematically maximizing the performance of deep learning models.
GLIDE: a diffusion-based text-conditional image synthesis model
Multiple Stable Diffusion projects.
Is synthetic data from generative models ready for image recognition?