Lists (25)
ActiveLearning
AudioVisualLearning
ContinualLearning
ContrastiveLearning
CVinW
DiffusionModels
DomainAdaptation
EfficientAttention
GANs
LLMs+Tools
MLSys
MultimodalLearning
OODDetection
OpenBlackBox
OpenSet
OpenVocabLearning
RemoteSensing
ResearchAssist
SelfSupervisedLearning
TestTimeAdaptation
TimeSeriesAnalysis
UVideoDA
VAEs&NFs
VFMs
VLMs
Stars
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
Recent LLM-based CV and related works. Welcome to comment/contribute!
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[CVPR 2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Emu Series: Generative Multimodal Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
An open source implementation of CLIP.
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch
Official PyTorch implementation of PDAE (NeurIPS 2022)
Open-Sora: Democratizing Efficient Video Production for All
Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"