Lists (26)
VLMs
VFMs
VAEs&NFs
UVideoDA
TimeSeriesAnalysis
TestTimeAdaptation
SelfSupervisedLearning
ResearchAssist
RemoteSensing
OpenVocabLearning
OpenSet
OpenBlackBox
OODDetection
MultimodalLearning
MLSys
LLMs+Tools
GANs
EfficientAttention
DRL
DomainAdaptation
DiffusionModels
CVinW
ContrastiveLearning
ContinualLearning
AudioVisualLearning
ActiveLearning
Stars
A curated list of Decision Transformer resources (continually updated)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
Recent LLM-based CV and related works. Welcome to comment/contribute!
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Emu Series: Generative Multimodal Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
An open source implementation of CLIP.
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch