Open-MAGVIT2: Democratizing Autoregressive Visual Generation

Python 216 3 Updated Jun 18, 2024

Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"

Jupyter Notebook 1,627 115 Updated Jan 29, 2024

Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning

Python 59 5 Updated Jun 13, 2024

Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…

Python 1,625 100 Updated Aug 29, 2023

Recent LLM-based CV and related works. Welcome to comment/contribute!

771 32 Updated Jun 5, 2024

An ML Systems Onboarding list

113 4 Updated May 22, 2024

Your image is almost there!

Python 6,516 397 Updated Jun 8, 2024

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Python 7,426 514 Updated Jun 19, 2024

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"

Python 797 77 Updated Jul 14, 2022

Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 33,281 3,459 Updated Jun 11, 2024

[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"

Python 402 13 Updated Jun 18, 2024

🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)

Python 718 48 Updated May 3, 2024

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Python 2,821 228 Updated Jun 19, 2024
Python 8,163 475 Updated Jan 27, 2024

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance.

Python 3,632 280 Updated Jun 20, 2024

🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D and audio).

HTML 174 13 Updated Jun 15, 2024

Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model

Python 199 6 Updated Mar 5, 2024

Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"

Python 3,057 275 Updated May 4, 2024

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Python 3,011 306 Updated Jan 22, 2024

Emu Series: Generative Multimodal Models from BAAI

Python 1,550 79 Updated Mar 8, 2024

✨✨Latest Advances on Multimodal Large Language Models

10,118 669 Updated Jun 20, 2024

An open source implementation of CLIP.

Jupyter Notebook 8,978 894 Updated Jun 18, 2024

Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023

Python 1,300 80 Updated Aug 10, 2023

Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"

Python 1,321 85 Updated May 31, 2023
Jupyter Notebook 1,647 162 Updated Apr 18, 2024

Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)

Jupyter Notebook 650 55 Updated Jan 26, 2024

Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in Pytorch

Python 10,932 1,058 Updated May 11, 2024

Official PyTorch implementation of PDAE (NeurIPS 2022)

Python 265 18 Updated Mar 5, 2024

Open-Sora: Democratizing Efficient Video Production for All

Python 18,796 1,803 Updated Jun 20, 2024

Repository for the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"

Python 429 84 Updated Apr 28, 2023