Lists (26)
VLMs
VFMs
VAEs&NFs
UVideoDA
TimeSeriesAnalysis
TestTimeAdaptation
SelfSupervisedLearning
ResearchAssist
RemoteSensing
OpenVocabLearning
OpenSet
OpenBlackBox
OODDetection
MultimodalLearning
MLSys
LLMs+Tools
GANs
EfficientAttention
DRL
DomainAdaptation
DiffusionModels
CVinW
ContrastiveLearning
ContinualLearning
AudioVisualLearning
ActiveLearning
Stars
A curated list of Decision Transformer resources (continually updated)
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
Open-MAGVIT2: Democratizing Autoregressive Visual Generation
Code for the paper "ViperGPT: Visual Inference via Python Execution for Reasoning"
Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Caption-Anything is a versatile tool combining image segmentation, visual captioning, and ChatGPT, generating tailored captions with diverse controls for user preferences. https://huggingface.co/sp…
Recent LLM-based CV and related works. Welcome to comment/contribute!
MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone
(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
[ICLR 2024] This is the official implementation of the paper "The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World"
🔥🔥 LLaVA++: Extending LLaVA with Phi-3 and LLaMA-3 (LLaVA LLaMA-3, LLaVA Phi-3)
[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. A commercially usable open-source multimodal dialogue model approaching GPT-4o performance.
🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).
Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Emu Series: Generative Multimodal Models from BAAI
✨✨Latest Advances on Multimodal Large Language Models
An open source implementation of CLIP.
Versatile Diffusion: Text, Images and Variations All in One Diffusion Model, arXiv 2022 / ICCV 2023
Code and models for the paper "One Transformer Fits All Distributions in Multi-Modal Diffusion"
Official Implementation for "Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models" (SIGGRAPH 2023)
Implementation of DALL-E 2, OpenAI's updated text-to-image synthesis neural network, in PyTorch