Stars
MambaOut: Do We Really Need Mamba for Vision?
GIT: A Generative Image-to-text Transformer for Vision and Language
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[CVPR 2024 Highlight] Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Prompt Learning for Vision-Language Models (IJCV'22, CVPR'22)
This is the official implementation of the paper "ActionCLIP: A New Paradigm for Action Recognition"
A comprehensive collection of awesome research and other items about video domain adaptation
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
An introductory tutorial on recommender systems; read it online at https://datawhalechina.github.io/fun-rec/
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
ICCV 2023 Papers: Discover cutting-edge research from ICCV 2023, the leading computer vision conference. Stay updated on the latest in computer vision and deep learning, with code included.
ChatLaw: A Powerful LLM Tailored for the Chinese Legal Domain | Chinese legal large language model
Reinforcement Learning. Sun Yat-sen University.
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
A faster PyTorch implementation of Faster R-CNN
CPED: A Large-Scale Chinese Personalized and Emotional Dialogue Dataset for Conversational AI | Chinese personalized and emotional dialogue dataset
[CVPR 2022] Official CoTTA Code for our paper Continual Test-Time Domain Adaptation
[CVPR2023] All in One: Exploring Unified Video-Language Pre-training