Stars
Official PyTorch Implementation of Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
An open-source impl. of Large Reconstruction Models
Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation
Official implementation of the work "Coherent and Multi-modality Image Inpainting via Latent Space Optimization"
Official implementation of "Splatter Image: Ultra-Fast Single-View 3D Reconstruction" (CVPR 2024)
✨✨Latest Advances on Multimodal Large Language Models
This repo contains the official implementation of the ICLR 2024 paper "Is ImageNet worth 1 video? Learning strong image encoders from 1 long unlabelled video"
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance
[NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning
Unsupervised Semantic Segmentation by Distilling Feature Correspondences
[ICLR2024 Spotlight] Code Release of CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
(TPAMI 2024) A Survey on Open Vocabulary Learning
Collection of AWESOME vision-language models for vision tasks
Official Implementation of "CAT-Seg🐱: Cost Aggregation for Open-Vocabulary Semantic Segmentation"
PyTorch code and models for the DINOv2 self-supervised learning method.
This repository contains demos I made with the Transformers library by HuggingFace.
List of Computer Science courses with video lectures.
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
[CVPR 2023] Official repository for paper "Stare at What You See: Masked Image Modeling without Reconstruction"
Code for **Spatiotemporal Self-supervised Learning for Point Clouds in the Wild** (STSSL) CVPR2023
Official codebase for I-JEPA, the Image-based Joint-Embedding Predictive Architecture. First outlined in the CVPR paper, "Self-supervised learning from images with a joint-embedding predictive arch…
[ICLR'23 Spotlight🔥] The first successful BERT/MAE-style pretraining on any convolutional network; PyTorch impl. of "Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling"
Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639)
Scaling and Benchmarking Self-Supervised Visual Representation Learning
[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning
CVPR 2024/2023/2022/2021/2020/2019/2018/2017 collection of papers, code, paper walkthroughs, and livestreams, compiled by the 极市 team