Stars
PyTorch implementation of SimCLR: A Simple Framework for Contrastive Learning of Visual Representations by T. Chen et al.
Official implementation for the paper "Deep ViT Features as Dense Visual Descriptors".
Network Dissection (https://netdissect.csail.mit.edu), a framework for quantifying the interpretability of deep CNNs.
[CVPR 2024] Official Repository for MCPNet: An Interpretable Classifier via Multi-Level Concept Prototypes
PyTorch implementation of the paper "Axiomatic Attribution for Deep Networks".
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision"
PyTorch code and models for the DINOv2 self-supervised learning method.
A comprehensive paper list on Vision Transformers and attention, including papers, code, and related websites
Scripts for fine-tuning Meta Llama 3 with composable FSDP & PEFT methods, covering single- and multi-node GPU setups. Supports default & custom datasets for applications such as summarization and Q&A.
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications made by Transformer-based networks.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
This repository contains demos I made with the Transformers library by HuggingFace.
Notebooks using the Hugging Face libraries 🤗
Datasets, Transforms and Models specific to Computer Vision
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
(ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation
Light version of Network Dissection for Quantifying Interpretability of Networks
[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Reference implementation for DPO (Direct Preference Optimization)
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20h on one machine (a minimal usage sketch follows this list).
Model explainability that works seamlessly with 🤗 transformers. Explain your transformers model in just 2 lines of code (see the sketch after this list).
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
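
For the img2dataset entry above, a minimal sketch of driving a download from Python, assuming the package's documented `download` entry point; the input file name `urls.txt` and the option values are illustrative, not prescribed by the repo:

```python
# Minimal img2dataset sketch: turn a text file of image URLs into a
# resized, packaged dataset. File name and option values are illustrative.
from img2dataset import download

download(
    url_list="urls.txt",          # one image URL per line (hypothetical local file)
    output_folder="my_dataset",   # where output shards are written
    processes_count=8,            # parallel worker processes
    thread_count=32,              # download threads per process
    image_size=256,               # resize images to 256x256
    output_format="webdataset",   # package shards as .tar WebDataset files
)
```

Scaling the process and thread counts to the machine is what makes the quoted 100M-URLs-in-20h throughput plausible on a single host.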
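And for the "2 lines of code" claim in the explainability entry, a sketch assuming the library in question is transformers-interpret with its `SequenceClassificationExplainer`; the model checkpoint here is an illustrative choice:

```python
# Sketch of the "2 lines" explainability workflow with transformers-interpret.
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

# Illustrative sentiment model; any sequence-classification checkpoint works.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The two lines: build an explainer, then call it on text to get
# per-token attribution scores for the predicted class.
explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = explainer("The movie was a wonderful surprise.")

print(word_attributions)  # list of (token, attribution score) pairs
```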