

Vision Transformer (ViT)

Notebooks

This directory contains several notebooks that illustrate how to use Google's ViT, both for fine-tuning on custom data and for inference. It currently includes the following notebooks:

  • performing inference with ViT to illustrate image classification
  • fine-tuning ViT on CIFAR-10 using HuggingFace's Trainer
  • fine-tuning ViT on CIFAR-10 using PyTorch Lightning

There's also the official HuggingFace image classification notebook, which can be found here.

Note that these notebooks work for any vision model in the library (i.e. any model supported by the AutoModelForImageClassification API). You can simply replace the checkpoint name (like google/vit-base-patch16-224) with another one (like facebook/convnext-tiny-224).
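As a rough illustration of the checkpoint swap, here is a minimal inference sketch. It assumes a recent `transformers` release that provides `AutoImageProcessor`, plus `torch` and `Pillow`. The helper name `classify_image` and the `CHECKPOINTS` tuple are hypothetical and are not taken from the notebooks. The heavy imports happen inside the function, so the file can be loaded without those packages installed.

```python
# Hypothetical helper (not from the notebooks): classify one image with any
# checkpoint supported by AutoModelForImageClassification.
CHECKPOINTS = (
    "google/vit-base-patch16-224",
    "facebook/convnext-tiny-224",
)


def classify_image(image_path, checkpoint=CHECKPOINTS[0], top_k=3):
    """Return the top_k (label, probability) pairs for the image at image_path."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Lazy imports: assumes torch, Pillow and a recent transformers are installed.
    import torch
    from PIL import Image
    from transformers import AutoImageProcessor, AutoModelForImageClassification

    # Only the checkpoint string changes between ViT, ConvNeXt, etc.
    processor = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModelForImageClassification.from_pretrained(checkpoint)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")  # resize + normalize
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)

    probs = logits.softmax(dim=-1)[0]
    scores, indices = probs.topk(top_k)
    return [
        (model.config.id2label[idx.item()], score.item())
        for score, idx in zip(scores, indices)
    ]
```

For example, `classify_image("cat.jpg", checkpoint="facebook/convnext-tiny-224")` (with `cat.jpg` standing in for any local image) runs the same code path with a ConvNeXt model instead of ViT.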

Just pick your favorite vision model from the hub and start fine-tuning it :)

Blog posts

Below, I list some great blog posts explaining how to use ViT:

PyTorch:

TensorFlow/Keras: