DINOv2 is a new Vision Transformer (ViT) by Meta AI trained in a self-supervised fashion on a highly curated dataset of 142 million images.
This folder contains demo notebooks that show how to fine-tune the model on custom data for image classification and semantic segmentation.
Interested in doing depth estimation with DINOv2? See this notebook, which adds a DPT head to a DINOv2 backbone.
The DINOv2 docs can be found here.
See also this thread for more information on using DINOv2: facebookresearch/dinov2#153