Learn from Examples

Learn the basics of fastdup through interactive examples. View the notebooks on GitHub or nbviewer. Even better, run them on Google Colab or Kaggle, for free.

	🧹 Clean Image Folder: Learn how to analyze a folder of images for potential issues, clean it, and export a list of problematic files for further action. If you have an unorganized folder of images, this is a good place to start. 📌 Dataset: Food-101.



	🖼 Analyze Image Classification Dataset: Learn how to load a labeled image classification dataset and analyze it for potential issues. If you have a labeled dataset in an ImageNet-style folder structure, have a go! 📌 Dataset: Imagenette.



	🎁 Analyze Object Detection Dataset: Learn how to load bounding box annotations for object detection and analyze them for potential issues. If you have a COCO-style labeled object detection dataset, give this example a try. 📌 Dataset: COCO.
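
As a rough illustration of the "clean a folder and export a list of problematic files" workflow described above, here is a minimal sketch. It assumes that `fastdup.create`, `run`, and `invalid_instances` behave as in the fastdup documentation and that the returned table has a `filename` column. The helper names `find_broken_images` and `format_removal_list` are hypothetical and are not part of fastdup.

```python
def find_broken_images(input_dir, work_dir):
    """Run fastdup on a folder and return the paths of images it could not read."""
    # Imported lazily so this module still loads where fastdup is not installed.
    import fastdup

    fd = fastdup.create(input_dir=input_dir, work_dir=work_dir)
    fd.run()
    broken = fd.invalid_instances()  # table of unreadable or corrupted images
    # Assumption: the table exposes a "filename" column.
    return broken["filename"].tolist()


def format_removal_list(paths):
    """Return the paths de-duplicated and sorted, one per line, for later review."""
    return "".join(path + "\n" for path in sorted(set(paths)))
```

The text returned by `format_removal_list` can be written to a file and reviewed before anything is deleted; the notebook itself may export the list differently.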

Load Data From Sources

The notebooks in this section show how to load data from various sources and analyze them with fastdup.

	🤗 Hugging Face Datasets: Load and analyze datasets from Hugging Face Datasets. Perfect if you already have a dataset hosted on Hugging Face hub. 🔗 Learn More.



	🏆 Kaggle: Load and analyze any computer vision dataset from Kaggle. Get ahead of your competition with data insights. 🔗 Learn More.



	🌎 Roboflow Universe: Load and analyze any of the 200,000 computer vision datasets on Roboflow Universe. 🔗 Learn More.



	📦 Labelbox: Load and analyze vision datasets from Labelbox, a data-centric AI platform for building intelligent applications. 🔗 Learn More.



	🔦 Torchvision Datasets: Load and analyze vision datasets from Torchvision Datasets. 🔗 Learn More.



	💦 TensorFlow Datasets: Load and analyze vision datasets from TensorFlow Datasets. 🔗 Learn More.
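
Whatever the source, one common pattern is to save the images to disk and build a table of file names and labels. This is sketched here as an assumption, not taken from any one notebook. The example derives labels from an ImageNet-style folder layout; the helpers `list_images` and `annotation_rows`, and the `filename`/`label` column names, are illustrative assumptions.

```python
from pathlib import Path, PurePosixPath

# File extensions treated as images in this sketch.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def list_images(root):
    """Return all image paths under root as sorted POSIX-style strings."""
    return sorted(
        p.as_posix()
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def annotation_rows(paths):
    """Build one row per image, using the parent folder name as the label."""
    rows = []
    for path in paths:
        label = PurePosixPath(path).parent.name
        if label:  # skip files that are not inside any class folder
            rows.append({"filename": path, "label": label})
    return rows
```

The rows can be turned into a pandas DataFrame. Assuming `run` accepts an `annotations` table with these columns, as the fastdup documentation describes, that DataFrame can then be passed to fastdup.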

Enrich Data Using Foundation Models

The notebooks in this section show how to enrich your visual dataset using various foundation models supported in fastdup.

	🎞 Zero-Shot Classification: Enrich your visual data with zero-shot image classification and tagging models such as Recognize Anything Model, Tag2Text, and more. 🔗 Learn More.



	🧭 Zero-Shot Detection: Enrich your visual data with zero-shot image detection models such as Grounding DINO and more. 🔗 Learn More.



	🎯 Zero-Shot Segmentation: Enrich your visual data with zero-shot image segmentation models such as Segment Anything Model and more. 🔗 Learn More.

Extract Features From Dataset

The notebooks in this section show how to run fastdup on your own embeddings in combination with frameworks like ONNX and PyTorch.

	🧠 TIMM Embeddings: Compute dataset embeddings using TIMM (PyTorch Image Models) and run fastdup over them to surface dataset issues. Runs on CPU and GPU.



	🦖 DINOv2 Embeddings: Extract feature vectors of your images using the DINOv2 model. Runs on CPU.



	➡️ Use Your Own Feature Vectors: Run fastdup on pre-computed feature vectors and surface data quality issues.
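
To give a feel for what "surfacing issues from pre-computed feature vectors" can mean, the sketch below flags near-duplicate images by brute-force cosine similarity. It is a conceptual illustration only and not fastdup's implementation, which is built to scale far beyond a pairwise loop. The function names and the 0.95 threshold are assumptions chosen for the example.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def near_duplicate_pairs(vectors, threshold=0.95):
    """Return (name_a, name_b, similarity) for every pair at or above threshold.

    vectors maps an image name to its feature vector. Every pair is compared,
    so the cost grows quadratically with the number of images.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(vectors.items()), 2):
        similarity = cosine_similarity(vec_a, vec_b)
        if similarity >= threshold:
            pairs.append((name_a, name_b, similarity))
    # Most similar pairs first.
    return sorted(pairs, key=lambda pair: -pair[2])
```

For example, `near_duplicate_pairs({"a.jpg": [1, 0], "b.jpg": [2, 0], "c.jpg": [0, 1]})` reports only the pair `a.jpg` and `b.jpg`, because the two vectors point in the same direction.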

Exciting New Features

Note: We're happy to announce that new features are out of beta testing and now available to the public, completely free of charge! We invite you to try them out and share your feedback!

	😗 Face Detection in Videos: Use fastdup with a face detection model to detect faces in videos and analyze the cropped faces for potential issues such as duplicates, near-duplicates, outliers, and bright, dark, or blurry faces.



	🤖 Object Detection in Videos: Use fastdup with a pre-trained YOLOv5 model to detect objects in videos and analyze them for potential issues such as duplicates, near-duplicates, outliers, and bright, dark, or blurry objects.



	🔢 Optical Character Recognition: Enrich your dataset by detecting multilingual text with PaddleOCR.



	📑 Image Captioning & Visual Question Answering (VQA): Enrich your dataset by captioning its images using the BLIP, BLIP-2, or ViT-GPT2 models. Alternatively, use VQA models to ask questions about the content of your images with the Vilt-b32 or ViT-Age models.



	🔍 Image Search: Search through large image datasets for duplicates/near-duplicates using a query image. Runs on CPU!
