This is the official code to reproduce the results of our paper published at BMVC 2023.
conda create -n temi python=3.7
conda activate temi
pip install -r requirements.txt
Available model architectures:
dino_resnet50, dino_vits16, dino_vitb16, timm_resnet50, timm_vit_small_patch16_224, timm_vit_base_patch16_224, timm_vit_large_patch16_224, convnext_small, convnext_base, convnext_large, msn_vit_small, msn_vit_base, msn_vit_large, mocov3_vit_small, mocov3_vit_base, clip_ViT-B/16, clip_ViT-L/14, clip_RN50, mae_vit_base, mae_vit_large, mae_vit_huge
Available dataset names (IN1K and its subsets require the ImageNet path to be passed with --datapath; ./data is used by default):
CIFAR10, CIFAR100, STL10, CIFAR20, IN50, IN100, IN200, IN1K
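As a sketch, embeddings for several of the ImageNet subsets can be generated in one loop; this is a dry run (`echo` only prints each command) and /path/to/imagenet is a placeholder for your actual ImageNet root:

```shell
# Dry-run sketch: print (rather than execute) a gen_embeds.py command per
# ImageNet subset. Replace /path/to/imagenet with the real ImageNet root
# and drop the `echo` to actually run them.
for ds in IN50 IN100; do
  echo python gen_embeds.py --arch dino_vitb16 --dataset "$ds" --datapath /path/to/imagenet
done
```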
python gen_embeds.py --arch clip_ViT-B/32 --dataset CIFAR10 --batch_size 256
export CUDA_VISIBLE_DEVICES=0; outdir="./experiments/TEMI-output-test"; clusters=10; dataset="CIFAR10"
python train_main.py --precomputed --arch clip_ViT-B/32 --batch_size=1024 --use_fp16=false --max_momentum_teacher=0.996 \
--lr=1e-4 --warmup_epochs=20 --min_lr=1e-4 --epochs=100 --output_dir $outdir --dataset $dataset --knn=50 \
--out_dim=$clusters --num_heads=16 --loss TEMI --loss-args beta=0.6
python eval_experiment.py --ckpt_folder $outdir
Don't forget to generate the image embeddings first and to set the ImageNet path (--datapath).
dataset="IN1K"; clusters=25000; model=dino_vitb16; head=16; knn=25; beta=0.6
echo "clusters: $clusters dataset: $dataset heads: $head knn-pairs: $knn model: $model"
outdir="./experiments/overclustering/$dataset-$model/"
python train_main.py --disable_ddp --precomputed --embed_norm --arch $model \
--batch_size=128 --use_fp16=false --max_momentum_teacher=0.996 \
--lr=1e-4 --warmup_epochs=20 --min_lr=1e-4 --epochs=100 \
--output_dir $outdir --dataset $dataset \
--knn=$knn --out_dim=$clusters --num_heads=$head \
--loss TEMI --loss-args beta=$beta
python eval_experiment.py --ckpt_folder $outdir
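A bash pitfall worth flagging when adapting these snippets: an assignment such as `clusters=$25000` does not assign the literal number — bash expands the positional parameter `$2` first. A minimal demonstration:

```shell
# With no positional parameters set, $2 expands to the empty string, so
# `buggy` ends up as "5000" instead of "25000".
set --            # clear any positional parameters
buggy=$25000
clusters=25000    # plain assignment, no expansion surprise
echo "buggy=$buggy clusters=$clusters"
```

Similarly, `$"..."` is bash's locale-translation quoting; plain double quotes are what is intended in these snippets.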
@inproceedings{Adaloglou_2023_BMVC,
author = {Nikolas Adaloglou and Felix Michels and Hamza Kalisch and Markus Kollmann},
title = {Exploring the Limits of Deep Image Clustering using Pretrained Models},
booktitle = {34th British Machine Vision Conference 2023, {BMVC} 2023, Aberdeen, UK, November 20-24, 2023},
publisher = {BMVA},
year = {2023},
url = {https://papers.bmvc2023.org/0297.pdf}
}
The codebase was developed on top of FAIR's DINO repository, which is released under the Apache License 2.0. For the clustering evaluations, we used the evaluation code from SSCN.
python linear_evaluation.py --arch=clip_ViT-B/32 --dataset CIFAR10
Note: Multiple architectures can be passed in --archs
python baseline_kmeans.py --dataset CIFAR10 --archs clip_ViT-B/32
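Building on the note above, a dry-run sketch of passing several backbones to the k-means baseline at once (`echo` prints the command instead of running it; the architecture names come from the supported-model list above):

```shell
# Dry run: k-means baseline over several backbones in a single invocation.
archs="clip_ViT-B/32 dino_vitb16 dino_resnet50"
echo python baseline_kmeans.py --dataset CIFAR10 --archs $archs
```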