
Understanding the Role of the Projector in Knowledge Distillation

This is the official implementation of the AAAI 2024 paper:

"Understanding the Role of the Projector in Knowledge Distillation".
Roy Miles, Krystian Mikolajczyk. AAAI 2024.

[Paper on arXiv]

Structure

The two main sets of experiments, corresponding to the DeiT and ResNet results, can be found in the deit/ and imagenet/ folders respectively. The DeiT code is based on that provided by Co-Advise, while the ResNet code uses the torchdistill library.
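For context, the sketch below illustrates the general idea behind projector-based feature distillation that the code implements: the student's backbone features are mapped through a simple (here linear) projector into the teacher's feature space and regressed onto the teacher's features. This is a minimal, hypothetical sketch for illustration only; the exact projector, normalisation and loss used in the paper may differ, so refer to the code in deit/ and imagenet/ for the actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectorDistillationLoss(nn.Module):
    # Illustrative sketch only: a linear projector maps student features to the
    # teacher's dimensionality, both are batch-normalised, and the projected
    # student features are regressed onto the (detached) teacher features.
    def __init__(self, student_dim, teacher_dim):
        super().__init__()
        self.projector = nn.Linear(student_dim, teacher_dim, bias=False)
        self.bn_s = nn.BatchNorm1d(teacher_dim)
        self.bn_t = nn.BatchNorm1d(teacher_dim)

    def forward(self, student_feats, teacher_feats):
        # student_feats: (batch, student_dim), teacher_feats: (batch, teacher_dim)
        z_s = self.bn_s(self.projector(student_feats))
        z_t = self.bn_t(teacher_feats.detach())  # no gradients flow into the teacher
        return F.smooth_l1_loss(z_s, z_t)

In practice a distillation term of this kind would be added to the usual cross-entropy loss on the student's logits.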

Pretrained Models

We provide the pre-distilled model weights and logs for the DeiT experiments.

model   acc@1   acc@5   #params   url
tiny    77.2    93.7    5M        model
small   82.1    96.0    22M       model

Testing and Training

Before training, make sure to change the deit/config.py entries for your data path, output directory, etc. The RegNetY-160 teacher weights are expected to be found at ckpts/regnety_160-a5fe301d.pth; otherwise, the most recent pre-trained Hugging Face weights will be downloaded. Note that the results reported in our paper use the same teacher weights as DeiT. We have tested training with 1 and 2 GPUs using effective batch sizes between 256 and 1024. Using larger batch sizes, or more GPUs, may require slight modifications to the distributed training setup and/or the learning rates.
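For illustration, the relevant entries in deit/config.py might look something like the snippet below. The variable names here are hypothetical placeholders rather than the actual names used in the repository, so check deit/config.py for the real ones.

# Hypothetical sketch of the kind of entries to edit in deit/config.py;
# the actual variable names in the repository may differ.
DATA_PATH = "/path/to/imagenet"                   # root of the ImageNet train/val folders
OUTPUT_DIR = "./output"                           # where checkpoints and logs are written
TEACHER_CKPT = "ckpts/regnety_160-a5fe301d.pth"   # RegNetY-160 teacher weights (see above)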

Training is then simply run as follows:

python main.py --model tiny --train_student

Omitting the --train_student argument will instead evaluate the model using the checkpoint weights in ckpts/ckpt_epoch_299.pth (see deit/main.py for more details):

python main.py --model tiny

ImageNet

For training a ResNet18 student using a ResNet34 teacher, we use the torchdistill library.

cd imagenet
python image_classification.py --config configs/ilsvrc2012/ours/res3418.yaml --log log/output.txt

Pretrained Models

We provide the pre-distilled model weights and logs. This reproduced run achieves a higher accuracy than the figure reported in the paper.

model      acc@1   url
resnet18   71.87   model
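As a rough illustration, the released student weights could be loaded for evaluation along the lines of the snippet below. The checkpoint filename and key layout are assumptions here, so adjust them to match the downloaded file.

# Hypothetical sketch: loading the distilled ResNet-18 student weights with torchvision.
# The filename and the checkpoint key layout are assumptions.
import torch
from torchvision.models import resnet18

model = resnet18(num_classes=1000)
ckpt = torch.load("resnet18_distilled.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
model.load_state_dict(state_dict)
model.eval()  # ready for evaluation on the ImageNet validation set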

Citation

@InProceedings{miles2023understanding_AAAI,
      title      = {Understanding the Role of the Projector in Knowledge Distillation}, 
      author     = {Roy Miles and Krystian Mikolajczyk},
      booktitle  = {Proceedings of the 38th AAAI Conference on Artificial Intelligence (AAAI-24)},
      year       = {2024},
      month      = {February}
}

If you have any questions, feel free to email me!
