Conflict With "On the Efficacy of Knowledge Distillation" Results #150
AhmedHussKhalifa
started this conversation in
General
-
Hi @AhmedHussKhalifa, thank you for your interest in torchdistill and for the question! From your description, I think the choice of temperature and alpha matters in their setting and produced the different trend. Another possible factor is the number of GPUs (i.e., the effective batch size, and the linear scaling rule for the learning rate if training is distributed).
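To make the role of alpha and temperature concrete, here is a minimal pure-Python sketch of the standard (Hinton-style) knowledge distillation loss. The function names and the small logit vectors are illustrative, not taken from torchdistill or either paper; they just show how alpha weights the hard-label term against the softened teacher term, and how the temperature T flattens both distributions.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T softens the distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, true_label, alpha=0.5, T=1.0):
    # Hinton-style KD: weighted sum of cross-entropy with the hard label
    # and KL divergence between softened teacher/student distributions.
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])
    q_teacher = softmax(teacher_logits, T)
    q_student = softmax(student_logits, T)
    kl = sum(qt * math.log(qt / qs) for qt, qs in zip(q_teacher, q_student))
    # The T**2 factor keeps the soft term's gradient magnitude
    # comparable across temperatures, as in the original KD paper.
    return (1 - alpha) * ce + alpha * (T ** 2) * kl

# With alpha=0.5 and T=1 (the torchdistill setting mentioned above),
# half the loss comes from the hard label and half from the teacher.
loss = kd_loss([2.0, 1.0, 0.0], [1.5, 1.2, 0.1], true_label=0,
               alpha=0.5, T=1.0)
```

Raising T (e.g., to 4) spreads probability mass onto non-target classes, so the student gets a stronger "dark knowledge" signal; combined with a different alpha, this can plausibly shift the final accuracy by a noticeable margin.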
-
Hey,
I want to thank you for this great work.
I went through your ImageNet model trained by KD. The resnet-18 trained with resnet-34 as the teacher reaches 71.34% accuracy, which is really impressive. I found the same experiment in the "On the Efficacy of Knowledge Distillation" paper with an accuracy of 69.21%, but with different hyperparameters, as mentioned below. To the best of my knowledge, you used a different alpha (0.5) and temperature (1).
Do you think this is the only reason for such a large difference?