
Performance of the standalone model #1

Open
vaishnavh opened this issue Dec 10, 2021 · 0 comments

vaishnavh commented Dec 10, 2021

Hey! I was just browsing around for some toy code for distillation and found yours. It's very useful!

If I'm not wrong, I think the standalone small model's loss should have from_logits set to False, because the model's output layer already applies a softmax. When I changed this, the standalone model achieved the same error as the distilled student model.
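To illustrate why the flag matters: if the model's last layer already outputs softmax probabilities but the loss is built with `from_logits=True`, the loss effectively applies softmax a second time, which flattens the distribution and distorts the loss (in Keras the fix would be `keras.losses.SparseCategoricalCrossentropy(from_logits=False)`). A minimal numpy sketch of the double-softmax effect, with made-up example logits:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_ce(probs, labels):
    # sparse categorical cross-entropy given class probabilities
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])  # hypothetical pre-softmax outputs
labels = np.array([0, 1])

probs = softmax(logits)  # what a model with a softmax output layer actually emits

correct = sparse_ce(probs, labels)           # loss treating outputs as probabilities
doubled = sparse_ce(softmax(probs), labels)  # from_logits=True re-applies softmax

# the re-applied softmax pulls probabilities toward uniform, inflating the loss
assert doubled > correct
```

The inflated, compressed loss surface is why a model trained with the mismatched flag can look worse than it really is.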
