
Performance of the standalone model #1

Open
vaishnavh opened this issue Dec 10, 2021 · 0 comments

vaishnavh commented Dec 10, 2021

Hey! I was just browsing around for some toy code for distillation and found yours. It's very useful!

If I'm not wrong, I think the standalone small model's loss should have from_logits set to False, because the model's output layer already applies a softmax. When I changed this, the standalone model achieved the same error as the distilled student model.
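To illustrate why the flag matters: if the model's last layer already outputs softmax probabilities but the loss is built with `from_logits=True`, the loss effectively applies softmax a second time, which flattens the distribution and distorts the loss (in Keras the fix would be `keras.losses.SparseCategoricalCrossentropy(from_logits=False)`). A minimal numpy sketch of the double-softmax effect, with made-up example logits:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sparse_ce(probs, labels):
    # sparse categorical cross-entropy given class probabilities
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

logits = np.array([[2.0, 0.5, -1.0], [0.1, 3.0, 0.2]])  # hypothetical pre-softmax outputs
labels = np.array([0, 1])

probs = softmax(logits)  # what a model with a softmax output layer actually emits

correct = sparse_ce(probs, labels)           # loss treating outputs as probabilities
doubled = sparse_ce(softmax(probs), labels)  # from_logits=True re-applies softmax

# the re-applied softmax pulls probabilities toward uniform, inflating the loss
assert doubled > correct
```

The inflated, compressed loss surface is why a model trained with the mismatched flag can look worse than it really is.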
