Moonshine: Distilling with Cheap Convolutions

Crowley, Elliot J.; Gray, Gavin; Storkey, Amos

arXiv:1711.02613v4 (stat)

[Submitted on 7 Nov 2017 (v1), last revised 17 Jan 2019 (this version, v4)]

Abstract: Many engineers wish to deploy modern neural networks in memory-limited settings, but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
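
To give a rough sense of the memory saving that a "cheap convolution" could offer, the sketch below counts weights for a standard 3x3 convolution and for one possible cheaper substitute: a grouped 3x3 convolution followed by a 1x1 pointwise convolution. The abstract does not spell out the substitution, so this particular choice is an assumption made for illustration, as are the helper names `conv_params`, `standard_block`, and `cheap_block` and the channel and group counts. Biases and batch-norm parameters are ignored.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with a k x k kernel (bias ignored)."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each output channel only sees c_in / groups input channels.
    return (c_in // groups) * c_out * k * k


def standard_block(channels, k=3):
    """One full k x k convolution that keeps the channel count fixed."""
    return conv_params(channels, channels, k)


def cheap_block(channels, groups, k=3):
    """Assumed substitute: grouped k x k conv, then a 1x1 conv to mix channels."""
    grouped = conv_params(channels, channels, k, groups)
    pointwise = conv_params(channels, channels, 1)
    return grouped + pointwise


# Illustrative sizes only; not taken from the paper.
channels = 64
full = standard_block(channels)
cheap = cheap_block(channels, groups=8)
print(full, cheap, round(full / cheap, 2))  # 36864 8704 4.24
```

Because the substitute keeps the input and output channel counts unchanged, it can be dropped into each block of the teacher without redesigning the rest of the network, which is the sense in which the student is a "simple transformation" of the teacher.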

Comments:	32nd Conference on Neural Information Processing Systems (NeurIPS 2018)
Subjects:	Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:1711.02613 [stat.ML]
	(or arXiv:1711.02613v4 [stat.ML] for this version)
	https://doi.org/10.48550/arXiv.1711.02613

Submission history

From: Elliot J. Crowley
[v1] Tue, 7 Nov 2017 17:21:06 UTC (316 KB)
[v2] Mon, 21 May 2018 11:43:02 UTC (318 KB)
[v3] Mon, 22 Oct 2018 16:47:40 UTC (125 KB)
[v4] Thu, 17 Jan 2019 12:26:19 UTC (124 KB)
