Moonshine: Distilling with cheap convolutions

EJ Crowley, G Gray, AJ Storkey - Advances in Neural Information Processing Systems, 2018 - proceedings.neurips.cc
Abstract
Many engineers wish to deploy modern neural networks in memory-limited settings; but the development of flexible methods for reducing memory use is in its infancy, and there is little knowledge of the resulting cost-benefit. We propose structural model distillation for memory reduction using a strategy that produces a student architecture that is a simple transformation of the teacher architecture: no redesign is needed, and the same hyperparameters can be used. Using attention transfer, we provide Pareto curves/tables for distillation of residual networks with four benchmark datasets, indicating the memory versus accuracy payoff. We show that substantial memory savings are possible with very little loss of accuracy, and confirm that distillation provides student network performance that is better than training that student architecture directly on data.
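The abstract's attention-transfer distillation can be sketched roughly as follows. This is a minimal PyTorch sketch assuming the standard attention-transfer formulation (channel-wise squared activations collapsed into a normalized spatial map, matched between paired teacher and student layers); the function names, the beta weight, and the omission of a logit-matching term are illustrative assumptions, not details taken from this page.

```python
import torch
import torch.nn.functional as F

def attention_map(features: torch.Tensor) -> torch.Tensor:
    """Collapse (N, C, H, W) activations into a unit-norm spatial attention vector."""
    amap = features.pow(2).mean(dim=1)                 # (N, H, W): channel-wise energy
    return F.normalize(amap.flatten(start_dim=1), dim=1)

def at_distillation_loss(student_logits, labels, student_feats, teacher_feats, beta=1e3):
    """Cross-entropy on the labels plus attention-transfer terms over paired layers.

    student_feats and teacher_feats are lists of activations from corresponding
    blocks; because the student is a simple transformation of the teacher
    architecture, the spatial sizes match and the maps compare directly.
    """
    loss = F.cross_entropy(student_logits, labels)
    for fs, ft in zip(student_feats, teacher_feats):
        loss = loss + beta * (attention_map(fs) - attention_map(ft)).pow(2).mean()
    return loss
```

In this sketch the student keeps the teacher's block structure (so no hyperparameter retuning is needed), and only the convolutions inside each block are replaced with cheaper ones; the loss above is what would couple the two networks during training.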