Data-parallel distributed training of deep neural networks (DNNs) has gained widespread adoption, but
can still experience communication bottlenecks due to gradient transmission. To address this issue, entire
families of lossy gradient compression mechanisms have been developed, including quantization, sparsification,
and low-rank approximation, some of which are seeing significant practical adoption. Despite this progress,
almost all known compression schemes apply compression uniformly across DNN layers, although layers are
heterogeneous in terms of parameter count and their impact on model accuracy. In this work, we provide
a general framework for adapting the degree of compression across the model’s layers dynamically during
training, significantly improving the overall compression without sacrificing accuracy. Our framework, called
L-GreCo, is based on an efficient adaptive algorithm, which automatically picks the optimal compression
parameters for model layers, guaranteeing the best compression ratio while respecting a theoretically justified
error constraint. Our extensive experimental study over image classification and language modeling tasks shows
that L-GreCo is effective across all three compression families, and achieves up to 2.5× training speedup and
up to 5× compression improvement over efficient implementations of standard approaches while recovering
full accuracy. Moreover, we show that L-GreCo is complementary to existing adaptive algorithms, improving
their compression ratio by 50% and practical throughput by 66%.
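To make the layer-wise selection concrete, below is a minimal sketch of how per-layer compression parameters could be chosen under a global error budget using a knapsack-style dynamic program. The function name (`select_layer_parameters`), the candidate lists, and the error model are illustrative assumptions, not the actual implementation from the paper or this repository.

```python
# A minimal, hypothetical sketch: choose one compression candidate per layer so
# that total compressed size is minimized while the summed error estimate stays
# within a global budget. Names and the error model are illustrative only.
from typing import Dict, List, Tuple


def select_layer_parameters(
    candidates: Dict[str, List[Tuple[float, float]]],
    error_budget: float,
    resolution: int = 200,
) -> Dict[str, int]:
    """candidates[layer] is a list of (compressed_size, estimated_error) pairs,
    one per compression level (e.g. per rank, sparsity, or bit-width).
    Returns, for each layer, the index of the chosen candidate."""
    layers = list(candidates)
    step = error_budget / resolution
    INF = float("inf")

    # dp[b]: minimal total size achievable with total (discretized) error b * step.
    dp = [0.0] + [INF] * resolution
    back: List[List[Tuple[int, int]]] = []  # back[i][b] = (candidate idx, prev bucket)

    for layer in layers:
        new_dp = [INF] * (resolution + 1)
        new_back = [(-1, -1)] * (resolution + 1)
        for b in range(resolution + 1):
            if dp[b] == INF:
                continue
            for idx, (size, err) in enumerate(candidates[layer]):
                nb = b + int(round(err / step))
                if nb <= resolution and dp[b] + size < new_dp[nb]:
                    new_dp[nb] = dp[b] + size
                    new_back[nb] = (idx, b)
        dp = new_dp
        back.append(new_back)

    feasible = [b for b in range(resolution + 1) if dp[b] < INF]
    if not feasible:
        raise ValueError("error budget is too tight for the given candidates")
    b = min(feasible, key=lambda x: dp[x])

    # Walk back through the layers (last to first) to recover the chosen levels.
    selection: Dict[str, int] = {}
    for i in range(len(layers) - 1, -1, -1):
        idx, b = back[i][b]
        selection[layers[i]] = idx
    return selection
```

The discretization resolution trades the precision of the error constraint against running time; relaxing the budget shifts layers toward more aggressive compression.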
For the ResNet50 and Transformer-XL experiments we used https://github.com/NVIDIA/DeepLearningExamples, for the Transformer-LM experiments we used https://github.com/facebookresearch/fairseq, and for the ResNet18 experiment we used the ResNet18 model provided at https://github.com/epfml/powersgd.
All hooks can be found in the 'hooks' folder, and each experiment includes an 'lgreco.sh' script for running that task with L-GreCo. ResNet18 is a natural entry point.
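As an illustration of what such gradient-compression hooks typically look like (not the actual hooks shipped in the 'hooks' folder, whose names and signatures may differ), here is how a compression communication hook is registered with PyTorch DDP, using the built-in PowerSGD hook as a stand-in:

```python
# Illustration only: registering a gradient-compression DDP communication hook.
# The built-in PowerSGD hook stands in for an L-GreCo hook from the 'hooks' folder.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.distributed.algorithms.ddp_comm_hooks import powerSGD_hook as powerSGD

# Launched with torchrun, e.g.: torchrun --nproc_per_node=2 this_script.py
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Toy model; the experiments wrap ResNet / Transformer models instead.
model = DDP(torch.nn.Linear(128, 128).cuda(), device_ids=[local_rank])

# Low-rank gradient compression; an adaptive scheme would tune per-layer parameters
# (e.g. the rank) during training rather than fixing one value for the whole model.
state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=4)
model.register_comm_hook(state, powerSGD.powerSGD_hook)
```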