Speeding up DNN training in distributed environments

C++ Eigen OpenMPI

[Report]


Introduction

This repository contains a minimal framework for quickly prototyping deep architectures and sharing weights and gradients among processing nodes.

We introduce and implement parallel DNN optimization algorithms and conduct a (hopefully complete) benchmark of the different methods on the MNIST and Fashion-MNIST [3] datasets. For a full description of the project, you can check out the project report!
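Both weight averaging and gradient averaging reduce to the same communication step: an MPI all-reduce over a flat float buffer, followed by division by the number of ranks. Below is a minimal sketch of that primitive, assuming weights and gradients are exposed as raw float vectors; the helper name and buffer layout are illustrative, not this repository's actual API.

#include <mpi.h>
#include <vector>

// Average a flat float buffer (weights or gradients) across all ranks,
// in place: sum every rank's copy, then divide by the number of ranks.
void allreduce_average(std::vector<float>& buf)
{
    int world_size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    MPI_Allreduce(MPI_IN_PLACE, buf.data(),
                  static_cast<int>(buf.size()),
                  MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    for (float& x : buf)
        x /= static_cast<float>(world_size);
}

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    std::vector<float> grad(1024, 1.0f);  // stand-in gradient buffer
    allreduce_average(grad);              // every rank now holds the mean
    MPI_Finalize();
    return 0;
}

Roughly speaking, the gradient-averaging scheme applies this to the gradients after every batch, while the parameter-averaging schemes apply it to the weights every few epochs (see the -avg_freq flag below).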

Running

This project uses the following dependencies:

  • Eigen3, for basic matrix operations (also included in the repository)
  • OpenMPI 2.1.1

The available experiments are as follows:

  • param_avg: Weight averaging algorithm described in the report.
  • parallel_sgd: Gradient averaging algorithm described in the report.
  • w_param_avg: Weighted parameter averaging algorithm described in the report.

To compile and run the experiments, from the root directory:

make experiment_name

And then, for an MPI experiment:

mpiexec -n n_cores runmpi -options
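For example, a four-core run of the parallel_sgd experiment could look like this (the flag values here are only illustrative; the flags themselves are described below):

make parallel_sgd
mpiexec -n 4 runmpi -batch_size 64 -n_epochs 10 -eval_acc 1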

For all experiments, the following arguments are available:

  • -batch_size: Size of each batch.
  • -eval_acc: Set to 1 to evaluate validation accuracies, or 0 to log epoch durations instead.
  • -n_epochs: Total number of epochs.

The following methods accept additional parameters:

param_avg, w_param_avg

  • -avg_freq: Weight averaging frequency (in epochs).

w_param_avg

  • -lambda: Value of the lambda parameter (integer, divided by 100).
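As an illustration, one plausible form of the weighted scheme (an assumption on our part; the report gives the exact rule) blends each rank's local weights with the cross-rank average, taking a fraction lambda of the average:

#include <cstddef>
#include <mpi.h>
#include <vector>

// Hypothetical weighted parameter averaging: keep (1 - lambda) of the
// local weights and take lambda of the cross-rank average. lambda is
// passed as an integer and divided by 100, matching the -lambda flag.
void weighted_param_avg(std::vector<float>& w, int lambda_int)
{
    int world_size = 0;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    const float lambda = lambda_int / 100.0f;   // e.g. -lambda 25 -> 0.25

    std::vector<float> avg = w;                 // copy of the local weights
    MPI_Allreduce(MPI_IN_PLACE, avg.data(),
                  static_cast<int>(avg.size()),
                  MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);
    for (float& x : avg)
        x /= static_cast<float>(world_size);    // cross-rank average

    for (std::size_t i = 0; i < w.size(); ++i)
        w[i] = (1.0f - lambda) * w[i] + lambda * avg[i];
}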

References

  1. [Ben-Nun et al., 2018] Demystifying parallel and distributed deep learning: An in-depth concurrency analysis.
  2. [Ericson et al., 2017] On the performance of network parallel training in artificial neural networks.
  3. [Xiao et al., 2017] Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms.
