Merge pull request #147 from evancofer/hardware-limitations-and-scaling
Hardware Limitations and Scaling
agitter committed Dec 21, 2016
2 parents 92edc3d + 8e8fcb4 commit 1b4cb77
Showing 2 changed files with 123 additions and 3 deletions.
48 changes: 45 additions & 3 deletions references/tags.tsv
@@ -1,6 +1,48 @@
tag citation
Zhou2015_deep_sea doi:10.1038/nmeth.3547
Chen2015_trans_species doi:10.1093/bioinformatics/btv315
Arvaniti2016_rare_subsets doi:10.1101/046508
Angermueller2016_single_methyl doi:10.1101/055715
Arvaniti2016_rare_subsets doi:10.1101/046508
Bengio2015_prec arXiv:1412.7024
Bergstra2011_hyper url:https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
Bergstra2012_random url:http://dl.acm.org/citation.cfm?id=2188395
Caruana2014_need arXiv:1312.6184
Chen2015_hashing arXiv:1504.04788
Chen2016_gene_expr doi:10.1093/bioinformatics/btw074
Chen2015_trans_species doi:10.1093/bioinformatics/btv315
Coates2013_cots_hpc url:http://www.jmlr.org/proceedings/papers/v28/coates13.html
CudNN arXiv:1410.0759
Dean2012_nips_downpour url:http://research.google.com/archive/large_deep_networks_nips2012.html
Dogwild url:https://papers.nips.cc/paper/5717-taming-the-wild-a-unified-analysis-of-hogwild-style-algorithms.pdf
Edwards2015_growing_pains doi:10.1145/2771283
Elephas url:https://github.com/maxpumperla/elephas
Gerstein2016_scaling doi:10.1186/s13059-016-0917-0
Gomezb2016_automatic arXiv:1610.02415
Graphlab doi:10.14778/2212351.2212354
Gupta2015_prec arXiv:1502.02551
Hadjas2015_cct arXiv:1504.04343
Hinton2015_dark_knowledge arXiv:1503.02531
Hinton2015_dk arXiv:1503.02531v1
Hubara2016_qnn arXiv:1609.07061
Krizhevsky2013_nips_cnn url:https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Krizhevsky2014_weird_trick arXiv:1404.5997
Lacey2016_dl_fpga arXiv:1602.04283
Li2014_minibatch doi:10.1145/2623330.2623612
Mapreduce doi:10.1145/1327452.1327492
Meng2016_mllib arXiv:1505.06807
Moritz2015_sparknet arXiv:1511.06051
NIH2016_genome_cost url:https://www.genome.gov/27565109/the-cost-of-sequencing-a-human-genome/
RAD2010_view_cc doi:10.1145/1721654.1721672
Raina2009_gpu doi:10.1145/1553374.1553486
Sa2015_buckwild arXiv:1506.06438
Schatz2010_dna_cloud doi:10.1038/nbt0710-691
Schmidhuber2014_dnn_overview doi:10.1016/j.neunet.2014.09.003
Seide2014_parallel doi:10.1109/ICASSP.2014.6853593
Shaham2016_batch_effects arXiv:1610.04181
Spark doi:10.1145/2934664
Stein2010_cloud doi:10.1186/gb-2010-11-5-207
Su2015_gpu arXiv:1507.01239
Sun2016_ensemble arXiv:1606.00575
TensorFlow url:http://download.tensorflow.org/paper/whitepaper2015.pdf
Vanhoucke2011_nips_cpu url:https://research.google.com/pubs/pub37631.html
Wang2016_protein_contact doi:10.1101/073239
Yasushi2016_cgbvs_dnn doi:10.1002/minf.201600045
Zhou2015_deep_sea doi:10.1038/nmeth.3547
78 changes: 78 additions & 0 deletions sections/06_discussion.md
@@ -49,6 +49,84 @@ with only a couple GPUs.*
*Some of this is also outlined in the Categorize section. We can decide where
it best fits.*

Efficiently scaling deep learning is challenging, and there is a high
computational cost (e.g., time, memory, and energy) associated with training
neural networks and using them for classification. This cost is one reason
neural networks have only recently found widespread use
[@tag:Schmidhuber2014_dnn_overview].

Many have sought to curb the costs of deep learning, with methods ranging from
the highly practical (e.g., reduced numerical precision [@tag:Gupta2015_prec
@tag:Bengio2015_prec @tag:Sa2015_buckwild @tag:Hubara2016_qnn]) to the more
exotic and theoretical (e.g., training small networks to mimic large networks
and ensembles [@tag:Caruana2014_need @tag:Hinton2015_dark_knowledge]). The
largest gains in efficiency have come from computation with graphics processing
units (GPUs) [@tag:Raina2009_gpu @tag:Vanhoucke2011_nips_cpu
@tag:Seide2014_parallel @tag:Hadjas2015_cct @tag:Edwards2015_growing_pains
@tag:Schmidhuber2014_dnn_overview], which excel at the matrix and vector
operations so central to deep learning. The massively parallel nature of GPUs
allows additional optimizations, such as accelerated mini-batch gradient
descent [@tag:Vanhoucke2011_nips_cpu @tag:Seide2014_parallel @tag:Su2015_gpu
@tag:Li2014_minibatch]. However, GPUs also have a limited quantity of memory,
making it difficult to implement networks of significant size and
complexity on a single GPU or machine [@tag:Raina2009_gpu
@tag:Krizhevsky2013_nips_cnn]. This restriction has sometimes forced
computational biologists to use workarounds or limit the size of an analysis.
For example, Chen et al. [@tag:Chen2016_gene_expr] aimed to infer the
expression level of all genes with a single neural network, but due to
memory restrictions they randomly partitioned genes into two halves and
analyzed each separately. In other cases, researchers limited the size
of their neural network [@tag:Wang2016_protein_contact
@tag:Gomezb2016_automatic]. Some have also chosen to use slower
CPU implementations rather than sacrifice network size or performance
[@tag:Yasushi2016_cgbvs_dnn].
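
As a concrete illustration of this kind of memory-driven workaround, the sketch
below randomly splits the target genes into two halves and trains a separate,
smaller multi-task network on each half with Keras. The array shapes, layer
sizes, and training settings are illustrative assumptions rather than the
published configuration.

```python
# A minimal sketch (assumed shapes and hyperparameters, not the published
# pipeline) of the memory-driven workaround described above: the target genes
# are randomly split into two halves and a separate, smaller multi-task
# network is trained on each half.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

n_samples, n_landmark, n_target = 1000, 943, 9520  # illustrative dimensions
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_landmark), dtype=np.float32)  # landmark gene expression
Y = rng.random((n_samples, n_target), dtype=np.float32)    # target gene expression

# Randomly partition the target genes into two halves.
perm = rng.permutation(n_target)
halves = [perm[: n_target // 2], perm[n_target // 2:]]

models = []
for idx in halves:
    model = Sequential([
        Input(shape=(n_landmark,)),
        Dense(3000, activation="tanh"),  # hidden layer small enough for one GPU
        Dense(len(idx)),                 # one regression output per gene in this half
    ])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(X, Y[:, idx], batch_size=200, epochs=2, verbose=0)
    models.append(model)
```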

Steady improvements in GPU hardware may alleviate this issue somewhat, but it
is not clear whether they can occur quickly enough to keep up with the growing
amount of available biological data or increasing network sizes. Much has
been done to minimize the memory
requirements of neural networks [@tag:CudNN @tag:Caruana2014_need
@tag:Gupta2015_prec @tag:Bengio2015_prec @tag:Sa2015_buckwild
@tag:Chen2015_hashing @tag:Hubara2016_qnn], but there is also growing
interest in specialized hardware, such as field-programmable gate arrays
(FPGAs) [@tag:Edwards2015_growing_pains @tag:Lacey2016_dl_fpga] and
application-specific integrated circuits (ASICs). Specialized hardware promises
improvements in deep learning with reduced time, energy, and memory costs
[@tag:Edwards2015_growing_pains]. Naturally, less software is available for such
highly specialized hardware [@tag:Lacey2016_dl_fpga], and it could be a
difficult investment for those not solely interested in deep learning. However,
it is
likely that such options will find increased support as they become a more
popular platform for deep learning and general computation.
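
To make the memory savings concrete, the toy sketch below linearly quantizes a
trained float32 weight matrix to 8-bit codes plus a scale and offset, cutting
its storage roughly fourfold at the cost of a small reconstruction error. This
is a generic illustration, not the scheme used by any particular reference
above.

```python
# Toy 8-bit linear quantization of a weight matrix: store uint8 codes plus a
# scale and offset instead of float32 values, reducing memory roughly 4x.
import numpy as np

def quantize_uint8(w):
    """Map a float32 array onto 0..255; return codes and reconstruction parameters."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return codes, scale, lo

def dequantize_uint8(codes, scale, lo):
    """Approximately reconstruct the original float32 array."""
    return codes.astype(np.float32) * scale + lo

w = np.random.randn(3000, 943).astype(np.float32)  # an example weight matrix
codes, scale, lo = quantize_uint8(w)
w_hat = dequantize_uint8(codes, scale, lo)

print("bytes before:", w.nbytes, "after:", codes.nbytes)
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```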

Distributed computing is a general solution to intense computational
requirements, and has enabled many large-scale deep learning efforts. Early
approaches to distributed computation [@tag:Mapreduce @tag:Graphlab] were
not suitable for deep learning [@tag:Dean2012_nips_downpour],
but significant progress has been made. There
now exist a number of algorithms [@tag:Dean2012_nips_downpour @tag:Dogwild
@tag:Sa2015_buckwild], tools [@tag:Moritz2015_sparknet @tag:Meng2016_mllib
@tag:TensorFlow], and high-level libraries [@tag:Keras @tag:Elephas] for deep
learning in a distributed environment, and it is possible to train very complex
networks with limited infrastructure [@tag:Coates2013_cots_hpc]. Besides
handling very large networks, distributed or parallelized approaches offer
other advantages, such as improved ensembling [@tag:Sun2016_ensemble] or
accelerated hyperparameter optimization [@tag:Bergstra2011_hyper
@tag:Bergstra2012_random].
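
The sketch below shows how spare parallel capacity maps naturally onto random
hyperparameter search: each worker trains and scores a model for an
independently sampled configuration. Here `train_and_score` is a hypothetical
placeholder for any model-fitting routine, and the search space is an arbitrary
example.

```python
# Random hyperparameter search parallelized across local worker processes.
# train_and_score is a placeholder for an actual training routine.
import random
from concurrent.futures import ProcessPoolExecutor

def sample_config():
    """Draw one hyperparameter configuration at random (example search space)."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),
        "hidden_units": random.choice([256, 512, 1024, 2048]),
        "dropout": random.uniform(0.0, 0.5),
    }

def train_and_score(config):
    # Placeholder: train a network with `config` and return its validation score.
    return random.random()

if __name__ == "__main__":
    configs = [sample_config() for _ in range(32)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        scores = list(pool.map(train_and_score, configs))
    best = max(range(len(configs)), key=lambda i: scores[i])
    print("best validation score:", scores[best])
    print("best configuration:", configs[best])
```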

Cloud computing, which has already seen adoption in genomics
[@tag:Schatz2010_dna_cloud], could facilitate easier sharing of the large
datasets common to biology [@tag:Gerstein2016_scaling @tag:Stein2010_cloud],
and may be key to scaling deep learning. Cloud computing affords researchers
considerable flexibility and enables the use of specialized hardware (e.g.,
FPGAs, ASICs, GPUs) without a large up-front investment. This flexibility could
also make it easier to address the distinct challenges posed by the wide
variety of available layers and architectures
[@tag:Krizhevsky2014_weird_trick]. Though many are reluctant to store sensitive
data (e.g., patient electronic health records) in the cloud, secure and
regulation-compliant cloud services do exist [@tag:RAD2010_view_cc].

*TODO: Write the transition once more of the Discussion section has been
fleshed out.*

### Code, data, and model sharing

*Reproducibility is important for science to progress. In the context of deep