Merge pull request #147 from evancofer/hardware-limitations-and-scaling
Hardware Limitations and Scaling
agitter committed Dec 21, 2016
2 parents 92edc3d + 8e8fcb4 commit 1b4cb77
Showing 2 changed files with 123 additions and 3 deletions.
48 changes: 45 additions & 3 deletions references/tags.tsv
@@ -1,6 +1,48 @@
tag citation
Zhou2015_deep_sea doi:10.1038/nmeth.3547
Chen2015_trans_species doi:10.1093/bioinformatics/btv315
Arvaniti2016_rare_subsets doi:10.1101/046508
Angermueller2016_single_methyl doi:10.1101/055715
Arvaniti2016_rare_subsets doi:10.1101/046508
Bengio2015_prec arXiv:1412.7024
Bergstra2011_hyper url:https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf
Bergstra2012_random url:http://dl.acm.org/citation.cfm?id=2188395
Caruana2014_need arXiv:1312.6184
Chen2015_hashing arXiv:1504.04788
Chen2016_gene_expr doi:10.1093/bioinformatics/btw074
Chen2015_trans_species doi:10.1093/bioinformatics/btv315
Coates2013_cots_hpc url:http://www.jmlr.org/proceedings/papers/v28/coates13.html
CudNN arXiv:1410.0759
Dean2012_nips_downpour url:http://research.google.com/archive/large_deep_networks_nips2012.html
Dogwild url:https://papers.nips.cc/paper/5717-taming-the-wild-a-unified-analysis-of-hogwild-style-algorithms.pdf
Edwards2015_growing_pains doi:10.1145/2771283
Elephas url:https://github.com/maxpumperla/elephas
Gerstein2016_scaling doi:10.1186/s13059-016-0917-0
Gomezb2016_automatic arXiv:1610.02415
Graphlab doi:10.14778/2212351.2212354
Gupta2015_prec arXiv:1502.02551
Hadjas2015_cct arXiv:1504.04343
Hinton2015_dark_knowledge arXiv:1503.02531
Hinton2015_dk arXiv:1503.02531v1
Hubara2016_qnn arXiv:1609.07061
Krizhevsky2013_nips_cnn url:https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
Krizhevsky2014_weird_trick arXiv:1404.5997
Lacey2016_dl_fpga arXiv:1602.04283
Li2014_minibatch doi:10.1145/2623330.2623612
Mapreduce doi:10.1145/1327452.1327492
Meng2016_mllib arXiv:1505.06807
Moritz2015_sparknet arXiv:1511.06051
NIH2016_genome_cost url:https://www.genome.gov/27565109/the-cost-of-sequencing-a-human-genome/
RAD2010_view_cc doi:10.1145/1721654.1721672
Raina2009_gpu doi:10.1145/1553374.1553486
Sa2015_buckwild arXiv:1506.06438
Schatz2010_dna_cloud doi:10.1038/nbt0710-691
Schmidhuber2014_dnn_overview doi:10.1016/j.neunet.2014.09.003
Seide2014_parallel doi:10.1109/ICASSP.2014.6853593
Shaham2016_batch_effects arXiv:1610.04181
Spark doi:10.1145/2934664
Stein2010_cloud doi:10.1186/gb-2010-11-5-207
Su2015_gpu arXiv:1507.01239
Sun2016_ensemble arXiv:1606.00575
TensorFlow url:http://download.tensorflow.org/paper/whitepaper2015.pdf
Vanhoucke2011_nips_cpu url:https://research.google.com/pubs/pub37631.html
Wang2016_protein_contact doi:10.1101/073239
Yasushi2016_cgbvs_dnn doi:10.1002/minf.201600045
Zhou2015_deep_sea doi:10.1038/nmeth.3547
78 changes: 78 additions & 0 deletions sections/06_discussion.md
@@ -49,6 +49,84 @@ with only a couple GPUs.*
*Some of this is also outlined in the Categorize section. We can decide where
it best fits.*

Efficiently scaling deep learning is challenging, and there is a high
computational cost (e.g., time, memory, and energy) associated with training
neural networks and using them for classification. This cost is one reason
neural networks have only recently found widespread use
[@tag:Schmidhuber2014_dnn_overview].

Many have sought to curb the costs of deep learning, with methods ranging from
the highly practical (e.g., reduced numerical precision [@tag:Gupta2015_prec
@tag:Bengio2015_prec @tag:Sa2015_buckwild @tag:Hubara2016_qnn]) to the more
exotic and theoretical (e.g., training small networks to mimic large networks
and ensembles [@tag:Caruana2014_need @tag:Hinton2015_dark_knowledge]). The
largest gains in efficiency have come from computation with graphics processing
units (GPUs) [@tag:Raina2009_gpu @tag:Vanhoucke2011_nips_cpu
@tag:Seide2014_parallel @tag:Hadjas2015_cct @tag:Edwards2015_growing_pains
@tag:Schmidhuber2014_dnn_overview], which excel at the matrix and vector
operations so central to deep learning. The massively parallel nature of GPUs
allows additional optimizations, such as accelerated mini-batch gradient
descent [@tag:Vanhoucke2011_nips_cpu @tag:Seide2014_parallel @tag:Su2015_gpu
@tag:Li2014_minibatch]. However, GPUs also have a limited quantity of memory,
making it difficult to implement networks of significant size and
complexity on a single GPU or machine [@tag:Raina2009_gpu
@tag:Krizhevsky2013_nips_cnn]. This restriction has sometimes forced
computational biologists to use workarounds or limit the size of an analysis.
For example, Chen et al. [@tag:Chen2016_gene_expr] aimed to infer the
expression level of all genes with a single neural network, but due to
memory restrictions they randomly partitioned genes into two halves and
analyzed each separately. In other cases, researchers limited the size
of their neural network [@tag:Wang2016_protein_contact
@tag:Gomezb2016_automatic]. Some have also chosen to use slower
CPU implementations rather than sacrifice network size or performance
[@tag:Yasushi2016_cgbvs_dnn].
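
As a concrete illustration of this kind of memory-driven workaround, the sketch
below randomly splits the target genes into two halves and trains a separate,
smaller multi-task network on each half with Keras. The array shapes, layer
sizes, and training settings are illustrative assumptions rather than the
published configuration.

```python
# A minimal sketch (assumed shapes and hyperparameters, not the published
# pipeline) of the memory-driven workaround described above: the target genes
# are randomly split into two halves and a separate, smaller multi-task
# network is trained on each half.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

n_samples, n_landmark, n_target = 1000, 943, 9520  # illustrative dimensions
rng = np.random.default_rng(0)
X = rng.random((n_samples, n_landmark), dtype=np.float32)  # landmark gene expression
Y = rng.random((n_samples, n_target), dtype=np.float32)    # target gene expression

# Randomly partition the target genes into two halves.
perm = rng.permutation(n_target)
halves = [perm[: n_target // 2], perm[n_target // 2:]]

models = []
for idx in halves:
    model = Sequential([
        Input(shape=(n_landmark,)),
        Dense(3000, activation="tanh"),  # hidden layer small enough for one GPU
        Dense(len(idx)),                 # one regression output per gene in this half
    ])
    model.compile(optimizer="sgd", loss="mse")
    model.fit(X, Y[:, idx], batch_size=200, epochs=2, verbose=0)
    models.append(model)
```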

Steady improvements in GPU hardware may alleviate this issue somewhat, but it
is not clear whether they can occur quickly enough to keep up with the growing
amount of available biological data or increasing network sizes. Much has
been done to minimize the memory
requirements of neural networks [@tag:CudNN @tag:Caruana2014_need
@tag:Gupta2015_prec @tag:Bengio2015_prec @tag:Sa2015_buckwild
@tag:Chen2015_hashing @tag:Hubara2016_qnn], but there is also growing
interest in specialized hardware, such as field-programmable gate arrays
(FPGAs) [@tag:Edwards2015_growing_pains @tag:Lacey2016_dl_fpga] and
application-specific integrated circuits (ASICs). Specialized hardware promises
improvements in deep learning with reduced time, energy, and memory costs
[@tag:Edwards2015_growing_pains]. Naturally, less software is available for such
highly specialized hardware [@tag:Lacey2016_dl_fpga], and it could be a
difficult investment for those not solely interested in deep learning. However,
it is
likely that such options will find increased support as they become a more
popular platform for deep learning and general computation.
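
To make the memory savings concrete, the toy sketch below linearly quantizes a
trained float32 weight matrix to 8-bit codes plus a scale and offset, cutting
its storage roughly fourfold at the cost of a small reconstruction error. This
is a generic illustration, not the scheme used by any particular reference
above.

```python
# Toy 8-bit linear quantization of a weight matrix: store uint8 codes plus a
# scale and offset instead of float32 values, reducing memory roughly 4x.
import numpy as np

def quantize_uint8(w):
    """Map a float32 array onto 0..255; return codes and reconstruction parameters."""
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    codes = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return codes, scale, lo

def dequantize_uint8(codes, scale, lo):
    """Approximately reconstruct the original float32 array."""
    return codes.astype(np.float32) * scale + lo

w = np.random.randn(3000, 943).astype(np.float32)  # an example weight matrix
codes, scale, lo = quantize_uint8(w)
w_hat = dequantize_uint8(codes, scale, lo)

print("bytes before:", w.nbytes, "after:", codes.nbytes)
print("max reconstruction error:", float(np.abs(w - w_hat).max()))
```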

Distributed computing is a general solution to intense computational
requirements, and has enabled many large-scale deep learning efforts. Early
approaches to distributed computation [@tag:Mapreduce @tag:Graphlab] were
not suitable for deep learning [@tag:Dean2012_nips_downpour],
but significant progress has been made. There
now exist a number of algorithms [@tag:Dean2012_nips_downpour @tag:Dogwild
@tag:Sa2015_buckwild], tools [@tag:Moritz2015_sparknet @tag:Meng2016_mllib
@tag:TensorFlow], and high-level libraries [@tag:Keras @tag:Elephas] for deep
learning in a distributed environment, and it is possible to train very complex
networks with limited infrastructure [@tag:Coates2013_cots_hpc]. Besides
handling very large networks, distributed or parallelized approaches offer
other advantages, such as improved ensembling [@tag:Sun2016_ensemble] or
accelerated hyperparameter optimization [@tag:Bergstra2011_hyper
@tag:Bergstra2012_random].
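
The sketch below shows how spare parallel capacity maps naturally onto random
hyperparameter search: each worker trains and scores a model for an
independently sampled configuration. Here `train_and_score` is a hypothetical
placeholder for any model-fitting routine, and the search space is an arbitrary
example.

```python
# Random hyperparameter search parallelized across local worker processes.
# train_and_score is a placeholder for an actual training routine.
import random
from concurrent.futures import ProcessPoolExecutor

def sample_config():
    """Draw one hyperparameter configuration at random (example search space)."""
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),
        "hidden_units": random.choice([256, 512, 1024, 2048]),
        "dropout": random.uniform(0.0, 0.5),
    }

def train_and_score(config):
    # Placeholder: train a network with `config` and return its validation score.
    return random.random()

if __name__ == "__main__":
    configs = [sample_config() for _ in range(32)]
    with ProcessPoolExecutor(max_workers=8) as pool:
        scores = list(pool.map(train_and_score, configs))
    best = max(range(len(configs)), key=lambda i: scores[i])
    print("best validation score:", scores[best])
    print("best configuration:", configs[best])
```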

Cloud computing, which has already seen adoption in genomics
[@tag:Schatz2010_dna_cloud], could facilitate easier sharing of the large
datasets common to biology [@tag:Gerstein2016_scaling @tag:Stein2010_cloud],
and may be key to scaling deep learning. Cloud computing affords researchers
considerable flexibility and enables the use of specialized hardware (e.g.,
FPGAs, ASICs, GPUs) without a large up-front investment. This flexibility could
also make it easier to address the distinct challenges posed by the wide
variety of available layers and architectures
[@tag:Krizhevsky2014_weird_trick]. Though many are reluctant to store sensitive
data (e.g., patient electronic health records) in the cloud, secure and
regulation-compliant cloud services do exist [@tag:RAD2010_view_cc].

*TODO: Write the transition once more of the Discussion section has been
fleshed out.*

### Code, data, and model sharing

*Reproducibility is important for science to progress. In the context of deep