
Thesis order decided
kumar-shridhar committed Nov 19, 2018
1 parent 33ef849 commit eb57d2d
Showing 5 changed files with 22 additions and 61 deletions.
44 changes: 2 additions & 42 deletions Chapter1/chapter1.tex
@@ -26,7 +26,7 @@ \section{Problem Statement}

We will address both of these concerns by using Bayesian learning, which adds a measure of uncertainty and regularization to the networks' predictions.

\section{How to tackle the problem}
\section{Our Hypothesis}

Deep Neural Networks have been successfully applied to many domains, including sensitive ones such as health care, security, and fraud detection. However, from a probability-theory perspective, it is unjustifiable to base classification on single point estimates of the weights.
Bayesian neural networks (NNs), on the other hand, are more robust to over-fitting and can easily learn from small datasets. The Bayesian approach further offers uncertainty estimates via its parameters in the form of probability distributions (see Figure \ref{fig:Scalar_Bayesian_Distribution}). At the same time, by using a prior probability distribution to integrate out the parameters, we average across many models during training, which has a regularizing effect on the network and thus prevents overfitting.
@@ -58,44 +58,4 @@ \section{How to tackle the problem}
This work builds on the foundations laid out by Blundell et al. \cite{blundell2015weight}, who introduced \textit{Bayes by Backprop} for feedforward neural networks. Together with its extension to recurrent neural networks by Fortunato et al. \cite{fortunato2017bayesian}, \textit{Bayes by Backprop} is now applicable to the three most frequently used types of neural networks: feedforward, recurrent, and convolutional.


\nomenclature[z-cif]{$CIF$}{Cauchy's Integral Formula} % first letter Z is for Acronyms
\nomenclature[a-F]{$F$}{complex function} % first letter A is for Roman symbols
\nomenclature[g-p]{$\pi$}{ $\simeq 3.14\ldots$} % first letter G is for Greek Symbols
\nomenclature[g-i]{$\iota$}{unit imaginary number $\sqrt{-1}$} % first letter G is for Greek Symbols
\nomenclature[g-g]{$\gamma$}{a simply closed curve on a complex plane} % first letter G is for Greek Symbols
\nomenclature[x-i]{$\oint_\gamma$}{integration around a curve $\gamma$} % first letter X is for Other Symbols
\nomenclature[r-j]{$j$}{superscript index} % first letter R is for superscripts
\nomenclature[s-0]{$0$}{subscript index} % first letter S is for subscripts


%********************************** %Second Section *************************************

\nomenclature[z-DEM]{DEM}{Discrete Element Method}
\nomenclature[z-FEM]{FEM}{Finite Element Method}
\nomenclature[z-PFEM]{PFEM}{Particle Finite Element Method}
\nomenclature[z-FVM]{FVM}{Finite Volume Method}
\nomenclature[z-BEM]{BEM}{Boundary Element Method}
\nomenclature[z-MPM]{MPM}{Material Point Method}
\nomenclature[z-LBM]{LBM}{Lattice Boltzmann Method}
\nomenclature[z-MRT]{MRT}{Multi-Relaxation Time}
\nomenclature[z-RVE]{RVE}{Representative Elemental Volume}
\nomenclature[z-GPU]{GPU}{Graphics Processing Unit}
\nomenclature[z-SH]{SH}{Savage Hutter}
\nomenclature[z-CFD]{CFD}{Computational Fluid Dynamics}
\nomenclature[z-LES]{LES}{Large Eddy Simulation}
\nomenclature[z-FLOP]{FLOP}{Floating Point Operations}
\nomenclature[z-ALU]{ALU}{Arithmetic Logic Unit}
\nomenclature[z-FPU]{FPU}{Floating Point Unit}
\nomenclature[z-SM]{SM}{Streaming Multiprocessors}
\nomenclature[z-PCI]{PCI}{Peripheral Component Interconnect}
\nomenclature[z-CK]{CK}{Carman - Kozeny}
\nomenclature[z-CD]{CD}{Contact Dynamics}
\nomenclature[z-DNS]{DNS}{Direct Numerical Simulation}
\nomenclature[z-EFG]{EFG}{Element-Free Galerkin}
\nomenclature[z-PIC]{PIC}{Particle-in-cell}
\nomenclature[z-USF]{USF}{Update Stress First}
\nomenclature[z-USL]{USL}{Update Stress Last}
\nomenclature[s-crit]{crit}{Critical state}
\nomenclature[z-DKT]{DKT}{Draft Kiss Tumble}
\nomenclature[z-PPC]{PPC}{Particles per cell}
%\nomenclature[z-cif]{$CIF$}{Cauchy's Integral Formula} % first letter Z is for Acronyms
11 changes: 7 additions & 4 deletions Chapter2/chapter2.tex
@@ -42,7 +42,7 @@ \subsection{Brain Analogies}
\end{center}
\end{figure}

\subsection{Artificial Neural Network}
\subsection{Multi Layer Perceptron}

An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological nervous systems, such as the brain, process information. The key element of this paradigm is the novel structure of the information processing system. It is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems. ANNs, like people, learn by example. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in biological systems involves adjustments to the synaptic connections that exist between the neurons. This is true of ANNs as well.

@@ -162,7 +162,6 @@ \subsection{Dropout as Variational Inference}
\end{align}
with $\widehat{\bo}_t \sim q \big( \bo \big)$. This is referred to as MC dropout.
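As an illustration, a minimal PyTorch sketch of how such MC-dropout predictions could be computed; the model, the number of samples T, and the assumption that the network uses nn.Dropout and no batch normalisation are ours, not taken from the thesis:

import torch

def mc_dropout_predict(model, x, T=50):
    """Approximate the predictive distribution with T stochastic forward
    passes, keeping dropout active at test time (MC dropout)."""
    model.train()  # keeps nn.Dropout stochastic; assumes no batch-norm layers
    with torch.no_grad():
        samples = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(T)])
    return samples.mean(dim=0), samples.var(dim=0)  # predictive mean and variance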

\subsection{Monte Carlo variational inference}
\subsection{Local reparametrisation trick}

We utilise the local reparameterization trick \cite{kingma2015variational} and apply it to \acp{cnn}. Following \cite{kingma2015variational,neklyudov2018variance}, we do not sample the weights $w$ but instead sample the layer activations $b$, which yields a considerable computational speed-up. The variational posterior probability distribution $q_{\theta}(w_{ijhw}|\mathcal{D})=\mathcal{N}(\mu_{ijhw},\alpha_{ijhw}\mu^2_{ijhw})$ (where $i$ and $j$ index the input and output layers, and $h$ and $w$ the height and width of a given filter) allows us to implement the local reparameterization trick in convolutional layers. This results in the following equation for the convolutional layer activations $b$:
@@ -171,7 +170,8 @@ \subsection{Local reparametrisation trick}
\end{equation}
where $\epsilon_j \sim \mathcal{N}(0,1)$, $A_i$ is the receptive field, $\ast$ denotes the convolution operation, and $\odot$ component-wise multiplication.
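As a rough illustration of the equation above, a simplified PyTorch sketch of a convolutional layer that samples activations under the local reparameterization trick; the class and parameter names (e.g. log_alpha) are our own and not necessarily those used in the accompanying repository:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalReparamConv2d(nn.Module):
    """Conv layer with variational posterior N(mu, alpha * mu^2) over weights;
    activations b are sampled instead of the weights (local reparameterization)."""
    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        self.mu = nn.Parameter(0.05 * torch.randn(out_channels, in_channels,
                                                  kernel_size, kernel_size))
        self.log_alpha = nn.Parameter(-4.0 * torch.ones_like(self.mu))
        self.padding = padding

    def forward(self, x):
        mean = F.conv2d(x, self.mu, padding=self.padding)            # A_i * mu_i
        var = F.conv2d(x ** 2, self.log_alpha.exp() * self.mu ** 2,  # A_i^2 * (alpha_i . mu_i^2)
                       padding=self.padding)
        eps = torch.randn_like(mean)                                  # eps_j ~ N(0, 1)
        return mean + eps * (var + 1e-8).sqrt()                       # sampled activations b_j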

\section{Bayesian Uncertainties}
\section{Weight Uncertainties}
\subsection{Bayesian Uncertainties}

There are two major types of uncertainty one can model. \textit{Aleatoric} uncertainty captures noise inherent in the observations. On the other hand, \textit{epistemic} uncertainty accounts for uncertainty in the model -- uncertainty which can be explained away given enough data. Traditionally it has been difficult to model epistemic uncertainty in computer vision, but with new Bayesian deep learning tools this is now possible.
We study the benefits of modeling epistemic vs.\ aleatoric uncertainty in Bayesian deep learning models for vision tasks. For this we present a Bayesian deep learning framework combining input-dependent aleatoric uncertainty together with epistemic uncertainty. We study models under the framework with per-pixel semantic segmentation and depth regression tasks. Further, our explicit uncertainty formulation leads to new loss functions for these tasks, which can be interpreted as learned attenuation. This makes the loss more robust to noisy data, also giving new state-of-the-art results on segmentation and depth regression benchmarks.
@@ -186,7 +186,8 @@ \section{Bayesian Uncertainties}

Existing approaches to Bayesian deep learning capture either epistemic uncertainty alone, or aleatoric uncertainty alone \cite{gal2016thesis}. These uncertainties are formalised as probability distributions over either the model parameters, or model outputs, respectively. Epistemic uncertainty is modeled by placing a prior distribution over a model's weights, and then trying to capture how much these weights vary given some data. Aleatoric uncertainty on the other hand is modeled by placing a distribution over the output of the model. For example, in regression our outputs might be modeled as corrupted with Gaussian random noise. In this case we are interested in learning the noise's variance as a function of different inputs (such noise can also be modeled with a constant value for all data points, but this is of less practical interest). These uncertainties, in the context of Bayesian deep learning, are explained in more detail in this section.
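For the regression case just described, the learned noise variance is typically trained by attenuating the squared error with the predicted variance. A minimal sketch of such a loss, assuming the network has a second output head predicting the log-variance (this parameterisation is our illustrative choice, not a prescription from the thesis):

import torch

def heteroscedastic_loss(y_pred, log_var, y_true):
    """Aleatoric (learned attenuation) loss: residuals are down-weighted where
    the predicted noise variance is large; the log-variance term penalises
    predicting huge uncertainty everywhere."""
    precision = torch.exp(-log_var)
    return (0.5 * precision * (y_true - y_pred) ** 2 + 0.5 * log_var).mean()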

\section{Bayes by Backprop}
\section{Backpropagation}
\subsection{Bayes by Backprop}
\textit{Bayes by Backprop} \cite{graves2011practical, blundell2015weight} is a variational inference method to learn the posterior distribution on the weights $w \sim q_{\theta}(w|\mathcal{D})$ of a neural network from which weights $w$ can be sampled in backpropagation.
It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood.

@@ -213,3 +214,5 @@ \section{Bayes by Backprop}
where $n$ is the number of draws.
\newline We sample $w^{(i)}$ from $q_{\theta}(w|\mathcal{D})$. The uncertainty afforded by \textit{Bayes by Backprop}-trained neural networks has been used successfully for training feedforward neural networks in both supervised and reinforcement learning environments \cite{blundell2015weight,lipton2016efficient,houthooft2016curiosity} and for training recurrent neural networks \cite{fortunato2017bayesian}, but it has not been applied to convolutional neural networks to date.
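As a sketch only, the Monte Carlo estimate above can be written for a single fully connected layer roughly as follows; the layer, its Gaussian posterior parameterisation, and the N(0,1) prior are illustrative assumptions, not the exact design used later in this thesis:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Normal

class BayesByBackpropLinear(nn.Module):
    """Linear layer with a factorised Gaussian posterior q_theta(w|D) over the
    weights, trained by minimising log q(w) - log p(w) - log p(D|w)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.mu = nn.Parameter(0.1 * torch.randn(out_features, in_features))
        self.rho = nn.Parameter(-3.0 * torch.ones(out_features, in_features))
        self.prior = Normal(0.0, 1.0)   # illustrative N(0, 1) prior on each weight
        self.kl_term = 0.0              # log q(w^(i)|theta) - log p(w^(i)) for the current draw

    def forward(self, x):
        sigma = F.softplus(self.rho)                     # keeps the posterior std positive
        w = self.mu + sigma * torch.randn_like(sigma)    # w^(i) ~ q_theta(w|D)
        self.kl_term = (Normal(self.mu, sigma).log_prob(w)
                        - self.prior.log_prob(w)).sum()
        return F.linear(x, w)

For a single draw ($n = 1$), the training loss is then the negative log-likelihood of the minibatch plus the summed kl_term of all Bayesian layers, i.e. the Monte Carlo approximation of the variational free energy given above.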

\section{Model Weight Pruning}

11 changes: 7 additions & 4 deletions Chapter5/chapter5.tex
@@ -16,9 +16,7 @@ \chapter{Empirical Analysis}

\pagebreak

\section{Experimentation Methodology}

\section{Experiments} \label{experiments}
\section{Experimentation Methodology} \label{experiments}
For all conducted experiments, we implement the foregoing description of Bayesian \acp{cnn} with variational inference in LeNet-5 \cite{lecun1998gradient} and AlexNet \cite{krizhevsky2012imagenet}. The exact architecture specifications can be found in the Appendix and in our GitHub repository\footnote{\url{https://github.com/kumar-shridhar/PyTorch-BayesianCNN}}.
We train the networks with the MNIST dataset of handwritten digits \cite{lecun1998gradient}, and with the CIFAR-10 and CIFAR-100 datasets \cite{krizhevsky2009learning}, since these datasets serve widely as benchmarks for \acp{cnn}' performance. The activation function originally chosen in all architectures is \textit{ReLU}, but we must introduce a second one, \textit{Softplus} (see \eqref{softplus}), because our method applies two convolutional or fully-connected operations: as aforementioned, one determines the mean $\mu$ and the other the variance $\alpha \mu^2$. Specifically, we apply \textit{Softplus} to ensure that the variance $\alpha \mu^2$ never becomes zero, since a zero variance would be equivalent to merely computing the MAP estimate, which can be interpreted as a maximum likelihood estimate (MLE) and hence as a single point estimate, i.e.\ frequentist inference. \textit{Softplus} is a smooth approximation of \textit{ReLU}; although the practical difference is negligible, it has the subtle but analytically important advantage that it remains strictly positive as $x \rightarrow -\infty$, whereas \textit{ReLU} is exactly zero for all $x \leq 0$.
\\
@@ -29,6 +27,7 @@ \section{Experiments} \label{experiments}
where $\beta$ is by default set to $1$.
\newline All experiments are performed with the same hyper-parameter settings, as stated in the Appendix.
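A small PyTorch sketch of the point being made here (the tensor values are arbitrary; only the behaviour of the two activation functions is at issue): Softplus with $\beta = 1$ stays strictly positive, so the variance term $\alpha \mu^2$ it produces can never collapse to an exact point estimate, whereas ReLU is exactly zero for all non-positive inputs.

import torch
import torch.nn.functional as F

x = torch.tensor([-5.0, 0.0, 5.0])   # raw (pre-activation) variance values
print(F.relu(x))                     # tensor([0.0000, 0.0000, 5.0000]) -- collapses to zero
print(F.softplus(x, beta=1.0))       # approx tensor([0.0067, 0.6931, 5.0067]) -- strictly positive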

\section{Case Study 1: Small Datasets (MNIST, CIFAR-10, STL-10)}
\subsection{Datasets}
As aforementioned, we train various architectures on multiple datasets, namely MNIST, CIFAR-10, and CIFAR-100.
\newline
@@ -114,8 +113,12 @@ \subsection{Results}
\label{tab:uncertainty}
\end{table}
%
\section{Case Study 1: Small Datasets (MNIST, CIFAR-10, STL-10)}
\section{Case Study 2: Large Dataset (CIFAR-100)}
\subsection{Dataset}
\subsection{Results}

\section{Uncertainty Estimation}
\section{Model Pruning}

\ifpdf
\graphicspath{{Chapter2/Figs/Raster/}{Chapter2/Figs/PDF/}{Chapter2/Figs/}}
11 changes: 3 additions & 8 deletions Chapter6/chapter6.tex
@@ -17,10 +17,6 @@ \chapter{Applications}
\pagebreak


\section{BayesCNN for Image Classification}
And now I begin my third chapter here \dots

And now to cite some more people~\citet{Rea85,Ancey1996}

\section{BayesCNN for Image Super Resolution}

@@ -54,17 +50,16 @@ \subsection{Our Approach}
\end{center}
\end{figure*}

\subsubsection{Empirical Analysis}
\subsection{Empirical Analysis}



\section{BayesCNN for Generative Adversarial Networks}

\section{Introduction}
Generative Adversarial Networks (GANs) \cite{goodfellow2014generative} can be used for two major tasks: to learn good feature representations, by using the generator and discriminator networks as feature extractors, and to generate natural images. The learned feature representations or generated images can substantially reduce the number of images required for a supervised computer vision task. However, GANs have been notoriously unstable to train, which is why we base our work on a stable GAN architecture, namely Deep Convolutional GANs (DCGAN) \cite{DBLP:journals/corr/RadfordMC15}. We use the trained Bayesian discriminators for image classification tasks, showing competitive performance with the standard DCGAN architecture.

\section{Our approach}
\section{Empirical Analysis}
\subsection{Our approach}
\subsection{Empirical Analysis}



6 changes: 3 additions & 3 deletions thesis.tex
@@ -113,9 +113,9 @@

\maketitle

\include{Dedication/dedication}
\include{Declaration/declaration}
\include{Acknowledgement/acknowledgement}
%\include{Dedication/dedication}
%\include{Declaration/declaration}
%\include{Acknowledgement/acknowledgement}
\include{Abstract/abstract}

% *********************** Adding TOC and List of Figures ***********************
