Introduction done final time
kumar-shridhar committed Dec 13, 2018
1 parent a9184fb commit 41bb28c
Showing 7 changed files with 25 additions and 26 deletions.
6 changes: 3 additions & 3 deletions Abstract/abstract.tex
@@ -1,18 +1,18 @@
% ************************** Thesis Abstract *****************************
% Use `abstract' as an option in the document class to print only the titlepage and the abstract.
\begin{abstract}
- Artificial Neural Networks are connectionist systems that learn to perform tasks by learning on examples without having a prior knowledge about the tasks. This is done by finding an optimal point estimate for the weights in every node.
+ Artificial Neural Networks are connectionist systems that perform a given task by learning on examples without having prior knowledge about the task. This is done by finding an optimal point estimate for the weights in every node.
Generally, networks using point estimates as weights perform well with large datasets, but they fail to express uncertainty in regions with little or no data, leading to overconfident decisions.
\newline

In this thesis, a Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, which introduces a probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks.

BayesCNN is based on Bayes by Backprop, which derives a variational approximation to the true posterior.
- Our proposed method not only achieves performances equivalent to frequentist inference in identical architectures but also incorporate a measurement for uncertainties and regularisation. It further eliminates the use of dropout in the model. Furthermore, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainties and finally propose ways to prune the model and make it computational and time effective.
+ Our proposed method not only achieves performance equivalent to frequentist inference in identical architectures but also incorporates a measure of uncertainty and regularisation. It further eliminates the use of dropout in the model. Moreover, we predict how certain the model's predictions are based on the epistemic and aleatoric uncertainties, and finally we propose ways to prune the Bayesian architecture and make it more computationally and time efficient.
\newline


- In the first part of the thesis, the Bayesian Neural Network is explained and it is applied to an Image Classification task. The results are compared to point-estimates based architectures on MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreover, uncertainties are calculated and pruning of the architecture is done.
+ In the first part of the thesis, the Bayesian Neural Network is explained and applied to an Image Classification task. The results are compared to point-estimate-based architectures on the MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreover, uncertainties are calculated, the architecture is pruned, and a comparison of the results is drawn.

In the second part of the thesis, the concept is further applied to other computer vision tasks, namely Image Super-Resolution and Generative Adversarial Networks. The concept of BayesCNN is tested and compared against other concepts in a similar domain.

10 changes: 5 additions & 5 deletions Acknowledgement/acknowledgement.tex
@@ -2,18 +2,18 @@

\begin{acknowledgements}

- I would first like to thank my thesis advisor \textbf{Prof. Marcus Liwicki} who was always present physically or virtually whenever I ran into some trouble or had a crazy thought. He consistently steered me in the right the direction whenever he thought I needed it.
+ I would first like to thank my thesis advisor \textbf{Prof. Marcus Liwicki}, who was always present, physically or virtually, whenever I ran into some trouble or had a crazy thought. He consistently steered me in the right direction whenever he thought I needed it.

- I would also like to thank my second thesis advisor \textbf{Felix Laumann} with whom I worked in Copenhagen and in Kaiserslautern. He also made sure Imperial College, London is never far away with our continuous calls and discussion. He assisted in all different work done in the thesis from research to implementation and writing.\\
+ I would also like to thank my second thesis advisor \textbf{Felix Laumann}, with whom I worked in Copenhagen and in Kaiserslautern. He also made sure Imperial College London never felt far away through our continuous calls and Skype discussions. He contributed in every possible way to the thesis and made sure we stayed on the right path. \\

I would also like to acknowledge the \textbf{University of Kaiserslautern} for providing me with the opportunity to write this thesis and \textbf{MindGarage} for providing the computational power.\\

I must express my very profound gratitude to my parents (\textbf{Vivek} and \textbf{Anita}), to my sister (\textbf{Mineshi}) and to \textbf{Purvanshi Mehta} for providing me with unfailing support and continuous encouragement throughout the thesis.

- Finally, I am thankful to \textbf{Ashutosh Mishra}, \textbf{Saurabh Varshneya}, \textbf{Abhash Sinha} and \textbf{Ayushman Dash} for their invaluable comments and endless discussion. I am also thankful to \textbf{Sadique Adnan Siddiqui} for his late night tea and delicious food. This accomplishment would not have been possible without them. \\

- I would also like to mention \textbf{BotSupply} for providing me with the funds to go and meet new people to discuss ideas. \\ \\
+ Finally, I am thankful to \textbf{Ashutosh Mishra}, \textbf{Saurabh Varshneya}, and \textbf{Ayushman Dash} for their invaluable comments and endless discussions. I am also thankful to \textbf{Sadique Adnan Siddiqui} for his late-night tea and delicious food. This accomplishment would not have been possible without them. \\

+ I would also like to mention \textbf{BotSupply} for providing me with the funds to go and meet new people to discuss ideas and to get some inspiration. \\ \\



\end{acknowledgements}
17 changes: 8 additions & 9 deletions Chapter1/chapter1.tex
@@ -18,7 +18,7 @@ \chapter{Introduction} %Title of the First Chapter
Deep Neural Networks (DNNs) are connectionist systems that learn to perform tasks by learning on examples without having prior knowledge about the tasks.
They easily scale to millions of data points and yet remain tractable to optimize with stochastic gradient descent.

- Convolutional Neural Networks (CNNs), a variant of DNNs, have already surpassed human accuracy in the realm of image classification (e.g. \cite{he2016deep,simonyan2014very,krizhevsky2012imagenet}). Due to the capacity of \acp{cnn} to fit on a wide diversity of non-linear data points, they require a large amount of training data. This often makes \acp{cnn}s and Neural Networks in general, prone to overfitting on small datasets. The model tends to fit well to the training data, but are not predictive for new data. This often makes the Neural Networks incapable of correctly assessing the uncertainty in the training data and hence leads to overly confident decisions about the correct class, prediction or action.
+ \acp{cnn}, a variant of DNNs, have already surpassed human accuracy in the realm of image classification (e.g. \cite{he2016deep,simonyan2014very,krizhevsky2012imagenet}). Due to the capacity of \acp{cnn} to fit a wide diversity of non-linear data points, they require a large amount of training data. This often makes \acp{cnn}, and Neural Networks in general, prone to overfitting on small datasets. Such models tend to fit the training data well but are not predictive for new data. This often leaves Neural Networks incapable of correctly assessing the uncertainty in the training data and hence leads to overly confident decisions about the correct class, prediction or action.

Various regularization techniques for controlling over-fitting are used in practice, namely early stopping, weight decay, and L1 and L2 regularization, with the currently most popular and empirically effective technique being \emph{dropout}~\cite{hinton2012improving}.
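For readers who want to see what these regularisers look like in practice, here is a minimal, purely illustrative PyTorch sketch (not code from this thesis; the layer sizes and hyperparameters are arbitrary):

\begin{verbatim}
# Illustrative only: dropout and weight decay (L2) in a small PyTorch setup.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Dropout(p=0.5),                 # dropout: randomly zeroes activations
    nn.Flatten(),
    nn.Linear(16 * 28 * 28, 10),
)

# weight_decay adds an L2 penalty on the weights during optimisation
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, weight_decay=1e-4)
\end{verbatim}

Early stopping is then simply a matter of halting training once the validation loss stops improving.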

@@ -27,24 +27,23 @@ \section{Problem Statement}

Despite Neural Network architectures achieving state-of-the-art results in almost all classification tasks, Neural Networks still make over-confident decisions. A measure of uncertainty in the prediction is missing from current Neural Network architectures. Very careful training and weight-control measures such as regularization are needed to make the models less susceptible to over-fitting.

- We address both of these concerns by introducing Bayesian learning to a Convolutional Neural Networks that adds a measure of uncertainty and regularization in their predictions.
+ We address both of these concerns by introducing Bayesian learning to Convolutional Neural Networks, which adds a measure of uncertainty and regularization to their predictions.

\section{Current Situation}

- Deep Neural Networks have been successfully applied to many domains, including sensitive domains like health-care, security, fraudulent transactions and many more. However, from a probability theory perspective, it is unjustifiable to use single point-estimates as weights to base any classification on.
- On the other hand, Bayesian neural networks (NNs) are more robust to over-fitting, and can easily learn from small datasets. The Bayesian approach further offers uncertainty estimates via its parameters in form of probability distributions (see Figure \ref{fig:Scalar_Bayesian_Distribution}). At the same time, by using a prior probability distribution to integrate out the parameters, the average is computed across many models during training, which gives a regularization effect to the network, thus preventing overfitting.

+ Deep Neural Networks have been successfully applied to many domains, including very sensitive domains like health-care, security, fraudulent transactions and many more. However, from a probability theory perspective, it is unjustifiable to use single point-estimates as weights to base any classification on.
+ On the other hand, Bayesian neural networks are more robust to over-fitting and can easily learn from small datasets. The Bayesian approach further offers uncertainty estimates via its parameters in the form of probability distributions (see Figure 1.1). At the same time, by using a prior probability distribution to integrate out the parameters, the average is computed across many models during training, which gives a regularization effect to the network, thus preventing overfitting.
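The averaging over many models described above can be illustrated with a small, self-contained NumPy sketch (purely illustrative; the input, the posterior means and the standard deviations below are made up):

\begin{verbatim}
# Illustrative only: predictions averaged over Monte Carlo samples of the
# weights; the spread of the sampled predictions acts as a crude
# uncertainty estimate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4))              # one input with 4 features

w_mean = rng.normal(size=(4, 3))         # posterior mean of the weights
w_std = 0.1 * np.ones((4, 3))            # posterior standard deviation

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

samples = []
for _ in range(100):                     # draw 100 weight samples
    w = w_mean + w_std * rng.normal(size=w_mean.shape)
    samples.append(softmax(x @ w))

probs = np.stack(samples)                # shape (100, 1, 3)
print("mean prediction   :", probs.mean(axis=0))
print("predictive spread :", probs.std(axis=0))
\end{verbatim}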

Bayesian posterior inference over the neural network parameters is a theoretically attractive method for controlling overfitting; however, modelling a distribution over the kernels (also known as filters) of a \ac{cnn} has never been attempted successfully before, perhaps because of the vast number of parameters and extremely large models commonly used in practical applications.

- Even with a small number of parameters, inferring model posterior in a Bayesian NN is a difficult task. Approximations to the model posterior are often used instead, with the variational inference being a popular approach. In this approach one would model the posterior using a simple \textit{variational} distribution such as a Gaussian, and try to fit the distribution's parameters to be as close as possible to the true posterior. This is done by minimising the Kullback-Leibler divergence from the true posterior. Many have followed this approach in the past for standard NN models \citep{hinton1993keeping,barber1998ensemble,graves2011practical,blundell2015weight}.
- But the variational approach used to approximate the posterior in Bayesian NNs can be fairly computationally expensive -- the use of Gaussian approximating distributions increases the number of model parameters considerably, without increasing model capacity by much. \citet{blundell2015weight} for example use Gaussian distributions for Bayesian NN posterior approximation and have doubled the number of model parameters, yet report the same predictive performance as traditional approaches using dropout. This makes the approach unsuitable for use with \acp{cnn}s as the increase in the number of parameters is too costly.
+ Even with a small number of parameters, inferring the model posterior in a Bayesian NN is a difficult task. Approximations to the model posterior are often used instead, with variational inference being a popular approach. In this approach, one would model the posterior using a simple \textit{variational} distribution such as a Gaussian, and try to fit the distribution's parameters to be as close as possible to the true posterior. This is done by minimising the \textit{Kullback-Leibler divergence} from the true posterior. Many have followed this approach in the past for standard NN models \citep{hinton1993keeping,barber1998ensemble,graves2011practical,blundell2015weight}.
+ But the variational approach used to approximate the posterior in Bayesian NNs can be fairly computationally expensive -- the use of Gaussian approximating distributions increases the number of model parameters considerably, without increasing model capacity by much. \citet{blundell2015weight}, for example, used Gaussian distributions for Bayesian NN posterior approximation and doubled the number of model parameters, yet reported the same predictive performance as traditional approaches using dropout. This makes the approach unsuitable in practice for use with \acp{cnn}, as the increase in the number of parameters is too costly.
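The following condensed PyTorch sketch illustrates such a Gaussian variational approximation for a single linear layer, including the doubling of parameters (a mean and a scale per weight) and the Kullback-Leibler term that is minimised; it is an illustrative simplification, not the implementation developed later in this thesis:

\begin{verbatim}
# Illustrative sketch of a linear layer with a Gaussian variational
# posterior over its weights; `mu` and `rho` together double the
# parameter count relative to a deterministic layer.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianVariationalLinear(nn.Module):
    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(n_out, n_in))
        self.rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.prior_std = prior_std

    def forward(self, x):
        sigma = F.softplus(self.rho)          # ensures sigma > 0
        eps = torch.randn_like(sigma)
        w = self.mu + sigma * eps             # reparameterised weight sample
        # KL( N(mu, sigma^2) || N(0, prior_std^2) ), summed over all weights
        self.kl = (torch.log(self.prior_std / sigma)
                   + (sigma ** 2 + self.mu ** 2) / (2 * self.prior_std ** 2)
                   - 0.5).sum()
        return F.linear(x, w)
\end{verbatim}

During training, one would minimise the usual data-fitting loss plus the accumulated kl terms of all such layers, which corresponds to fitting the variational distribution to the true posterior as described above.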

\begin{figure}[H]
\begin{center}
\includegraphics[height=.28\textheight]{Chapter1/Figs/weights.png}
\includegraphics[height=.28\textheight]{Chapter1/Figs/distribution.png}
- \label{fig:Scalar_Bayesian_Distribution}
+ \label{fig:Scalar_Bayesian_Distribution_Gluon}
\caption{Top: Each filter weight has a fixed value, as in the case of frequentist Convolutional Networks. Bottom: Each filter weight has a distribution, as in the case of Bayesian Convolutional Networks. \cite{Gluon}}
\end{center}
\end{figure}
@@ -62,7 +61,7 @@ \section{Our Contribution}
\item We empirically show that our proposed generic and reliable variational inference method for Bayesian \acp{cnn} can be applied to various \ac{cnn} architectures without any limitation on their performance.
\item We examine how to estimate the aleatoric and epistemic uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.
\item We also empirically show how our method typically only doubles the number of parameters yet trains an infinite ensemble using unbiased Monte Carlo estimates of the gradients.
- \item We also apply L1 norm to reduce the trained parameters and perform model pruning to reduce the number of model parameters without a reduction in the model prediction accuracy.
+ \item Finally, we apply the L1 norm to the trained model parameters, prune the low-magnitude values, and fine-tune the model, reducing the number of model parameters without a reduction in the model's prediction accuracy (a short sketch of this pruning step follows the list).
\end{enumerate}
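As an illustration of the pruning step in the last item, the sketch below zeroes out the smallest-magnitude entries of a weight tensor; it is a simplified stand-in for the procedure used in the thesis, and the tensor size and keep ratio are arbitrary:

\begin{verbatim}
# Illustrative only: magnitude-based pruning of a weight tensor.
import torch

def prune_by_magnitude(weight, keep_ratio=0.5):
    """Zero out the smallest-|w| entries, keeping roughly keep_ratio of them."""
    k = int(weight.numel() * keep_ratio)
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

w = torch.randn(256, 128)
w_pruned, mask = prune_by_magnitude(w, keep_ratio=0.25)
print("non-zero fraction:", mask.mean().item())
# The pruned model would then be fine-tuned with the mask applied so that
# the zeroed entries stay at zero.
\end{verbatim}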
This work builds on the foundations laid out by Blundell et al. \cite{blundell2015weight}, who introduced \textit{Bayes by Backprop} for feedforward neural networks. Together with the extension to recurrent neural networks, introduced by Fortunato et al. \cite{fortunato2017bayesian}, \textit{Bayes by Backprop} is now applicable to the three most frequently used types of neural networks, i.e., feedforward, recurrent, and convolutional neural networks.

6 changes: 3 additions & 3 deletions Chapter2/chapter2.tex
@@ -49,10 +49,10 @@ \chapter{Background}
Chapter Overview
\begin{itemize}
\item Neural Networks and Convolutional Neural Networks.
- \item Concepts like Variational Inference, and local reparameterization trick in Bayesian Neural Network.
+ \item Overview of concepts such as Variational Inference and the local reparameterization trick in Bayesian Neural Networks.
\item Backpropagation in Bayesian Networks using Bayes by Backprop.
\item Estimation of Uncertainties in a network.
- \item Pruning a network to reduce the parameters without affecting the performance.
+ \item Pruning a network to reduce the overall number of parameters without affecting its performance.
\end{itemize}
}
}
@@ -62,7 +62,7 @@
\section{Neural Networks}
\subsection{Brain Analogies}

- A perceptron is conceived as a mathematical model of how the neurons function in our brain by a famous psychologist Rosenblatt. According to him, a neuron takes a set of binary inputs (nearby neurons), multiplies each input by a continuous-valued weight (the synapse strength to each nearby neuron), and thresholds the sum of these weighted inputs to output a 1 if the sum is big enough and otherwise a 0 (in the same way neurons either fire or do not).
+ The perceptron was conceived by the psychologist Rosenblatt as a mathematical model of how neurons function in our brain. According to Rosenblatt, a neuron takes a set of binary inputs (from nearby neurons), multiplies each input by a continuous-valued weight (the synapse strength to each nearby neuron), and thresholds the sum of these weighted inputs, outputting a 1 if the sum is big enough and a 0 otherwise (just as neurons either fire or do not fire).
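This thresholding rule can be written in a few lines; the sketch below is purely illustrative, with made-up inputs, weights and threshold:

\begin{verbatim}
# Illustrative only: Rosenblatt-style perceptron with a hard threshold.
import numpy as np

def perceptron(inputs, weights, threshold=1.0):
    """Output 1 if the weighted sum of inputs reaches the threshold, else 0."""
    return int(np.dot(inputs, weights) >= threshold)

x = np.array([1, 0, 1])           # binary inputs from "nearby neurons"
w = np.array([0.7, 0.2, 0.5])     # synapse strengths
print(perceptron(x, w))           # -> 1, since 0.7 + 0.5 = 1.2 >= 1.0
\end{verbatim}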

\begin{figure}[H]
\begin{center}
