
Commit

All over. Hopefully final commit
kumar-shridhar committed Dec 4, 2018
1 parent d4cb739 commit c268125
Showing 8 changed files with 80 additions and 84 deletions.
6 changes: 3 additions & 3 deletions Abstract/abstract.tex
@@ -5,14 +5,14 @@
Generally, networks using point estimates as weights perform well with large datasets, but they fail to express uncertainty in regions with little or no data, leading to overconfident decisions.
\newline

- In this thesis, Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, that introduces probability distribution over the weights. Furthemore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks.
+ In this thesis, a Bayesian Convolutional Neural Network (BayesCNN) using Variational Inference is proposed, which introduces a probability distribution over the weights. Furthermore, the proposed BayesCNN architecture is applied to tasks like Image Classification, Image Super-Resolution and Generative Adversarial Networks.

BayesCNN is based on Bayes by Backprop which derives a variational approximation to the true posterior.
- Our proposed method not only achieves performances equivalent to frequentist inference in identical architectures but also incorporate a measurement for uncertainties and regularisation. It further eliminates the use of dropout in the model. Furthermore, we predict how certain the model prediction is based on the epistemic and aleatoric uncertainites and finally propose ways to prune the model and make it computational and time effective.
+ Our proposed method not only achieves performance equivalent to frequentist inference in identical architectures but also incorporates a measure of uncertainty and regularisation. It further eliminates the use of dropout in the model. Furthermore, we predict how certain the model's predictions are based on the epistemic and aleatoric uncertainties, and finally propose ways to prune the model to make it computationally and time efficient.
\newline


- In the first part of the thesis, the Bayesian Neural Network is explained and it is applied to an Image Classification task. The results are compared to point-estimates based architectures on MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreoover, uncertainites are calculated and pruning of the architecture is done.
+ In the first part of the thesis, the Bayesian Neural Network is explained and applied to an Image Classification task. The results are compared to point-estimate based architectures on the MNIST, CIFAR-10, CIFAR-100 and STL-10 datasets. Moreover, uncertainties are calculated and the architecture is pruned.

In the second part of the thesis, the concept is further applied to other computer vision tasks, namely Image Super-Resolution and Generative Adversarial Networks. The concept of BayesCNN is tested and compared against other concepts in a similar domain.

14 changes: 7 additions & 7 deletions Chapter1/chapter1.tex
@@ -18,26 +18,26 @@ \chapter{Introduction} %Title of the First Chapter
Deep Neural Networks (DNNs) are connectionist systems that learn to perform tasks by learning from examples, without prior knowledge about the tasks.
They easily scale to millions of data points and yet remain tractable to optimize with stochastic gradient descent.

- Convolutional Neural Networks (CNNs), a variant of DNNs, have already surpassed human accuracy in the realm of image classification (e.g. \cite{he2016deep,simonyan2014very,krizhevsky2012imagenet}). Due to the capacity of \acp{cnn} to fit on a wide diversity of non-linear data points, they require large amount of training data. This often makes \acp{cnn}s and Neural Networks in general, prone to overfitting on small datasets. The model tends to fit well to the training data, but are not predictive for new data. This often makes the Neural Networks incapable of correctly assessing the uncertainty in the training data and hence leads to overly confident decisions about the correct class, prediction or action.
+ Convolutional Neural Networks (CNNs), a variant of DNNs, have already surpassed human accuracy in the realm of image classification (e.g. \cite{he2016deep,simonyan2014very,krizhevsky2012imagenet}). Due to the capacity of \acp{cnn} to fit a wide diversity of non-linear data points, they require a large amount of training data. This often makes \acp{cnn}, and Neural Networks in general, prone to overfitting on small datasets. The model tends to fit the training data well, but is not predictive for new data. This often makes Neural Networks incapable of correctly assessing the uncertainty in the training data and hence leads to overly confident decisions about the correct class, prediction or action.

- Various regularization techniques for controlling over-fitting are used in practice namely early stopping, weight decay, L1, L2 regularizations and currently the most popular and empirically effective technique being \emph{dropout}~\cite{hinton2012improving}.
+ Various regularization techniques for controlling over-fitting are used in practice, namely early stopping, weight decay, L1 and L2 regularization, and currently the most popular and empirically effective technique, \emph{dropout}~\cite{hinton2012improving}.


\section{Problem Statement}

Despite Neural Network architectures achieving state-of-the-art results in almost all classification tasks, Neural Networks still make over-confident decisions. A measure of uncertainty in the prediction is missing from current Neural Network architectures. Very careful training, weight control measures like regularization of weights, and similar techniques are needed to make the models less susceptible to over-fitting.

- We address both of these concerns by introducing Bayesian learning to a Convolutional Neural Networks that adds a measure for uncertainty and regularization in their predictions.
+ We address both of these concerns by introducing Bayesian learning to Convolutional Neural Networks, which adds a measure of uncertainty and regularization to their predictions.

\section{Current Situation}

Deep Neural Networks have been successfully applied to many domains, including sensitive domains like health-care, security, detection of fraudulent transactions and many more. However, from a probability theory perspective, it is unjustifiable to use single point estimates as weights to base any classification on.
- On the other hand, Bayesian neural networks (NNs) are more robust to over-fittings, and can easily learn from small datasets. Bayesian approach further offers uncertainty estimates via its parameters in form of probability distributions (see Figure \ref{fig:Scalar_Bayesian_Distribution}). At the same time, by using a prior probability distribution to integrate out the parameters, the average is computed across many models during training, which gives a regularization effect to the network, thus preventing overfitting.
+ On the other hand, Bayesian neural networks (NNs) are more robust to over-fitting, and can easily learn from small datasets. The Bayesian approach further offers uncertainty estimates via its parameters in the form of probability distributions (see Figure \ref{fig:Scalar_Bayesian_Distribution}). At the same time, by using a prior probability distribution to integrate out the parameters, the average is computed across many models during training, which gives a regularization effect to the network, thus preventing overfitting.
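In generic notation (with $\mathcal{D}$ the training data, $w$ the weights, and $(x^*, y^*)$ a new input-target pair), integrating out the parameters is the standard posterior predictive; the exact form used later in the thesis may differ in details:
\begin{equation}
p(y^* \mid x^*, \mathcal{D}) = \int p(y^* \mid x^*, w)\, p(w \mid \mathcal{D})\, \mathrm{d}w .
\end{equation}
Every weight configuration contributes according to its posterior probability, which is the model-averaging effect that acts as a regulariser.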


Bayesian posterior inference over the neural network parameters is a theoretically attractive method for controlling overfitting; however, modelling a distribution over the kernels (also known as filters) of a \ac{cnn} has never been attempted successfully before, perhaps because of the vast number of parameters and extremely large models commonly used in practical applications.

- Even with a small number of parameters, inferring model posterior in a Bayesian NN is a difficult task. Approximations to the model posterior are often used instead, with variational inference being a popular approach. In this approach one would model the posterior using a simple \textit{variational} distribution such as a Gaussian, and try to fit the distribution's parameters to be as close as possible to the true posterior. This is done by minimising the Kullback-Leibler divergence from the true posterior. Many have followed this approach in the past for standard NN models \citep{hinton1993keeping,barber1998ensemble,graves2011practical,blundell2015weight}.
+ Even with a small number of parameters, inferring model posterior in a Bayesian NN is a difficult task. Approximations to the model posterior are often used instead, with the variational inference being a popular approach. In this approach one would model the posterior using a simple \textit{variational} distribution such as a Gaussian, and try to fit the distribution's parameters to be as close as possible to the true posterior. This is done by minimising the Kullback-Leibler divergence from the true posterior. Many have followed this approach in the past for standard NN models \citep{hinton1993keeping,barber1998ensemble,graves2011practical,blundell2015weight}.
But the variational approach used to approximate the posterior in Bayesian NNs can be fairly computationally expensive -- the use of Gaussian approximating distributions increases the number of model parameters considerably, without increasing model capacity by much. \citet{blundell2015weight} for example use Gaussian distributions for Bayesian NN posterior approximation and have doubled the number of model parameters, yet report the same predictive performance as traditional approaches using dropout. This makes the approach unsuitable for use with \acp{cnn} as the increase in the number of parameters is too costly.
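Spelled out, the fitting criterion described above is the usual variational free energy (again in generic notation; the precise objective used in the thesis may differ in details): for a variational distribution $q_\theta(w)$, prior $p(w)$ and likelihood $p(\mathcal{D} \mid w)$, one minimises
\begin{equation}
\mathcal{F}(\theta) = \mathrm{KL}\!\left[\, q_\theta(w) \,\|\, p(w) \,\right] - \mathbb{E}_{q_\theta(w)}\!\left[\, \log p(\mathcal{D} \mid w) \,\right],
\end{equation}
which equals $\mathrm{KL}\!\left[\, q_\theta(w) \,\|\, p(w \mid \mathcal{D}) \,\right]$ up to the constant $\log p(\mathcal{D})$, so minimising it drives the variational distribution towards the true posterior. \citet{blundell2015weight} optimise an unbiased Monte Carlo estimate of this objective using a reparameterisation of the form $w = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, which is why a Gaussian approximate posterior doubles the parameter count: one mean and one standard-deviation parameter per weight.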

\begin{figure}[H]
@@ -57,8 +57,8 @@ \section{Our Hypothesis}
\section{Our Contribution}
\newline The main contributions of our work are as follows:
\begin{enumerate}
- \item We present how \textit{Bayes by Backprop} can be efficiently applied to \acp{cnn}. We therefore introduce the idea of applying two convolutional operations, one for the mean and one for the variance.
- \item We show how the model learn richer representations and predictions from cheap model averaging.
+ \item We present how \textit{Bayes by Backprop} can be efficiently applied to \acp{cnn}. We, therefore, introduce the idea of applying two convolutional operations, one for the mean and one for the variance (a short sketch of this idea follows the list).
+ \item We show how the model learns richer representations and predictions from cheap model averaging.
\item We empirically show that our proposed generic and reliable variational inference method for Bayesian \acp{cnn} can be applied to various \ac{cnn} architectures without any limitations on their performances.
\item We examine how to estimate the aleatoric and epistemic uncertainties and empirically show how the uncertainty can decrease, allowing the decisions made by the network to become more deterministic as the training accuracy increases.
\item We also empirically show how our method typically only doubles the number of parameters yet trains an infinite ensemble using unbiased Monte Carlo estimates of the gradients.
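A minimal sketch of the "two convolutional operations" idea, assuming a fully factorised Gaussian posterior over the kernels and the local reparameterisation trick; the class name, initialisation and softplus parameterisation are illustrative assumptions, not the exact implementation accompanying the thesis.

import torch
import torch.nn as nn
import torch.nn.functional as F

class BayesConv2d(nn.Module):
    """Convolution with a factorised Gaussian posterior over its kernel.

    Each forward pass applies two convolutions: one computes the mean of the
    pre-activation, the other its variance, and the output is then sampled
    with the (local) reparameterisation trick.
    """

    def __init__(self, in_channels, out_channels, kernel_size, padding=0):
        super().__init__()
        shape = (out_channels, in_channels, kernel_size, kernel_size)
        self.W_mu = nn.Parameter(torch.randn(shape) * 0.1)   # posterior mean
        self.W_rho = nn.Parameter(torch.full(shape, -3.0))   # parameterises the std
        self.padding = padding

    def forward(self, x):
        W_sigma = F.softplus(self.W_rho)                      # ensures std > 0
        out_mean = F.conv2d(x, self.W_mu, padding=self.padding)
        out_var = F.conv2d(x.pow(2), W_sigma.pow(2), padding=self.padding)
        eps = torch.randn_like(out_mean)
        return out_mean + out_var.clamp(min=1e-16).sqrt() * eps

A Bayesian forward pass therefore costs two convolutions instead of one, and repeating forward passes at test time gives the cheap Monte Carlo model averaging referred to above.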