Conclusion updated. And thesis over. Final commit. Hopefully!
kumar-shridhar committed Dec 23, 2018
1 parent b24b57a commit 6233b5a
Showing 6 changed files with 41 additions and 18 deletions.
2 changes: 1 addition & 1 deletion Chapter2/chapter2.tex
@@ -147,7 +147,7 @@ \subsection{Local reparametrisation trick}
\section{Uncertainties in Bayesian Learning}


Uncertainty in a network is a measure of how confident the model is in its prediction. In Bayesian modeling, there are two main types of uncertainty one can model \citep{Kiureghian}: \textit{aleatoric} uncertainty and \textit{epistemic} uncertainty.

\textit{Aleatoric} uncertainty measures the noise inherent in the observations. It arises from the data collection process, for example sensor or motion noise that is uniform across the dataset, and cannot be reduced by collecting more data. \textit{Epistemic} uncertainty, on the other hand, represents the uncertainty caused by the model itself; it can be reduced given more data and is often referred to as \textit{model uncertainty}. Aleatoric uncertainty can be further categorized into \textit{homoscedastic} uncertainty, which stays constant across inputs, and \textit{heteroscedastic} uncertainty, which depends on the inputs to the model, with some inputs potentially having noisier outputs than others. Heteroscedastic uncertainty is particularly important because modeling it prevents the network from outputting overconfident predictions.
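Heteroscedastic noise is typically handled by letting the network predict an input-dependent variance alongside the mean and training with a Gaussian negative log-likelihood. A minimal NumPy sketch of such a loss follows; the function name and the toy values are illustrative assumptions, not the thesis code:

```python
import numpy as np

def heteroscedastic_nll(y, mean, log_var):
    """Gaussian NLL with a per-input (heteroscedastic) variance.

    Predicting a log-variance per input lets the model attenuate the
    squared-error term on noisy inputs (at the cost of the 0.5*log_var
    penalty) instead of being forced into overconfident point estimates.
    A homoscedastic model would use one constant log_var for all inputs.
    """
    return np.mean(0.5 * np.exp(-log_var) * (y - mean) ** 2 + 0.5 * log_var)

# Toy example: the third input is flagged as noisy via a high log-variance.
y = np.array([1.0, 2.0, 3.0])
mean = np.array([1.1, 1.9, 3.2])
log_var = np.array([-2.0, -2.0, 1.0])
loss = heteroscedastic_nll(y, mean, log_var)
```

For a large residual, raising the predicted log-variance lowers the loss, which is exactly the attenuation effect described above.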

Binary file added Chapter4/Figs/bbb.pdf
Binary file added Chapter4/Figs/det.pdf
24 changes: 21 additions & 3 deletions Chapter4/chapter4.tex
@@ -42,14 +42,14 @@ \section{Bayesian convolutional neural networks with variational inference}
\centering
\includegraphics[width=\linewidth]{Chapter4/Figs/CNNwithdist.png}
\end{minipage}
\caption{Input image with exemplary pixel values, filters, and corresponding output with point estimates (top) and probability distributions (bottom) over weights \cite{shridhar2018bayesian}.}
\label{fig:filter_scalar}
\end{figure}
%
\begin{figure}[b!]
\begin{center}
\includegraphics[width=\linewidth]{Chapter4/Figs/CNNwithdist_grey.png}
\caption{Fully Bayesian perspective of an exemplary CNN. Weights in the filters of convolutional layers and weights in fully-connected layers take the form of a probability distribution \cite{shridhar2018bayesian}.}
\label{fig:CNNwithdist_grey}
\end{center}
\end{figure}
@@ -94,7 +94,25 @@ \section{Uncertainty estimation in CNN}
\end{aligned}
\end{equation}
where $\Bar{p} = \frac{1}{T}\sum_{t=1}^T \hat{p}_t$ and $\hat{p}_t = \text{Softmax}\big ( f_{w_{t}}(x^*) \big )$.
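The split of the predictive covariance into an aleatoric and an epistemic part can be sketched in a few lines of NumPy. The function below follows the decomposition used with $T$ stochastic softmax outputs $\hat{p}_t$ and their mean $\bar{p}$; the function names and the $T=50$, three-class toy setup are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def decompose_uncertainty(logits):
    """Split predictive uncertainty for one input x* into two parts.

    logits: (T, C) array of T stochastic forward passes (e.g. T weight
    samples from the variational posterior), C classes. Returns two
    (C, C) covariance matrices:
      aleatoric = 1/T sum_t [diag(p_t) - p_t p_t^T]
      epistemic = 1/T sum_t [(p_t - p_bar)(p_t - p_bar)^T]
    """
    p = softmax(logits)          # (T, C) probabilities p_hat_t
    p_bar = p.mean(axis=0)       # predictive mean over samples
    T, C = p.shape
    aleatoric = np.zeros((C, C))
    epistemic = np.zeros((C, C))
    for p_t in p:
        aleatoric += np.diag(p_t) - np.outer(p_t, p_t)
        d = p_t - p_bar
        epistemic += np.outer(d, d)
    return aleatoric / T, epistemic / T

rng = np.random.default_rng(0)
logits = rng.normal(size=(50, 3))    # 50 stochastic passes, 3 classes
alea, epis = decompose_uncertainty(logits)
```

If all $T$ passes agree exactly, the epistemic term vanishes, matching the intuition that model uncertainty stems from disagreement among posterior weight samples.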

\begin{figure}[H]
\centering
\begin{minipage}{.4\textwidth}
\centering
\includegraphics[width=\linewidth]{Chapter4/Figs/det.pdf}
\end{minipage}
%
\begin{minipage}{.4\textwidth}
\centering
\includegraphics[width=\linewidth]{Chapter4/Figs/bbb.pdf}
\end{minipage}
\caption{Predictive distributions estimated for a low-dimensional active learning task, visualized as the mean with two standard deviations shaded. \textcolor{blue}{$\blacksquare$} shows the epistemic uncertainty and \textcolor{red}{$\blacksquare$} shows the aleatoric noise. Data points are shown in \textcolor{grey}{$\blacksquare$}.
\textbf{(Left)} A deterministic network conflates uncertainty with the noise and is overconfident outside of the data distribution.
\textbf{(Right)} A variational Bayesian neural network with a standard normal prior represents uncertainty and noise separately, but remains overconfident outside of the training distribution, as shown by \cite{hafner2018reliable}.}
\label{fig:uncertainty_active}
\end{figure}

\newline It is of paramount importance that uncertainty is split into aleatoric and epistemic quantities, since this allows the modeler to evaluate the room for improvement: while aleatoric uncertainty (also known as statistical uncertainty) is merely a measure of the variation in (``noisy'') data, epistemic uncertainty is caused by the model. Hence, a modeler can see whether the quality of the data is low (i.e.\ high aleatoric uncertainty) or the model itself is the cause of poor performance (i.e.\ high epistemic uncertainty). The former cannot be reduced by gathering more data, whereas the latter can \cite{Kiureghian, kendall2017uncertainties}.

\section{Model pruning}

11 changes: 7 additions & 4 deletions Chapter7/chapter7.tex
@@ -1,6 +1,9 @@
\chapter{Conclusion and Outlook}

%
We propose Bayesian \acp{cnn} utilizing \textit{Bayes by Backprop} as a reliable, variational inference method for \acp{cnn}, which has not been studied to date, and estimate the models' aleatoric and epistemic uncertainties for prediction. Furthermore, we apply different ways of pruning the Bayesian \ac{cnn} and compare the results with frequentist architectures.
\newline There has been previous work by Gal and Ghahramani \cite{gal2015bayesian}, who utilized the various outputs of a Dropout function to define a distribution and concluded that one can then speak of a Bayesian \ac{cnn}. This approach has found, perhaps also due to its ease, a large confirming audience. However, we argue against this approach and point out its deficiencies. Specifically, in Gal and Ghahramani's \cite{gal2015bayesian} approach, no prior probability distributions $p(w)$ are placed on the \ac{cnn}'s parameters, yet these are a substantial part of a Bayesian interpretation for the simple reason that Bayes' theorem includes them. We therefore argue that starting with prior probability distributions $p(w)$ is essential in Bayesian methods. In comparison, we place prior probability distributions over all model parameters and update them according to Bayes' theorem with variational inference, precisely \textit{Bayes by Backprop}. We show that these neural networks achieve results comparable to the state-of-the-art results achieved by the same network architectures trained by frequentist inference.
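To make the variational treatment concrete, a Bayes-by-Backprop style layer can be sketched as follows: each weight carries a variational posterior $\mathcal{N}(\mu, \sigma^2)$, a forward pass draws one weight sample via the reparameterisation trick, and a KL term to a standard normal prior serves as the complexity cost. This NumPy sketch is illustrative only (class and method names are mine, and the gradient updates on $\mu$ and $\rho$ are omitted), not the thesis code:

```python
import numpy as np

rng = np.random.default_rng(1)

class BayesLinear:
    """Minimal Bayes-by-Backprop linear layer (illustrative sketch).

    Each weight has a Gaussian variational posterior N(mu, sigma^2),
    with sigma = softplus(rho) > 0 so sigma can be optimized freely.
    """
    def __init__(self, n_in, n_out):
        self.mu = rng.normal(0.0, 0.1, size=(n_in, n_out))
        self.rho = np.full((n_in, n_out), -3.0)   # small initial sigma

    def sample_weights(self):
        # Reparameterisation trick: w = mu + sigma * eps, eps ~ N(0, I)
        sigma = np.log1p(np.exp(self.rho))        # softplus
        eps = rng.standard_normal(self.mu.shape)
        return self.mu + sigma * eps

    def forward(self, x):
        # Every forward pass uses a fresh weight sample, so outputs
        # are stochastic; averaging T passes approximates the
        # posterior predictive distribution.
        return x @ self.sample_weights()

    def kl_to_standard_normal(self):
        """KL(q(w) || N(0, 1)) summed over weights: the complexity
        cost added to the negative log-likelihood in the ELBO."""
        sigma = np.log1p(np.exp(self.rho))
        return np.sum(np.log(1.0 / sigma)
                      + (sigma ** 2 + self.mu ** 2) / 2.0 - 0.5)
```

In training, one would minimize `nll + kl_to_standard_normal() / num_batches` with gradients flowing through `mu` and `rho` thanks to the reparameterisation.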
\newline Furthermore, we examine how uncertainties (both aleatoric and epistemic) can be computed for our proposed method and show how epistemic uncertainty can be reduced with more training data. We also compare the effect of dropout in a frequentist network to the proposed Bayesian \ac{cnn} and show the natural regularization effect of Bayesian methods. To counter the doubled number of parameters (mean and variance) in a Bayesian \ac{cnn} compared to a single point-estimate weight in a frequentist method, we apply network pruning and show that the Bayesian \ac{cnn} performs equally well or better even when the network is pruned and the number of parameters is made comparable to a frequentist method.
\newline Finally, we show applications of Bayesian \acp{cnn} in various domains such as image recognition, image super-resolution, and generative adversarial networks (GANs). The results are compared with other popular approaches in each field. Bayesian \acp{cnn} proved, in general, to be well suited for GANs, since the prior knowledge in the discriminator network helps it better distinguish real from fake images. \\


As an add-on method to further enhance the stability of the optimization, \textit{posterior sharpening} \cite{fortunato2017bayesian} could be applied to Bayesian \acp{cnn} in future work. There, the variational posterior distribution $q_{\theta}(w|\mathcal{D})$ is conditioned on the training data of a batch $\mathcal{D}^{(i)}$. We can see $q_{\theta}(w|\mathcal{D}^{(i)})$ as a proposal distribution, or \textit{hyper-prior} when we rethink it as a hierarchical model, to improve the gradient estimates of the intractable likelihood function $p(\mathcal{D}|w)$. Furthermore, while the model here is pruned with simple methods like the L1 norm, further compression tricks like vector quantization \cite{DBLP:journals/corr/GongLYB14} and group sparsity regularization \cite{DBLP:conf/nips/AlvarezS16} could be applied. The concept of a Bayesian \ac{cnn} could also be applied to the generative network of a GAN to generate fake images that capture a better representation of real images.
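An L1-norm pruning criterion of the kind mentioned above can be sketched by zeroing the weights with the smallest posterior-mean magnitude; for Bayesian weights, a signal-to-noise ratio $|\mu|/\sigma$ is a common alternative criterion. The helper below is an illustrative assumption (names and ratio are mine), not the thesis implementation:

```python
import numpy as np

def l1_prune(mu, keep_ratio=0.5):
    """Zero out the weights with the smallest |mu| (L1 criterion).

    mu: array of posterior means of a Bayesian layer (or point-estimate
    weights of a frequentist layer). Keeps the top keep_ratio fraction
    of weights by magnitude; ties at the threshold are all kept.
    Returns the pruned weights and the boolean keep-mask.
    """
    flat = np.abs(mu).ravel()
    k = int(round(keep_ratio * flat.size))
    threshold = np.sort(flat)[::-1][k - 1] if k > 0 else np.inf
    mask = np.abs(mu) >= threshold
    return mu * mask, mask

mu = np.array([[0.8, -0.05],
               [0.01, -1.2]])
pruned, mask = l1_prune(mu, keep_ratio=0.5)
```

After pruning, the remaining parameter count of the Bayesian network can be made comparable to its frequentist counterpart, which is the comparison drawn in the conclusion above.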
22 changes: 12 additions & 10 deletions References/references.bib
@@ -18,16 +18,6 @@ @inproceedings{he2016deep
pages={770--778},
year={2016}
}
@inproceedings{kendall2017uncertainties,
title={What uncertainties do we need in bayesian deep learning for computer vision?},
author={Kendall, Alex and Gal, Yarin},
@@ -601,4 +591,16 @@ @article{DBLP:journals/corr/RadfordMC15
volume = {abs/1511.06434},
year = {2015}
}
@article{shridhar2018bayesian,
title={Bayesian Convolutional Neural Networks with Variational Inference},
author={Shridhar, Kumar and Laumann, Felix and Llopart Maurin, Adrian and Olsen, Martin and Liwicki, Marcus},
journal={arXiv preprint arXiv:1806.05978},
year={2018}
}
@article{hafner2018reliable,
title={Reliable uncertainty estimates in deep neural networks using noise contrastive priors},
author={Hafner, Danijar and Tran, Dustin and Irpan, Alex and Lillicrap, Timothy and Davidson, James},
journal={arXiv preprint arXiv:1807.09289},
year={2018}
}
