\chapter{Conclusion and Outlook}
We propose Bayesian \acp{cnn} utilizing \textit{Bayes by Backprop} as a reliable variational inference method for \acp{cnn}, an approach that has not been studied to date, and estimate the models' aleatoric and epistemic uncertainties for prediction. Furthermore, we apply different pruning methods to the Bayesian \ac{cnn} and compare the results with frequentist architectures.
\newline There has been previous work by Gal and Ghahramani \cite{gal2015bayesian}, who utilized the outputs of a Dropout function to define a distribution and concluded that one can then speak of a Bayesian \ac{cnn}. This approach has found a large audience, perhaps in part due to its simplicity. However, we argue against this approach and point out its deficiencies. Specifically, in Gal and Ghahramani's \cite{gal2015bayesian} approach, no prior probability distributions $p(w)$ are placed on the \ac{cnn}'s parameters. These, however, are a substantial part of a Bayesian interpretation for the simple reason that Bayes' theorem includes them. We therefore argue that starting from prior probability distributions $p(w)$ is essential in Bayesian methods. In comparison, we place prior probability distributions over all model parameters and update them according to Bayes' theorem with variational inference, specifically \textit{Bayes by Backprop}. We show that these neural networks achieve results on par with the same network architectures trained by frequentist inference.
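The mechanics behind \textit{Bayes by Backprop} can be sketched briefly: each weight is drawn via the reparameterization trick so that gradients flow through the variational parameters, and the training objective adds a complexity cost, the KL divergence between the variational posterior and the prior, to the data likelihood. The following is a minimal NumPy illustration for a single layer's weights, assuming a standard normal prior; the function names (\texttt{sample\_weight}, \texttt{kl\_to\_std\_normal}) are ours for illustration and not part of the thesis code.

```python
import numpy as np

def sample_weight(mu, rho, rng):
    """Reparameterization trick: w = mu + sigma * eps, eps ~ N(0, 1)."""
    sigma = np.log1p(np.exp(rho))          # softplus keeps sigma > 0
    return mu + sigma * rng.standard_normal(np.shape(mu))

def kl_to_std_normal(mu, rho):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), the complexity cost
    added to the negative log-likelihood in the Bayes-by-Backprop loss."""
    sigma = np.log1p(np.exp(rho))
    return np.sum(np.log(1.0 / sigma) + (sigma**2 + mu**2 - 1.0) / 2.0)

rng = np.random.default_rng(0)
mu = np.zeros(4)                           # variational means
rho = np.full(4, -3.0)                     # pre-softplus standard deviations
w = sample_weight(mu, rho, rng)            # one Monte Carlo weight draw
complexity = kl_to_std_normal(mu, rho)     # added to the data NLL per batch
```

In practice both `mu` and `rho` are updated by backpropagation, which is what distinguishes this from simply sampling Dropout masks at test time.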
\newline Furthermore, we examine how both aleatoric and epistemic uncertainties can be computed for our proposed method, and we show how epistemic uncertainty can be reduced with more training data. We also compare the effect of dropout in a frequentist network to the proposed Bayesian \ac{cnn} and show the natural regularization effect of Bayesian methods. To counter the doubled number of parameters (a mean and a variance per weight) in a Bayesian \ac{cnn} compared to a single point-estimate weight in a frequentist method, we apply network pruning and show that the Bayesian \ac{cnn} performs equally well or better even when the network is pruned and the number of parameters is made comparable to that of a frequentist method.
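The uncertainty decomposition mentioned above can be illustrated with Monte Carlo forward passes: averaging softmax outputs over several weight samples, the aleatoric part is the mean intrinsic variance of each categorical output, while the epistemic part is the spread of the outputs across weight samples. A hedged NumPy sketch of one common variance-based decomposition (diagonal terms only; the function name is ours):

```python
import numpy as np

def uncertainty_decomposition(probs):
    """Split predictive variance into aleatoric and epistemic parts.

    probs: array of shape (T, C) -- softmax outputs from T stochastic
    forward passes (T Monte Carlo weight samples, C classes).
    Returns per-class aleatoric and epistemic variances.
    """
    p_bar = probs.mean(axis=0)
    # Aleatoric: average intrinsic variance of each categorical output.
    aleatoric = np.mean(probs * (1.0 - probs), axis=0)
    # Epistemic: variance of the predictions across weight samples;
    # this term shrinks as more training data constrains the posterior.
    epistemic = np.mean((probs - p_bar) ** 2, axis=0)
    return aleatoric, epistemic
```

If all weight samples agree, the epistemic term vanishes and only the data noise (aleatoric term) remains, which matches the intuition that epistemic uncertainty is the part reducible by more training data.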
\newline Finally, we show applications of Bayesian \acp{cnn} in various domains such as image recognition, image super-resolution and Generative Adversarial Networks (GANs), and compare the results with other popular approaches in each field. Bayesian \acp{cnn} proved to be particularly well suited to GANs, as the prior knowledge in the discriminator network helps it better distinguish real from fake images. \\
As an add-on method to further enhance the stability of the optimization, \textit{posterior sharpening} \cite{fortunato2017bayesian} could be applied to Bayesian \acp{cnn} in future work. There, the variational posterior distribution $q_{\theta}(w|\mathcal{D})$ is conditioned on the training data of a batch $\mathcal{D}^{(i)}$. We can view $q_{\theta}(w|\mathcal{D}^{(i)})$ as a proposal distribution, or \textit{hyper-prior} when we rethink it as a hierarchical model, to improve the gradient estimates of the intractable likelihood function $p(\mathcal{D}|w)$. Currently, the model is pruned with simple methods such as the L1 norm; more advanced compression techniques like vector quantization \cite{DBLP:journals/corr/GongLYB14} and group sparsity regularization \cite{DBLP:conf/nips/AlvarezS16} could be applied in the future. Likewise, the concept of a Bayesian \ac{cnn} could be applied to the generator network of a GAN to produce fake images that capture a better representation of the real data.
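The L1-norm pruning referred to above amounts to magnitude-based thresholding: rank weights by absolute value and zero out the smallest fraction. A minimal NumPy sketch, assuming for a Bayesian layer that the same mask is applied to both the mean and the variance parameters (the function name is ours):

```python
import numpy as np

def l1_prune(weights, sparsity):
    """Zero out roughly the fraction `sparsity` of weights with the
    smallest absolute value (ties at the threshold are also pruned)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest |w| via a partial sort; everything at or below it is cut.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask
```

Pruning half the weights this way makes the parameter count of the Bayesian \ac{cnn} (mean and variance per weight) comparable to a frequentist point-estimate network.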