\documentclass[11pt]{article}
\usepackage[slantedGreek]{mathpazo}
\usepackage{amsmath,amssymb,amsthm,amsfonts}
\usepackage[top=8pc,bottom=8pc,left=8pc,right=8pc]{geometry}
%\usepackage{ntheorem}
\usepackage{subcaption}
\usepackage{multirow}
%% Please use the following statements for
%% managing the text and math fonts for your papers:
\usepackage{times}
%\usepackage[cmbold]{mathtime}
\usepackage{bm}
% this order is important
\RequirePackage[hyphens]{url}
\RequirePackage[colorlinks,citecolor=blue,urlcolor=blue]{hyperref}
\usepackage[authoryear]{natbib}
\usepackage[plain,noend]{algorithm2e}
%% For compressing some space:
%\setlength{\textfloatsep}{10pt plus 1.0pt minus 2.0pt}
%\setlength{\floatsep}{12.0pt plus 2.0pt minus 5.0pt}
%\setlength{\intextsep}{12.0pt plus 2.0pt minus 5.0pt}
%\setlength{\belowcaptionskip}{-2pt}
%\setlength{\textheight}{9in}
%\setlength{\textwidth}{6in}
%\setlength{\topmargin}{-36pt}
%\setlength{\oddsidemargin}{0pt}
%\setlength{\evensidemargin}{0pt}
\newtheorem{theorem}{Theorem}
\newtheorem{acknowledgement}[theorem]{Acknowledgement}
%\newtheorem{algorithm}[theorem]{Algorithm}
\newtheorem{axiom}[theorem]{Axiom}
\newtheorem{case}[theorem]{Case}
\newtheorem{claim}[theorem]{Claim}
\newtheorem{conclusion}[theorem]{Conclusion}
\newtheorem{condition}[theorem]{Condition}
\newtheorem{conjecture}[theorem]{Conjecture}
\newtheorem{corollary}[theorem]{Corollary}
\newtheorem{criterion}[theorem]{Criterion}
\newtheorem{definition}[theorem]{Definition}
\newtheorem{example}[theorem]{Example}
\newtheorem{exercise}[theorem]{Exercise}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{notation}[theorem]{Notation}
\newtheorem{problem}[theorem]{Problem}
\newtheorem{proposition}[theorem]{Proposition}
\newtheorem{remark}[theorem]{Remark}
\newtheorem{solution}[theorem]{Solution}
\newtheorem{summary}[theorem]{Summary}
\usepackage{array}
\input{math-commands}
\begin{document}
%% The left and right page headers are defined here:
\markboth{Bhadra, Datta, Polson, and Willard}{Global-Local Mixtures}
%% Here are the title, author names and addresses
\title{\vspace{-1cm} A Short Note on Global-Local Mixtures}
\author{Anindya Bhadra \footnote{{\em Address:} 250 N. University St., West Lafayette, IN 47907, email: [email protected].} \\Purdue University
\and Jyotishka Datta \footnote{{\em Address:} Department of Mathematical Sciences, SCEN 309, 1 University of Arkansas, Fayetteville, AR, 72701, email: [email protected].}\\ University of Arkansas\\
\and Nicholas G. Polson \footnote{{\em Address:} 5807 S. Woodlawn Ave., Chicago, IL 60637, email: [email protected].} \ \ and Brandon Willard \footnote{{\em Address:} 5807 S. Woodlawn Ave., Chicago, IL 60637, email: [email protected].} \\The University of Chicago Booth School of
Business}
\maketitle
\begin{abstract}
Global-local mixtures are derived from the \CS{} and Liouville integral transformation identities. We characterize well-known normal-scale mixture distributions including the Laplace or lasso, logit and quantile as well as new global-local mixtures. We also apply our methodology to convolutions that commonly arise in Bayesian inference. Finally, we conclude with a conjecture concerning bridge and uniform correlation mixtures.
\end{abstract}
%\begin{keywords}
\noindent {\bf Keywords:}
Bayes regularization; Cauchy; Convolution; Global-local mixture; Lasso; Logistic; Quantile; Stable law. % 3 to 8 kwds, alphabetically. Deleted Scale mixture;
%\end{keywords}
\section{Introduction}
Many statistical problems involve regularization penalties derived from global-local mixture distributions \citep{polson_data_2011, hans2011comment, bhadra2015horseshoe+}. A global-local mixture density, denoted by $p(x_1, \ldots, x_p)$, takes the form
\[
p(x_1, \ldots, x_p) = \int_{0}^{\infty}\prod_{i=1}^{p} p(x_i \mid \tau) p(\tau) d\tau,
\]
where $p(x_i \mid \tau) = \int_{0}^{\infty} p(x_i \mid \lambda_i, \tau) p(\lambda_i \mid \tau) d\lambda_i$ is a local mixture and $p(x_1, \ldots, x_p)$ is a global mixture over $\tau \sim p(\tau)$. There is great interest in analytically calculating $p(x_i \mid \tau)$ and the associated regularization penalty $\phi(x_i, \tau) = -\log p(x_i \mid \tau)$. Convolution mixtures of the form $p(x_i \mid \tau) = \int p(x_i - \lambda_i) p(\lambda_i) d \lambda_i$ are also of interest. We show how the \CS{} and Liouville transformations can be used to derive closed-form global-local mixtures. We start by stating two key integral identities, namely the \CS{} transformation
\begin{equation}
\int_0^\infty f \left\{ ( a x - b x^{-1} )^2 \right\} d x = \frac{1}{a} \int_0^\infty f(y^2) d y, \quad a, b >0 \;, \label{eq:identity}
\end{equation}
and the Liouville transformation:
\begin{equation}
\int_{0}^{\infty} f\left(ax + \frac{b}{x} \right) x^{-1/2}dx = a^{-1/2} \int_{0}^{\infty} f\left\{ 2 (ab)^{1/2} + y \right\} y^{-1/2} dy, \quad a, b >0 \;.
\label{eq:liouville}
\end{equation}
See \citet{boros2006irresistible}, \citet{baker2008probabilistic} and \citet{jones_generating_2014} for further discussion. Identity \eqref{eq:identity} follows from the simple substitution $t = b/(a x)$, which gives
\begin{equation*}
I = \int_{0}^{\infty} f \left\{(ax - b/x)^2 \right\} dx = \int_{0}^{\infty} f \left\{(at - b/t)^2 \right\} \frac{b}{a t^2} dt, \quad a, b >0.
\end{equation*}
Adding the two expressions for $I$ yields
\[
2 I = \int_{0}^{\infty} f \left\{(at - b/t)^2 \right\} \left\{ 1+{b}/({a t^2}) \right\} dt,
\]
\sloppy
and transforming $y = at - b/t$ gives $dy = a \{1+{b}/({a t^2})\} dt$, with $y$ increasing from $-\infty$ to $\infty$ as $t$ runs over $(0, \infty)$. Hence $2I = a^{-1} \int_{-\infty}^{\infty} f(y^2) dy$, that is, $I = a^{-1} \int_{0}^{\infty} f(y^2) dy$, as required. A useful generalization of the \CS{} transformation is as follows:
\begin{equation}
\int_0^\infty f\left[ \{x-s(x)\}^2 \right] dx = \int_0^\infty f( y^2 ) dy, \label{eq:gen}
\end{equation}
where $s(x)=s^{-1}(x)$ is a self-inverse function such as $s(x) = b/x$ or $s(x) = -a^{-1}\log\{1-\exp(-a x)\}$, with $a, b > 0$. The proof of the Liouville transformation identity follows in a similar manner and is omitted for the sake of brevity. %These identities can be used to construct new global-local mixture distributions.
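As a quick illustration of \eqref{eq:gen} (a worked special case added here as a sanity check; the particular choices of $s$ and $f$ are ours), take $s(x) = b/x$ with $b > 0$ and $f(u) = e^{-u}$. Then
\[
\int_0^\infty \exp\left\{ -(x - b/x)^2 \right\} dx = \int_0^\infty e^{-y^2} dy = \frac{\pi^{1/2}}{2},
\]
whatever the value of $b$. Since $(x - b/x)^2 = x^2 + b^2 x^{-2} - 2b$, this is equivalent to $\int_0^\infty \exp(-x^2 - b^2 x^{-2}) dx = 2^{-1} \pi^{1/2} e^{-2b}$.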
We can use these results to generate new probability distributions with different choices of simple baseline function $g(\cdot)$ and derive new scale mixture representations that are useful in Bayesian global-local modeling. The \CS{} and Liouville transformations can generate new distributions via scale transformations that can take the form $f(x) = 2 g\{ t(x) \}$ for certain $f(x)$ under suitable conditions. The simplest example is creating a new global-local scale family, $f(a x - b/x)$ by effectively reallocating the probability mass of a given density $f(x)$.
More generally, let $f(x) = 2g\{ t(x) \}$ and let $t(x)$ be of the form $x-s(x)$, where $s : \Re^+ \to \Re^+$ is a self-inverse, onto and monotone decreasing function. \citet{jones_generating_2014} shows that only a few choices of $t(x)$ lead to fully tractable formulae for the inverse $t^{-1}= \Pi$ and for the integral
$\Pi(y) = \int_{-\infty}^{y} \pi(\omega) d\omega$. Two special choices correspond to the $t$-distribution with 2 degrees of freedom and the logistic distribution, as shown below:
\begin{align*}
\Pi_{T}(y) = (1/2)\{ y+(4b+y^2)^{1/2}\}, \quad \Pi_T^{-1}(x) = t_T(x) = x - b/x, \quad b >0,\\
\Pi_{L}(y) = a^{-1} \log(1+e^{ay}), \quad \Pi_L^{-1}(x) = t_L(x) = a^{-1} \log(e^{ax}-1), \quad a>0.
\end{align*}
Now, the integral identity in \eqref{eq:gen} shows that if $f(x)$, $x \geq 0$ is a density function, so is $g(x) = f\{\lvert x-s(x) \rvert\}$, $x \ge 0$. The functions $f(\cdot)$ and $g(\cdot)$ are called mother and daughter density functions, respectively. %\citet{chaubey2010reciprocal} provide a one-to-one correspondence between $f(\cdot)$ and $g(\cdot)$.
%%% maybe add one or two lines
The mother and daughter density functions, $f(\cdot)$ and $g(\cdot)$, are linked via a dual relationship between symmetry, for densities supported on the whole real line, and reciprocal symmetry, for densities supported on its positive half $\Re^{+}$. A density function $f(\cdot)$ on $\Re^{+}$ is defined to have reciprocal symmetry (or R-symmetry) if $f(\theta y) = f(\theta / y)$ for all $y > 0$ and some $\theta >0$. It turns out that if $f(x)$ is the pdf of a symmetric real-valued random variable $X$, the daughter pdf $g(x) = 2 f(x-1/x)$, $x>0$, is an R-symmetric density with $\theta = 1$ and, conversely, for every R-symmetric density $g(x)$ with $\theta = 1$ there exists a symmetric density $f(x) = 2^{-1} g[\{x+(4+x^2)^{1/2}\}/2]$. Furthermore, $f(\cdot)$ is unimodal if and only if $g(\cdot)$ is unimodal. \cite{chaubey2010reciprocal} provide a few examples of generating R-symmetric densities $g$ starting from well-known symmetric densities $f$. Perhaps the best-known example of this duality takes $f$ to be the normal density, which gives rise to the root reciprocal inverse Gaussian (RRIG) distribution, with density given by:
\[
g(x) = \sqrt{\frac{2\lambda}{\pi}} \exp \left\{ - \frac{\lambda}{2} \left( x - \frac{1}{x} \right)^2 \right\}, x >0.
\]
Once again, the \CS{} transformation $y = x - x^{-1}$ guarantees that this is a valid probability density function.
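To spell out this check (a short worked step added here, assuming only \eqref{eq:gen} with $s(x) = 1/x$ and $f(u) = \exp(-\lambda u/2)$):
\[
\int_0^\infty g(x) dx = \left(\frac{2\lambda}{\pi}\right)^{1/2} \int_0^\infty \exp\left\{ -\frac{\lambda}{2} \left( x - \frac{1}{x} \right)^2 \right\} dx
= \left(\frac{2\lambda}{\pi}\right)^{1/2} \int_0^\infty e^{-\lambda y^2/2} dy
= \left(\frac{2\lambda}{\pi}\right)^{1/2} \frac{1}{2} \left(\frac{2\pi}{\lambda}\right)^{1/2} = 1.
\]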
A particularly useful tool for generating univariate and multivariate random variables is Khintchine's theorem, which states that any random variable $X$ with a unimodal, univariate distribution and a mode at zero can be written as a product $X = Z U$, where $U \sim \UnifRV(0,1)$ and $Z$ has the density function $f_Z(z) = -z f^{\prime}_{X}(z), z \in \Re$. \citet{bryson1982constructing}, and subsequently \citet{jones2012khintchine}, discuss how Khintchine's theorem allows us to construct both univariate and multivariate densities, even with special dependence structure. \citet{jones_generating_2014} develops an extended Khintchine's theorem that further allows us to generate random variables with unimodal densities of the form $2 g\{t(x)\}$.
The rest of the paper is organized as follows: \S\ref{sec:gls_mixes} derives scale mixture results for the Lasso, quantile and logistic regression; \S\ref{sec:convolutions} treats convolutions of densities via mixtures; and \S\ref{sec:discussion} concludes with two open problems.
\section{Global-local Scale Mixtures}
\label{sec:gls_mixes}
\subsection{Lasso as a normal scale mixture}
The Lasso penalty arises as a Laplace global-local mixture \citep{andrews_scale_1974}. A simple transformation proof follows from the \CS{} identity \eqref{eq:identity} with $f(x) = e^{-x}$. Starting with the normal integral identity, $\int_{0}^{\infty} f(y^2) dy = \int_0^\infty e^{-y^2} dy = \pi^{1/2}/2 $, we obtain
\[
\int_0^\infty e^{-(a x)^2 - (b/x)^2} d x = \int_0^{\infty}\exp\left\{-a b \left(\frac{a}{b} x^2 + \frac{b}{a} x^{-2} \right)\right\} dx = \frac{\pi^{1/2}}{2a} e^{-2 a b}, \quad a,b > 0.
\]
Substituting $t = (a/b)^{1/2} x$ and $c = ab$ yields the Laplace or Lasso penalty as
\begin{align*}
\int_0^\infty e^{- c (t - t^{-1})^2} dt &= \half (\pi/c)^{1/2} \Rightarrow \int_0^\infty e^{- c (t^2 + t^{-2})} dt = \half (\pi/c)^{1/2} e^{-2c}\;.
%\label{eq:andrews}
\end{align*}
The Laplace density can be viewed as a transformed normal, via $y = t - t^{-1}$.
\begin{proposition}
The usual identity for the Lasso also follows from \citet{levy1940certains} as
\begin{equation}
\int_{0}^{\infty} \frac{a}{(2 \pi)^{1/2} t^{3/2}} e^{-{a^2}/({2 t})} e^{-\lambda t} dt = e^{-a (2 \lambda)^{1/2} } \;.\label{eq:levy}
\end{equation}
For $a = 1$, and $\theta = (2 \lambda)^{1/2}$, this can be written as
\begin{equation}
E \left[ \exp\{-\theta^2/(2G)\} \right] = \exp(-\theta),\quad \mbox{where} \; G \sim \GammaRV(1/2, 1/2).
\label{eq:gamma}
\end{equation}
\end{proposition}
\begin{proof}
First substitute $t^{-1} = x^2$, which makes the left-hand side of \eqref{eq:levy} equal to
\[
\int_{0}^{\infty} \frac{a}{(2 \pi)^{1/2} t^{3/2}} e^{-{a^2}/({2 t})} e^{-\lambda t} dt = \left(\frac{2}{\pi}\right)^{1/2}ae^{-a (2 \lambda)^{1/2}}
\int_0^{\infty} e^{-({2}^{-1/2} ax - \lambda^{1/2} x^{-1})^2} dx = e^{-a (2 \lambda)^{1/2}}\;.
\]
The last step follows from the \CS{} formula, which evaluates the remaining integral as $(\pi/2)^{1/2}/a$. The second relationship \eqref{eq:gamma} follows by fixing $a = 1$, $\theta = (2\lambda)^{1/2}$ and
substituting $t = x^{-1}$:
\[
\int_{0}^{\infty} \frac{1}{(2 \pi)^{1/2} t^{3/2}} e^{-{1}/({2 t})} e^{-\theta^2 t/2} dt = \frac{1}{(2 \pi)^{1/2}} \int_{0}^{\infty} e^{-{\theta^2}/({2x})}
x^{-1/2} e^{-x/2} dx.
\]
The right-hand side can be identified as $E \left[ \exp\{-\theta^2/(2G)\} \right]$ for $G \sim \GammaRV(1/2, 1/2)$.
\end{proof}
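As a sanity check on \eqref{eq:levy} (a remark added here, not part of the cited derivation), letting $\lambda \to 0$ gives
\[
\int_{0}^{\infty} \frac{a}{(2 \pi)^{1/2} t^{3/2}} e^{-{a^2}/({2 t})} dt = 1, \quad a > 0,
\]
so the mixing kernel is itself a proper density on $(0, \infty)$, namely the L\'evy (positive stable with index $1/2$) density with scale $a^2$, and \eqref{eq:levy} is its Laplace transform evaluated at $\lambda$.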
%% hyperbolic-GIG \citep{barndorff1977infinite}
\subsection{Logit and quantile as global-local mixtures}
Logistic modeling can be viewed within the global-local mixture framework via the \PG{} distribution \citep{polson_bayesian_2013}. As \citet{polson_bayesian_2013} show, this mixture representation leads to efficient Markov chain Monte Carlo algorithms for inference.
\begin{proposition}
The two key marginal distributions for the hyperbolic generalized inverse Gaussian \citep{barndorff1982normal} and \PG{} mixtures are
\begin{align}
\frac{\alpha^2 - \kappa^2}{2\alpha} e^{-\alpha|x-\mu| + \kappa (x-\mu)} &= \int_0^{\infty} \phi(x \mid \mu + \kappa \lambda, \lambda) p_{\mathrm{GIG}}\left\{ \lambda \mid 1,0, (\alpha^2 - \kappa^2)^{1/2}\right\} d\lambda, \; \alpha \geq \kappa \geq 0, \label{eq:GIG}\\
\frac{1}{B(\alpha,\kappa)} \frac{e^{\alpha (x-\mu)}}{(1+e^{x-\mu})^{\alpha + \kappa}}&= \int_0^{\infty} \phi(x \mid \mu + \kappa \lambda, \lambda)p_{\mathrm{Polya}}(\lambda \mid \alpha,\kappa) d\lambda\;, \label{eq:polya}
\end{align}
where $\phi(x \mid \mu + \kappa \lambda, \lambda)$ denotes the normal density function with mean $\mu + \kappa \lambda$ and variance $\lambda$, evaluated at $x$. The functions $p_{\mathrm{GIG}}$ and $p_{\mathrm{Polya}}$ are the corresponding local mixture densities for the generalized inverse Gaussian and the \PG{}, respectively. The logit and quantile identities can be derived using the \CS{} identity.
\end{proposition}
\begin{proof}
Let $f(x) = e^{-x/2}$, $a = \alpha$ and $b = |x-\mu|$ in \eqref{eq:identity}. Then,
\[
(2/\pi)^{1/2} \int_{0}^{\infty} \exp\left\{-\half \left(\alpha y - \frac{|x-\mu|}{y} \right)^2 \right\} dy = \frac{1}{\alpha} (2/\pi)^{1/2} \int_0^{\infty} e^{-\half y^2} dy
= \frac{1}{\alpha} \;.
\]
Let $\nu = y^2$. Expanding the square and rearranging the constant terms yields
\[
\frac{1}{\alpha} e^{-\alpha|x-\mu|} = \int_{0}^{\infty} \frac{1}{(2 \pi \nu)^{1/2}} \exp\left[-\left\{ \frac{(x-\mu)^2}{2\nu} + \frac{\alpha^2}{2} \nu \right\} \right]
d\nu \;.
\]
\]
Multiplying by $2^{-1}(\alpha^2-\kappa^2) e^{\kappa(x-\mu)}$ and completing the square yields
\begin{equation*}
\frac{\alpha^2-\kappa^2}{2\alpha} \exp\left\{-\alpha|x-\mu| + \kappa(x-\mu)\right\}
= \int_0^{\infty} \phi(x \mid \mu + \kappa \nu, \nu)
\frac{\alpha^2-\kappa^2}{2} \exp\left(-\frac{\alpha^2-\kappa^2}{2} \nu \right) d \nu.
\end{equation*}
The mixing distribution is exponential with rate parameter $(\alpha^2-\kappa^2)/2$, a special case of the generalized inverse Gaussian distribution introduced by Etienne Halphen circa 1941 \citep{seshadri1997halphen}. The density with parameters $(\lambda, \delta, \gamma)$ has the form
\begin{equation*}
p_{\mathrm{GIG}}(x \mid \lambda, \delta, \gamma) = \frac{(\gamma/\delta)^{\lambda}}{2 K_{\lambda}(\delta \gamma)} x^{\lambda-1}
\exp\left\{ -\half (\delta^2 x^{-1} + \gamma^2 x )\right\}, \quad x > 0,\; \lambda \in \Re,\; \delta, \gamma > 0 \;,
\end{equation*}
where $K_{\lambda}$ is the modified Bessel function of the second kind. The Liouville formula can be used to show that the above is a valid probability density
function. When $\delta$ or $\gamma$ is zero, the normalizing constant takes the limiting value implied by $K_{\lambda}(u) \asymp \Gamma(|\lambda|) 2^{|\lambda|-1} u^{-|\lambda|}$ as $u \to 0$, for $\lambda \neq 0$. If $\delta=0$ and $\lambda > 0$, the generalized inverse Gaussian is identical to a gamma distribution:
\[
p_{\mathrm{GIG}}(x \mid \lambda, \delta = 0 , \gamma) = \frac{\alpha^{\lambda}}{\Gamma(\lambda)} x^{\lambda-1} \exp(-\alpha x), \quad x > 0,\; \alpha = \gamma^2 / 2.
\]
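For instance (a worked special case, added here to connect the gamma limit with the exponential mixing density used earlier in this proof), setting $\lambda = 1$ and $\delta = 0$ gives
\[
p_{\mathrm{GIG}}(x \mid 1, 0, \gamma) = \frac{\gamma^2}{2} \exp\left( -\frac{\gamma^2}{2} x \right), \quad x > 0,
\]
an exponential density with rate $\gamma^2/2$. Taking $\gamma = (\alpha^2-\kappa^2)^{1/2}$, where $\alpha$ and $\kappa$ now denote the parameters in \eqref{eq:GIG} rather than the gamma rate of the preceding display, recovers the mixing density $2^{-1}(\alpha^2-\kappa^2) \exp\{-(\alpha^2-\kappa^2)\nu/2\}$.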
%\end{proof}
%
%%%%%%%%%%%
%\begin{proof}
%\noindent \textbf{Proof of \eqref{eq:polya}} \\
We now present a simple proof for the \PG{} mixture in \eqref{eq:polya}. First, write $\kappa$ for $a-b/2$:
\begin{equation}
\frac{(e^{\psi})^a}{(1+e^{\psi})^b} = 2^{-b} e^{\kappa \psi}
\int_0^{\infty} e^{-\omega \psi^2/2} p(\omega) d\omega
\;,
\label{eq:pg}
\end{equation}
where $\omega \sim \operatorname{PG}(b,0)$, a \PG{} random variable with density
\[
p(\omega \mid b, 0) = \frac{2^{b-1}}{\Gamma(b)} \sum_{n=0}^{\infty} (-1)^n \frac{\Gamma(n+b)}{\Gamma(n+1)} \frac{2n + b}{(2 \pi)^{1/2} \omega^{3/2}}
\exp\left\{-\frac{(2 n + b)^2}{8 \omega} \right\}.
\]
The logit function corresponds to $a=0,b=1$ in \eqref{eq:pg}. The \CS{} identity yields
\begin{equation}
\frac{1}{1+e^{\psi}} = \half e^{- \psi/2} \int_0^{\infty} e^{-(\psi^2\omega)/2} p(\omega) d\omega, \; \mbox{where} \; p(\omega) = \sum_{n=0}^{\infty} (-1)^n \frac{2n+1}{ (2 \pi \omega^3)^{1/2}} e^{-(2n+1)^2/(8 \omega)}
\label{eq:logit}\;.
\end{equation}
To show \eqref{eq:logit}, write the right-hand side interchanging the integral and summation:
\begin{align*}
%I & = \half e^{-\psi/2} \int_0^{\infty} e^{-\frac{\psi^2}{2} \omega}
%\sum_{n=0}^{\infty} (-1)^n \frac{2n+1}{\sqrt{(2\pi)} \omega^{3/2}}
%e^{-\frac{(2n+1)^2}{8\omega}}d \omega \\
I & = \half e^{-\psi/2} \sum_{n=0}^{\infty} (-1)^n \frac{2n+1}{(2 \pi)^{1/2}} \int_0^{\infty} \exp\left[-\left\{ \frac{\psi^2}{2} \omega + \frac{(2n+1)^2}{8 \omega} \right\} \right] \frac{1}{\omega^{3/2}} d\omega \;.
\end{align*}
Using the change of variable $\omega = t^{-2}$ gives
\begin{align*}
% I & = e^{-\psi/2} \sum_{n=0}^{\infty} (-1)^n \frac{(2n+1)}{\sqrt{(2\pi)}}
% \int_{0}^{\infty} e^{-\left( \frac{\psi^2}{2t^2} + \frac{(2n+1)^2 t^2}{8}
% \right)} d t \\
I & = \sum_{n=0}^{\infty} (-1)^n e^{-(n+1)\psi}
\frac{2n + 1}{(2 \pi)^{1/2}}
\left( \int_{0}^{\infty}
\exp\left[-\half \left\{ \frac{(2n+1)t}{2} - \frac{\psi}{t}\right\}^2 \right] dt
\right)
\;.
\end{align*}
Applying the \CS{} identity to the inner integral yields
\[
\int_{0}^{\infty}
\exp\left[-\half \left\{ \frac{(2n+1)t}{2} - \frac{\psi}{t}\right \}^2 \right] dt
= \frac{2}{2n+1} \int_0^{\infty} e^{-y^2/2} dy= \frac{(2\pi)^{1/2}}{2n+1}
\;,
\]
which implies $I = \sum_{n=0}^{\infty} (-1)^n \exp\{-(n+1) \psi\} = \{1+\exp(\psi)\}^{-1}$.
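The last equality is a geometric series (a step spelled out here, assuming $\psi > 0$ so that the series converges and the \CS{} identity applies with $b = \psi > 0$):
\[
\sum_{n=0}^{\infty} (-1)^n e^{-(n+1)\psi} = e^{-\psi} \sum_{n=0}^{\infty} \left(-e^{-\psi}\right)^n = \frac{e^{-\psi}}{1+e^{-\psi}} = \frac{1}{1+e^{\psi}}.
\]
The integral in \eqref{eq:logit} depends on $\psi$ only through $\psi^2$, and the computation above shows that it equals $2 e^{\psi/2}/(1+e^{\psi}) = 1/\cosh(\psi/2)$ for $\psi > 0$. Since this expression is even in $\psi$, \eqref{eq:logit} also holds for $\psi < 0$.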
%An alternative proof using Laplace transformation is provided in
%\cite{polson2013bayesian}.
\end{proof}
\begin{remark}
When $\alpha = \kappa$, we have the limiting result $2(\alpha^2-\kappa^2)^{-1} p_{\mathrm{GIG}}\{\lambda \mid 1,0, (\alpha^2-\kappa^2)^{1/2} \} \to 1$ as $\kappa \to \alpha$,
or equivalently in terms of densities, with a marginal improper uniform prior, $p(\lambda) = 1$,
\begin{equation}
\int_{0}^{\infty} \phi(b \mid -a\lambda, c\lambda) d\lambda = a^{-1} \exp\left\{-2 \max(ab/c,0)\right\}
\;.
\label{eq:svm}
\end{equation}
This pseudo-likelihood represents support vector machines as a global-local mixture. The identity for quantile regression, which is a limiting case of the above identities obtained by applying the Fatou--Lebesgue theorem, is the following:
\[
c^{-1}\exp\{ -2c^{-1} \rho_q(b) \}= \int_{0}^{\infty} \phi( b \mid \lambda - 2 q \lambda, c \lambda){\rm e}^{-2 q(1-q)\lambda} d\lambda, \quad c > 0,\; 0 < q < 1,
\]
where $\rho_q(b) = \lvert b \rvert / 2 + (q-1/2) b$ is the check-loss function \citep{polson_data_2013}.
\end{remark}
\citet{polson_data_2011} derive this as a direct consequence of the Lasso identity
\[
\int_0^{\infty} p/(2 \pi \lambda)^{1/2} \exp\left\{-\left(p^2 \lambda+q^2 \lambda^{-1}\right)/2\right\} d\lambda = e^{-\lvert pq \rvert}.
\]
Applying the Liouville identity yields
\[
\int_{0}^{\infty} f\left(ax + \frac{b}{x} \right) x^{-1/2} dx = a^{-1/2} \int_{0}^{\infty} f\left\{ 2 (ab)^{1/2} + y \right\} y^{-1/2} dy, \quad a, b > 0.
\]
Setting $f(x) = e^{-x}$, $a = p^2/2$, and $b = q^2/2$ we get
%\[
\begin{align*}
\int_0^{\infty} \frac{e^{-(p^2 \lambda + q^2 \lambda^{-1})/2}}{\lambda^{1/2}} d\lambda
& = \frac{2^{1/2}}{p} \int_0^{\infty} e^{-(|pq| + y)} y^{-1/2} d y \\
& = \frac{2^{1/2} e^{-|pq|}}{p} \int_0^{\infty} e^{-y} y^{-1/2} d y
= \frac{(2\pi)^{1/2} e^{-|pq|}}{p}
\;.
\end{align*}
%\]
\citet{hans2011comment} shows that the elastic-net regression can be recast as a
global-local mixture with a mixing density belonging to the orthant-normal
family of distributions. The orthant-normal prior on a single regression
coefficient, $\beta$, given hyper-parameters $\lambda_1$ and $\lambda_2$,
has a density function of the following form:
\begin{equation}
p(\beta \mid \lambda_1, \lambda_2) =
\begin{cases}
\phi\left(\beta \mid \frac{\lambda_1}{2\lambda_2}, \frac{\sigma^2}{\lambda_2}\right)
\Big/ \left\{ 2\Phi\left(-\frac{\lambda_1}{2\sigma \lambda_2^{1/2} }\right) \right\}, & \quad \beta < 0,
\\
\phi\left(\beta \mid \frac{-\lambda_1}{2\lambda_2}, \frac{\sigma^2}{\lambda_2}\right)
\Big/ \left\{ 2\Phi\left(-\frac{\lambda_1}{2\sigma \lambda_2^{1/2} }\right) \right\}, & \quad \beta \geq 0.
\end{cases}
\label{eq:hans}
\end{equation}
\section{Convolution mixtures}
\label{sec:convolutions}
Another interesting area of application is convolution mixtures and marginal densities for location-scale mixture problems. We show that the Cauchy
convolution \citep{pillai2015unexpected} and the inverse-Gaussian convolution can be derived similarly \citep{polson_halfcauchy_2012}. \citet{bhadra_default_2016} show that the regularly varying tails of half-Cauchy priors work well for low-dimensional functions of a normal mean vector, where flat priors give poorly calibrated inference.
%% Inverse-Gamma appears in \citep{arnold2009some}.
\begin{lemma}
Let $X_i \sim \CauchyRV(0,1)$ $(i = 1, 2)$ be independent Cauchy random variates. Then $Z = w_1 X_1 + w_2 X_2 \sim \CauchyRV( 0, w_1 + w_2)$, where $w_1,w_2 > 0$.
\end{lemma}
\begin{lemma}
Let $X_i \sim \InvGaussRV(\alpha t_i, \alpha t_i^2)$ $(i = 1, 2)$ be independent. Then $Z = X_1 + X_2 \sim \InvGaussRV\{\alpha (t_1 + t_2), \alpha (t_1+t_2)^2\},$ where $\alpha, t_1, t_2 > 0$, and $\InvGaussRV(\alpha t, \alpha t^2)$ is an inverse-Gaussian random variable with density
\[
f(x) = \frac{t \alpha^{1/2} e^t}{(2 \pi)^{1/2} x^{3/2}}
\exp\left( -\frac{\alpha t^2}{2x} - \frac{x}{2\alpha} \right), \quad x \geq 0.
\]
\end{lemma}
Both of these results follow from straightforward applications of the \CS{}
transformation. We give a proof for the Cauchy convolution identity below.
\begin{proof}
Exploiting symmetry and the Lagrange identity $(a^2 + b^2)(c^2 + d^2) = (ac+bd)^2 + (ad-bc)^2,$ leads to the convolution density
\begin{align*}
f_Z(z) &= 2 \int_{0}^{\infty}
\frac{1}{ \pi w_1 (1+ x^2/w_1^2)} \frac{1}{\pi w_2 \{1+ (z-x)^2 / w_2^2 \} } dx
\\
& = \frac{2}{\pi^2 w_1 w_2} \int_{0}^{\infty}
\frac{1}{\{1+ w_1^{-1} w_2^{-1} x (z-x) \}^2 + \{w_2^{-1}z - (w_1^{-1}+ w_2^{-1}) x \}^2 } dx.
\end{align*}
Transforming $x$ to $x + w_2^{-1}z (w_1^{-1} + w_2^{-1})^{-1}$ and letting $a = 1 + z^2(w_1+w_2)^{-2}$, $b =(w_1 w_2)^{-1}$, $c = z (w_2-w_1) \{(w_1+w_2) w_1 w_2\}^{-1}$, $d = (w_1+w_2) (w_1 w_2)^{-1}$ gives
\begin{align*}
f_Z(z) &= \frac{2}{\pi^2 w_1 w_2} \int_{0}^{\infty}
\left[
\left\{ 1 + \frac{z^2}{(w_1+w_2)^2} - \frac{x^2}{w_1w_2} +
xz \frac{w_2-w_1}{(w_1+w_2) w_1 w_2} \right\}^2 +
x^2 \left(\frac{w_1 + w_2}{w_1w_2} \right)^2
\right]^{-1} dx
\\
&= \frac{2}{\pi^2 w_1 w_2} \int_{0}^{\infty}
\frac{dx}{\left( a - b x^2 + cx \right)^2 + x^2 d^2}
= \frac{2}{\pi^2 w_1 w_2} \int_{0}^{\infty}
\frac{dx/x^2}{\left(a/x - bx + c \right)^2 + d^2 }.
\end{align*}
If we let $y = x^{-1}$ and apply the \CS{} transformation, we arrive at
\[
f_Z(z) = \frac{2}{\pi^2 w_1 w_2} \int_{0}^{\infty} \frac{dy}{a (y^2 + d^2)}
= \frac{1}{\pi w_1 w_2} \frac{1}{ad}= \frac{1}{\pi (w_1+w_2)} \frac{1}{1+ z^2/(w_1+w_2)^2}.
%& = \frac{1}{\pi w_1 w_2} \frac{1}{1+ z^2/(w_1+w_2)^2} \frac{w_1 w_2}{w_1 + w_2} = \frac{1}{\pi (w_1+w_2)} \frac{1}{1+ z^2/(w_1+w_2)^2}
\]
A simple induction argument proves that the sum of any number of independent
Cauchy random variates is also another Cauchy.
\end{proof}
One can also use the characteristic function of $X \sim \CauchyRV(\mu, \sigma)$, $\psi_X(t) = \exp(it \mu - \sigma |t|)$, and the relation $\psi_{X+Y}(t) = \psi_X(t) \psi_Y(t)$ for independent $X$ and $Y$ to derive the result in just one step. For $X = \sum_{i=1}^{p} \omega_i C_i$ with independent $C_i \sim \CauchyRV(0,1)$ and weights $\omega_i \geq 0$, when $\sum_{i=1}^{p} \omega_i = 1$ we have
$\psi_X(t) = \exp\left(-\sum_{i=1}^{p}\omega_i |t|\right) = \exp(-|t|) = \psi_C(t),$ where $C \sim \CauchyRV(0, 1)$.
The most general result in this category is due to \cite{pillai2015unexpected},
who showed the following.
Let $(X_1,\ldots,X_m)$ and $(Y_1, \ldots, Y_m)$ be independent and
identically distributed $\NormRV(0, \Sigma)$ for an arbitrary
positive definite matrix $\Sigma$, then
$Z = \sum_{j=1}^{m} w_j X_j / Y_j \sim \CauchyRV(0, 1)$,
as long as $(w_1, \ldots, w_m)$ is independent of $(X, Y)$,
$w_j \geq 0\ (j = 1, \ldots, m)$ and $\sum_{j=1}^{m} w_j = 1$.
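Returning to the inverse-Gaussian lemma, a similar one-step check is available via Laplace transforms (a sketch added here; it uses only the identity \eqref{eq:levy}). For $X$ with the density $f$ displayed in that lemma, applying \eqref{eq:levy} with $a = \alpha^{1/2} t$ and $\lambda = s + (2\alpha)^{-1}$ gives
\[
E\left(e^{-sX}\right) = e^{t} \int_{0}^{\infty} \frac{\alpha^{1/2} t}{(2 \pi)^{1/2} x^{3/2}} \exp\left( -\frac{\alpha t^2}{2x} \right) e^{-\{s + (2\alpha)^{-1}\} x} dx
= \exp\left[ t \left\{ 1 - (1 + 2 \alpha s)^{1/2} \right\} \right], \quad s \geq 0.
\]
The exponent is linear in $t$, so for independent $X_1$ and $X_2$ sharing a common $\alpha$ the product of the two transforms has the same form with $t = t_1 + t_2$.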
\section{Discussion}
\label{sec:discussion}
The \CS{} and Liouville transformations not only guarantee simple normalizing constants for $f(\cdot)$, they also establish a wide class of unimodal densities as global-local scale mixtures. Global-local scale mixtures that are conditionally Gaussian hold a special place in statistical modeling and can be rapidly fit using an expectation-maximization algorithm, as pointed out by \citet{polson_data_2013}. \citet{palmer_amica:_2011} provide a similar tool for modeling multivariate dependence by writing general non-Gaussian multivariate densities as multivariate Gaussian scale mixtures. Our future goal is to extend the \CS{} transformation to express the wide class of multivariate Gaussian scale mixture models as global-local mixtures that also facilitate easy computation.
We end our paper with conjectures that two other remarkable identities arise as corollaries of such transformation identities. The first one is a recent result by \cite{zhang2014uniform} that proves a uniform correlation mixture of a bivariate Gaussian density with unit variance is a function of the maximum norm:
\begin{equation}
\int_{-1}^{1} \frac{1}{4 \pi (1-\rho^2)^{1/2} }
\exp\left\{ - \frac{x_1^2 + x_2^2 - 2 \rho x_1 x_2}{2 (1-\rho^2)} \right\} d\rho =
\half \left\{1- \Phi(\vectornorm{x}_{\infty})\right\}
\;,
\label{eq:bivar}
\end{equation}
where $\Phi(\cdot)$ is the standard normal distribution function and $\vectornorm{x}_{\infty} = \max\{ \lvert x_1 \rvert, \lvert x_2 \rvert\}$. The bivariate density on the
right side of \eqref{eq:bivar} was introduced by \citet{bryson1982constructing} as uniform mixtures of a chi random variate with 3 degrees of freedom, but the representation as a uniform correlation mixture is a new finding. We make a few remarks connected to Erdelyi's integral identity, which is key to the proof of the uniform correlation mixture in \eqref{eq:bivar}.
%revise
\begin{lemma}
Erdelyi's identity, defined by
\begin{equation}
\int_{1/2}^{\infty} \frac{e^{-x^2 z}}{4 \pi z (2z-1)^{1/2}} dz = \half \left\{1-\Phi(x)\right\}, \quad x \geq 0, \label{eq:erdelyi}
\end{equation}
follows from the Laplace transformation $(1+u)^{-1} = \int_0^{\infty} \exp\{-v(1+u)\} dv$.
\end{lemma}
\begin{proof}
Apply the transform $u = 2z-1$ to the left-hand side of \eqref{eq:erdelyi}, denoted by $I$, to obtain
\[
I = \int_{0}^{\infty} \frac{e^{-x^2 (1+u)/2}}{4 \pi {u}^{1/2} (1+u)} du \;.
\]
Using the Laplace transformation $(1+u)^{-1} = \int_0^{\infty} e^{-v(1+u)} dv$ yields
\begin{align*}
I &= \int_{0}^{\infty} \frac{e^{-x^2 (1+u)/2}}{4 \pi {u}^{1/2}}
\int_0^{\infty} e^{-v(1+u)} dv du
= \int_{v= 0}^{\infty} \int_{u=0}^{\infty}
\frac{e^{-({x^2}/{2} + v)(1+u)}}{4 \pi {u}^{1/2}} du dv
\\
&= \int_{v= 0}^{\infty} \frac{1}{4\pi} e^{-({x^2}/{2} + v)}
\int_{u=0}^{\infty} u^{-1/2} e^{-({x^2}/{2} + v) u} du dv
= \int_{v= 0}^{\infty} \frac{e^{-(x^2 + 2v)/2}}{2 (2\pi)^{1/2}}
\frac{1}{(x^2+ 2v)^{1/2} } dv
%&= \int_{v= 0}^{\infty} \frac{\e^{-(\frac{x^2}{2} + v)}}{4\pi} \frac{\pi^{1/2}}{(\frac{x^2}{2} + v)^{1/2}} dv
%= \int_{v= 0}^{\infty} \frac{1}{2 (2\pi)^{1/2}} e^{-\half(x^2 + 2v)} \frac{1}{(x^2+ 2v)^{1/2}} dv
\;,
\end{align*}
and letting $z^2 = x^2 + 2v$ we get
\begin{equation*}
I = \half \int_{z = |x|}^{\infty} \frac{1}{(2\pi)^{1/2}} e^{-z^2/2} dz
= \half \left\{1 - \Phi(|x|)\right\} \;.
\end{equation*}
%\proofSymbol
% hack for automatic qed symbol
\end{proof}
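As a quick check of \eqref{eq:erdelyi} at $x = 0$ (a worked special case added here for illustration), the substitution $u = 2z-1$ gives
\[
\int_{1/2}^{\infty} \frac{dz}{4 \pi z (2z-1)^{1/2}} = \frac{1}{4\pi} \int_{0}^{\infty} \frac{u^{-1/2}}{1+u} du = \frac{1}{4\pi} B(1/2, 1/2) = \frac{1}{4} = \frac{1}{2} \left\{ 1 - \Phi(0) \right\},
\]
where $B(1/2, 1/2) = \Gamma(1/2)^2 = \pi$ is the beta function.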
%%% [[ Commented out as Amdeberhan is not published yet and Biometrika won't let us cite ArXiv articles ~ JD. ]] %%%%
%Erdelyi's identity in \eqref{eq:erdelyi} follows from (9.2) of \cite{amdeberhan_cauchy-schlomilch_2010}:
%\[
%\int_0^{\infty} \frac{e^{-\mu^2 (x^2 + \beta^2)}}{x^2+\beta^2} dx = \frac{\pi}{2 \beta} \{ 1 - \erf(\mu \beta) \},
%\]
%If we let $\beta = 2^{-1/2}$ and $x^2+1/2 = z$, the above identity reduces to \eqref{eq:erdelyi} in $\mu$.
The second candidate is the symmetric stable distribution, defined by its characteristic function $\phi(t) = \exp( -|t|^{\alpha})$, $0 < \alpha \leq 2$. It admits a normal scale mixture representation with mixing density $f(v) = 2^{-1} s_{\alpha/2}(v/2)$, $v > 0$, where $s_{\alpha/2}$ is the positive stable density with index $\alpha / 2$ \citep{gneiting1997normal}. The exponential power density arising as a dual of the symmetric stable density also has a normal scale mixture representation, with important applications in
Bayesian bridge regression \citep{polson_bayesian_2014}.
\[
e^{-|x|^\alpha} = \int_0^{\infty} e^{-|x|\eta} g(\eta) d\eta, \quad g(\eta) = \sum_{j=1}^{\infty} (-1)^j \frac{\eta^{-j \alpha-1}}{j! \Gamma(-\alpha j)}
\;.
\]
\citet{polson_bayesian_2014} derive this as a limiting result of the scale mixture of betas representation for $k$-monotone densities, utilizing the complete monotonicity of the exponential power density. Regularization, in this case, is an outcome of a normal scale mixture with respect to an $\alpha$-stable random variable. We conjecture that these two results follow from the \CS{} formula \eqref{eq:identity}. Other potential applications include using the Liouville formula to recognize and generate global-local mixtures, and to calculate higher-order closed-form moments $E(X^n)$ for random variables $X$ that admit a global-local representation.
\bibliographystyle{plainnat}
\bibliography{glref}
\end{document}