Last update: January 2020.
A generalized linear model (GLM) models the response with an exponential family distribution whose parameter is tied to a linear function of the covariates. When the natural parameterization is used, the resulting maximum likelihood problem is convex.
Exponential Family Distributions
We assume the response is generated from a distribution parameterized by
Here
Note that here
Fisher Scoring
The model is fit using maximum likelihood. We take a natural gradient step using the Fisher information in
Notice that by the chain rule, we have the following score and Hessian of the log-likelihood.
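As a reference point, one common way to write these quantities is sketched below. The notation is an assumption and may differ from the author's: a canonical link with scalar sufficient statistic $T(y) = y$, natural parameter $\eta_i = x_i^\top \beta$, log partition function $A$, base measure $h$, and design matrix $X$ with rows $x_i^\top$.

```latex
\begin{align*}
\ell(\beta) &= \sum_{i=1}^{n} \left[ y_i \eta_i - A(\eta_i) + \log h(y_i) \right],
  \qquad \eta_i = x_i^\top \beta, \\
\nabla_\beta \ell(\beta) &= \sum_{i=1}^{n} \left( y_i - A'(\eta_i) \right) x_i
  = X^\top (y - \mu), \qquad \mu_i = A'(\eta_i), \\
\nabla_\beta^2 \ell(\beta) &= -\sum_{i=1}^{n} A''(\eta_i)\, x_i x_i^\top
  = -X^\top W X, \qquad W = \operatorname{diag}\!\left( A''(\eta_i) \right).
\end{align*}
```

Under these assumptions the Hessian contains no $y_i$, so taking its expectation over the response leaves it unchanged.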
The Fisher information matrix with respect to the
Notice this coincides with a Newton-Raphson step.
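A minimal sketch of the update for one concrete case, Poisson regression with the canonical log link, might look as follows. The function name `fisher_scoring_poisson`, the simulated data, and the stopping rule are illustrative assumptions, not the author's code. In this case the Fisher information is `X.T @ W @ X` with `W = diag(mu)`, and the step is the same as a Newton-Raphson step.

```python
import numpy as np

def fisher_scoring_poisson(X, y, n_iter=25, tol=1e-10):
    """Fit Poisson regression (canonical log link) by Fisher scoring.

    Hypothetical helper for illustration only.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                      # natural parameter (linear predictor)
        mu = np.exp(eta)                    # A'(eta): mean of the response
        score = X.T @ (y - mu)              # gradient of the log-likelihood
        fisher = X.T @ (X * mu[:, None])    # X^T W X, W = diag(A''(eta)) = diag(mu)
        step = np.linalg.solve(fisher, score)
        beta = beta + step                  # Fisher scoring / Newton-Raphson update
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Simulated data (assumed setup): intercept plus one Gaussian covariate.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=500)])
true_beta = np.array([0.5, -0.3])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat = fisher_scoring_poisson(X, y)
print(beta_hat)
```

At convergence the score `X.T @ (y - np.exp(X @ beta_hat))` should be numerically zero, which is a convenient sanity check on any implementation.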
[1] Nelder, J.A., and Wedderburn, R.W.M. (1972). Generalized Linear Models. Journal of the Royal Statistical Society. Series A (General) 135, 370–384.
Here we list some useful results about exponential families.
Result 1.
Result 2. The gradients of the log partition function yield moments of the sufficient statistics. First recall that
Now observe that
Similarly it can be shown that
Result 3. The negative log-likelihood of an exponential family distribution is always convex with respect to the natural parameters. This is because its Hessian is the Hessian of the log partition function, which coincides with the covariance of the sufficient statistics. That matrix is positive semi-definite and does not depend on the observations.
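Results 2 and 3 can be checked numerically on a simple case. The sketch below assumes the Bernoulli family, with natural parameter `eta`, sufficient statistic `y`, and log partition function `A(eta) = log(1 + exp(eta))`. The helper names `log_partition` and `nll` and the value of `eta` are illustrative choices. It compares finite-difference derivatives of `A` with the mean and variance of `y`, and confirms that the curvature of the negative log-likelihood is the same for either observed value of `y`.

```python
import numpy as np

def log_partition(eta):
    """Bernoulli log partition function A(eta) = log(1 + exp(eta))."""
    return np.logaddexp(0.0, eta)

def nll(eta, y):
    """Negative log-likelihood of one Bernoulli observation in natural form."""
    return log_partition(eta) - y * eta

def second_difference(f, x, h):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2

eta, h = 0.7, 1e-4

# Result 2: derivatives of A give the moments of the sufficient statistic.
grad_fd = (log_partition(eta + h) - log_partition(eta - h)) / (2.0 * h)
hess_fd = second_difference(log_partition, eta, h)

p = 1.0 / (1.0 + np.exp(-eta))    # P(y = 1)
mean_y, var_y = p, p * (1.0 - p)

# Result 3: the curvature of the NLL does not depend on the observed y.
curv_y0 = second_difference(lambda e: nll(e, 0), eta, h)
curv_y1 = second_difference(lambda e: nll(e, 1), eta, h)

print(grad_fd, mean_y)
print(hess_fd, var_y)
print(curv_y0, curv_y1)
```

Each printed pair should agree up to finite-difference error, and the curvature is strictly positive here because a Bernoulli variable with `0 < p < 1` has nonzero variance.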