GLM links and distributions

Nick Griffiths · Sep 09, 2020

Generalized Linear Models allows transformations of the predictor terms using a link function \(\eta\):

\[\eta(y) = \beta_0 + \beta_1 x_1 + ... + \beta_n x_n\]

When we use maximum likelihood, \(y = \eta^{-1}(\beta_0 + \beta_1 x_1 ...)\) is part of what is used to calculate the likelihood: \(\ell = P(y_{obs} | y)\). The other piece is the assumed distribution of \(y_{obs}\).

The distribution

When analyzing binary data with a logit link, \(y_{obs}\) is logically given a bernoulli distribution. We assign likelihood \(y\) to all \(y_{obs} = 1\) and \(1 - y\) to all \(y_{obs} = 0\) (\(y\) is a probability, not odds or logit).

When using log-binomial regression, implementations aren’t as robust and the models often fail to converge. So with a log link people often use a poisson distribution. However, the variance of poisson is equal to \(y = \eta^{-1}(x_1, x_2, ...)\), and it overestimates the variance of a bernoulli outcome, which is only \(p(1-p) = y(1 - y)\). So the poisson’s CIs are too wide and the usual solution is to use robust variance estimators instead.

Offsets

When modeling count data, an offset term is used to correct for processes with different rates. For example, we can model the number of hypoglycemic events over time like so:

\[y = \lambda t\]

\[log(y) = \beta_0 + \beta_1 x_1 + ... + log(t)\]

Where \(y\) is the number of events and \(t\) is the time the person was followed. The outcome \(y\) is a count variable which can be modeled with a poisson distribution. With the log link, predicted counts are always positive.

Interpretating parameters

If the outcome is binary, then each \(y_{obs}\) is \(0\) or \(1\) and \(\mathbb{E}(y_{obs})\) is a probability. When interpreting the coefficients, an identity link would correspond with a linear relationship between the probability and the predictor (though it is hardly used). A log link corresponds with a log relative risk: \(\beta = log(y_1) - log(y_2) = \log(y_1 / y_2)\). A logit link corresponds with a log odds ratio.

With a time offset, \(\beta_0 + \beta_1 x_1 + ...\) equals \(log(y/t)\), which is the event rate. So the coefficients in poisson regression are log rate ratios, not risk ratios.