Regression cheat sheet: logistic

Nick Griffiths · Apr 27, 2020

Logistic regression

The data

Each individual has a dichotomous (binary) outcome.

Estimation

Firth penalized regression handles (quasi-)complete separation, where ordinary maximum likelihood estimates diverge, and is less biased in small samples.
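
A quick pure-Python illustration (not Firth's correction itself, just the problem it addresses): with completely separated toy data, ordinary maximum likelihood has no finite optimum, so gradient ascent keeps inflating the coefficient no matter how long it runs.

```python
import math

# Toy data with complete separation: every y = 1 has x > 0, every y = 0 has x < 0.
x = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
y = [0, 0, 0, 1, 1, 1]

def fit_beta(n_steps, lr=0.5):
    """Gradient ascent on the logistic log-likelihood (slope only, no intercept)."""
    beta = 0.0
    for _ in range(n_steps):
        # Score function of logistic regression: sum of (y - p) * x.
        grad = sum((yi - 1 / (1 + math.exp(-beta * xi))) * xi
                   for xi, yi in zip(x, y))
        beta += lr * grad
    return beta

# The estimate never settles: more iterations always yield a larger beta.
print(fit_beta(100), fit_beta(10000))
```

Firth's penalty adds a term to the likelihood that keeps the estimate finite in exactly this situation.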

Conditional logistic regression is used when observations are matched, as in matched case-control studies. The maximum likelihood estimation conditions on the matched sets.

Model evaluation

The c-statistic is a measure of discrimination between those with the outcome and those without. It is the area under the receiver operating characteristic curve.
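
The c-statistic can be computed directly as the fraction of case/non-case pairs in which the case receives the higher predicted probability (ties count one half). A small sketch with made-up predictions:

```python
def c_statistic(y, p):
    """Probability a random case outranks a random non-case (ties = 1/2)."""
    pairs = [(pi, pj) for pi, yi in zip(p, y) if yi == 1
                      for pj, yj in zip(p, y) if yj == 0]
    concordant = sum(1.0 if a > b else 0.5 if a == b else 0.0
                     for a, b in pairs)
    return concordant / len(pairs)

y = [0, 0, 1, 0, 1, 1]               # observed outcomes (made up)
p = [0.1, 0.3, 0.4, 0.4, 0.7, 0.9]   # predicted probabilities (made up)
print(c_statistic(y, p))
```

A value of 0.5 means no discrimination; 1.0 means the model ranks every case above every non-case.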

The Hosmer and Lemeshow test can be used to assess calibration. It divides the data into quantile groups of predicted probability and tests whether the mean predicted probability in each group matches the observed frequency of the outcome.
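
A minimal sketch of the statistic itself, assuming a simple split into g equal-sized groups by ranked predicted probability; in practice the result is compared to a chi-square distribution with g − 2 degrees of freedom.

```python
def hosmer_lemeshow(y, p, g=4):
    """Hosmer-Lemeshow statistic: observed vs expected events per group.

    Sketch only: assumes each group has expected counts strictly between
    0 and the group size (otherwise a division by zero occurs).
    """
    ranked = sorted(zip(p, y))           # sort by predicted probability
    size = len(ranked) // g
    stat = 0.0
    for i in range(g):
        group = ranked[i * size: (i + 1) * size if i < g - 1 else None]
        n = len(group)
        obs = sum(yi for _, yi in group)   # observed events in group
        exp = sum(pi for pi, _ in group)   # expected events in group
        # Event and non-event contributions, as in a chi-square table.
        stat += (obs - exp) ** 2 / exp + ((n - obs) - (n - exp)) ** 2 / (n - exp)
    return stat
```

Large values indicate that predicted probabilities and observed frequencies disagree, i.e. poor calibration.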

Interpretation of effects

log(P / (1 - P)) = α + βX

  • The intercept α is the log odds of the outcome when all predictors are zero
  • exp(βi) is the odds ratio for a one-unit increase in predictor Xi
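
A numeric check of both interpretations, with made-up values for α and β: the odds at X = 0 equal exp(α), and raising X by one unit multiplies the odds by exp(β) regardless of the starting value of X.

```python
import math

alpha, beta = -1.0, 0.7   # assumed parameter values for illustration

def odds(x):
    """Odds of the outcome at predictor value x: exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

for x in [0.0, 1.0, 5.0]:
    # The ratio is exp(beta) at every x -- the odds ratio is constant.
    print(odds(x + 1) / odds(x), math.exp(beta))
```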

Extension to multinomial outcomes

Proportional odds

Cumulative odds are used to capture the multinomial outcome, and the slope parameters have the same effect at every outcome level.

logit2 = log(π2 / (1 - π2)) = α2 + β1X1 + ...

logit1 = log((π1 + π2) / (1 - π1 - π2)) = α1 + β1X1 + ...

To interpret the results:

  • exp(β1) is still an odds ratio (the same at every outcome level) for a one-unit increase in X1
  • exp(α2 - α1) is the constant odds ratio odds2 / odds1 between the two cumulative odds. Hence the name proportional odds.
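
A numeric check under assumed parameter values: because the slope is shared, the ratio of the two cumulative odds does not depend on X.

```python
import math

alpha1, alpha2, beta1 = -0.5, 0.8, 0.3   # made-up values for illustration

def cum_odds(alpha, x):
    """Cumulative odds for the level with intercept alpha, at predictor x."""
    return math.exp(alpha + beta1 * x)

for x in [0.0, 2.0, 10.0]:
    # The ratio equals exp(alpha2 - alpha1) at every x.
    print(cum_odds(alpha2, x) / cum_odds(alpha1, x), math.exp(alpha2 - alpha1))
```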

Multinomial model

The multinomial model is more general because each outcome group has its own intercept and coefficients:

logit1 = log(π1 / π0) = α1 + β11X1 + ...

logit2 = log(π2 / π0) = α2 + β21X1 + ...

These are called generalized logits, with category 0 as the reference.
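
Given the generalized logits, the category probabilities can be recovered with category 0 as the reference: πk = exp(logitk) / (1 + Σ exp(logitj)). The parameter values below are made up for illustration.

```python
import math

alpha = [0.2, -0.4]   # assumed alpha1, alpha2
beta = [0.5, -0.3]    # assumed beta11, beta21 (single predictor X1)

def probabilities(x1):
    """Category probabilities (pi0, pi1, pi2) from the generalized logits."""
    logits = [a + b * x1 for a, b in zip(alpha, beta)]
    denom = 1.0 + sum(math.exp(l) for l in logits)
    pi0 = 1.0 / denom                       # reference category
    return [pi0] + [math.exp(l) / denom for l in logits]

p = probabilities(1.0)
print(p, sum(p))   # the three probabilities sum to 1
```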