Case-control studies: sampling controls

Nick Griffiths · Feb 14, 2020

Historically, case-control studies were considered inferior to cohort studies because they were more prone to bias and harder to interpret. But everything that makes a cohort study reliable can be used in case-controls too, and choosing a case-control design may still be much more efficient.

Case-control and cohort studies both attempt to estimate an association between an exposure and an outcome. This means that both require a comparison between a group of people that were exposed and a group of people who were not. Then we can see whether the risk or rate of the outcome is different between the two groups.

The difference is in how we calculate these estimates. In a cohort study, we keep track of each group directly. At the beginning of the study, we define the exposed and unexposed groups, and over the study period we see whether they go on to develop the outcome of interest. The results can be represented by a two-by-two table of results.

Exposed Unexposed
Cases 10 12
Total 200 400

From these results, we can just divide the cases by the totals to calculate the risks in the two groups, which we can compare to draw conclusions.

In case-control studies, we don’t use the total row of exposed and unexposed individuals in our calculation. Instead, we choose a method of sampling from the source population, and then we sort people into exposed and unexposed control groups.

Exposed Unexposed
Cases 10 12
Controls 30 60

There are three ways to sample controls. The oldest method is called cumulative sampling, where the controls are sampled from everyone who does not have the disease. When this method is used, the odds ratio of being a case, comparing the exposed group to the unexposed group, is the same as the odds ratio of developing the disease in the cohort study.

Odds ratios are very difficult for clinicians and patients to interpret because while they can demonstrate association, they do not describe the change in risk associated with exposure, which is what people use to make decisions. Because of this, modern case-control studies sample controls in such a way that the odds ratio of getting the disease approximates either the risk ratio or the incidence rate ratio.

To approximate the risk ratio, case-cohort sampling is used. Controls are sampled from all those who were at risk at the beginning of the study, including those who later became a case. The control series is then proportional to the whole cohort, so the odds ratio of being a case will estimate the cohort risk ratio. To approximate the rate ratio, density sampling is used. This is the same as case-cohort sampling but sampling is weighted by person-time contributed to the study.

The use of control sampling provides a number of benefits. First, it reduces the cost of the study. When the ratio of controls to cases is greater than 3 or 4, statistical power no longer increases. This means that studying rare outcomes using cohort designs is a waste of resources. The same statistical power can be achieved with a case-control design using a smaller sample of controls.

Second, case-control studies do not require a formal cohort because the controls are sampled. A study can be initiated just by starting to enroll cases. It is much cheaper and faster than keeping track of everyone in a cohort, which is a huge benefit when information is needed urgently or with little resources. However, case-control studies with a known cohort, called nested case-controls, avoid many of the potential biases associated with non-nested case-control studies because it is much more straightforward to sample controls from a defined group.

A cohort study is not fundamentally different from a case-control study, it is a special case. Cohort studies are equivalent to nested case-controls that use density or case-cohort sampling, but they insist on selecting the whole cohort. As a result, case-controls are a more efficient way to estimate a risk or rate ratio.