
Case–control imbalance vs. ascertainment bias

Definition

Case–control imbalance is a property of a *binary-trait* analysis. The numbers of cases and controls are very unequal (a highly skewed case fraction, e.g. 1 case per 100 controls), which can distort test calibration and reduces the effective sample size. Ascertainment bias is a broader *selection* problem. The people who enter the study (or registry) are not representative of the target population, so estimated allele frequencies, effect sizes, or prevalences can be distorted. It is often discussed for family-based designs but is also relevant to volunteer biobanks and clinic-based cohorts.
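To make the effective-sample-size point concrete, here is a minimal Python sketch using the convention common in case–control meta-analysis, N_eff = 4 / (1/N_cases + 1/N_controls). The function name `effective_sample_size` and the sample sizes are illustrative assumptions, not taken from any particular study or tool.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control study: 4 / (1/N_cases + 1/N_controls).

    Equals the total N when cases and controls are balanced, and approaches
    4 * N_cases (never exceeding it) as controls come to outnumber cases.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("need at least one case and one control")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Three hypothetical designs with the same total of 101,000 participants.
for n_ca, n_co in [(50_500, 50_500), (10_000, 91_000), (1_000, 100_000)]:
    n_eff = effective_sample_size(n_ca, n_co)
    print(f"{n_ca:>6} cases, {n_co:>7} controls -> N_eff = {n_eff:,.0f}")
```

The balanced split gives N_eff = 101,000, while 1,000 cases against 100,000 controls gives roughly 3,960: however many controls are added, N_eff cannot exceed 4 × N_cases.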

How they differ

Primary issue
  • Case–control imbalance: an extreme case:control ratio in the analysis dataset.
  • Ascertainment bias: systematic non-representativeness of who is sampled or phenotyped.

Typical GWAS fix
  • Case–control imbalance: tests whose calibration holds for low-frequency variants, mixed models (e.g. SAIGE, which pairs a logistic mixed model with a saddlepoint approximation; Zhou et al., 2018), and saddlepoint or Firth-style approaches for sparse counts.
  • Ascertainment bias: study design, sampling frames, inverse-probability weighting, sensitivity analyses, or explicit models of how probands and families were recruited.

Affects
  • Case–control imbalance: association test statistics, type I error control for low-frequency and rare variants, and N_eff in case–control meta-analysis.
  • Ascertainment bias: allele frequencies, heritability estimates, generalizability, and sometimes apparent effect sizes, not only the case–control ratio.
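For sparse counts, one Firth-style option is Firth-penalized logistic regression, which maximizes the likelihood penalized by the Jeffreys prior and stays finite under complete separation. The sketch below, assuming NumPy, runs plain Newton steps on Firth's modified score Xᵀ(y − π + h(½ − π)), where h is the hat-matrix diagonal. It omits the step-halving and profile-likelihood intervals that production implementations add, and it is not SAIGE's saddlepoint method; `firth_logistic` is a hypothetical helper name.

```python
import numpy as np


def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression via Newton steps on the modified score.

    X: (n, p) design matrix including an intercept column.
    y: (n,) array of 0/1 outcomes.
    Returns the penalized coefficient estimates.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        w = pi * (1.0 - pi)
        # Fisher information X^T W X and its inverse
        info = X.T @ (X * w[:, None])
        info_inv = np.linalg.inv(info)
        # Hat-matrix diagonal: h_i = w_i * x_i^T (X^T W X)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth's modified score adds h_i * (1/2 - pi_i) to each residual
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Complete separation: every carrier is a case, every non-carrier a control.
carrier = np.array([0, 0, 0, 1, 1, 1])
status = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(carrier), carrier])
intercept, log_or = firth_logistic(X, status)
print(f"log OR = {log_or:.3f}")
```

In this toy table ordinary maximum likelihood diverges, while the Firth estimate is finite, log OR = log(49) ≈ 3.892, which for a 2×2 table matches adding 0.5 to every cell.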

Rule of thumb: Imbalance is about the internal ratio of labels in the regression; ascertainment is about who is missing or overrepresented relative to the population you want to generalize to. A study can be balanced yet highly ascertained (e.g. clinic-only cases), or imbalanced yet population-based.
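The "balanced yet ascertained" case can be shown numerically for one kind of selection: sampling on disease status. A minimal sketch, assuming NumPy, a known population prevalence, and controls that represent the unaffected population. The naive allele frequency from a 500:500 sample is pulled toward the case frequency, and weighting each stratum by its population share divided by its sample share recovers the population value (0.01 × 0.30 + 0.99 × 0.10 = 0.102). The helper `weighted_allele_frequency` and all frequencies are hypothetical. Weights on case status alone cannot correct other forms of ascertainment, such as clinic-only recruitment of severe cases.

```python
import numpy as np


def weighted_allele_frequency(genotypes, is_case, prevalence):
    """Allele frequency reweighted from the sample's case fraction to a population prevalence.

    genotypes: allele counts (0, 1, 2) per individual.
    is_case: boolean case status per individual (both cases and controls required).
    prevalence: assumed population prevalence K of the disease.
    """
    g = np.asarray(genotypes, dtype=float)
    case = np.asarray(is_case, dtype=bool)
    case_fraction = case.mean()
    # Each stratum is weighted by (population share) / (sample share)
    weights = np.where(case, prevalence / case_fraction,
                       (1.0 - prevalence) / (1.0 - case_fraction))
    # Two alleles per individual, so divide by twice the total weight
    return float(np.sum(weights * g) / (2.0 * np.sum(weights)))


# Balanced sample (500 cases, 500 controls) of a disease with 1% prevalence.
# Hypothetical risk-allele frequencies: 0.30 in cases, 0.10 in controls.
genotypes = np.array([1] * 300 + [0] * 200 + [1] * 100 + [0] * 400)
is_case = np.array([True] * 500 + [False] * 500)

naive = genotypes.mean() / 2
reweighted = weighted_allele_frequency(genotypes, is_case, prevalence=0.01)
print(f"naive: {naive:.3f}, reweighted: {reweighted:.3f}")  # naive: 0.200, reweighted: 0.102
```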

References

  • Zhou W, et al. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet.