Case–control imbalance vs. ascertainment bias
Case–control imbalance is a property of a *binary-trait* analysis: the ratio of cases to controls (equivalently, the case fraction) is far from 1:1, which affects test calibration and effective sample size. Ascertainment bias is a broader *selection* problem: the people who enter the study (or registry) are not representative of the target population, so estimated frequencies, effect sizes, or prevalences can be distorted. It is most often discussed for family-based designs but is also relevant to volunteer biobanks and clinic-based cohorts.
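As a rough illustration of how selection on case status distorts an estimated frequency, the sketch below compares the carrier frequency in a case-enriched sample with the population value, then reweights by inverse sampling probability. All numbers (prevalence 1%, carrier frequencies 30% in cases and 10% in controls) and both function names are hypothetical, and the sketch assumes that sampling depends only on case status and that the population prevalence is known.

```python
def pooled_frequency(freq_cases, freq_controls, case_fraction):
    """Carrier frequency in a group made up of `case_fraction` cases."""
    return case_fraction * freq_cases + (1.0 - case_fraction) * freq_controls


def reweighted_frequency(freq_cases, freq_controls, sample_case_fraction, prevalence):
    """Inverse-probability-weighted carrier frequency.

    Assumes (hypothetically) that selection depends only on case status,
    so each case gets weight prevalence / sample_case_fraction and each
    control gets weight (1 - prevalence) / (1 - sample_case_fraction).
    """
    w_case = prevalence / sample_case_fraction
    w_ctrl = (1.0 - prevalence) / (1.0 - sample_case_fraction)
    case_mass = w_case * sample_case_fraction
    ctrl_mass = w_ctrl * (1.0 - sample_case_fraction)
    weighted_carriers = case_mass * freq_cases + ctrl_mass * freq_controls
    return weighted_carriers / (case_mass + ctrl_mass)


# Hypothetical inputs, chosen only for illustration.
PREVALENCE = 0.01          # population case fraction
FREQ_CASES = 0.30          # carrier frequency among cases
FREQ_CONTROLS = 0.10       # carrier frequency among controls
SAMPLE_CASE_FRACTION = 0.5 # a "balanced" but case-enriched sample

population = pooled_frequency(FREQ_CASES, FREQ_CONTROLS, PREVALENCE)
naive = pooled_frequency(FREQ_CASES, FREQ_CONTROLS, SAMPLE_CASE_FRACTION)
corrected = reweighted_frequency(
    FREQ_CASES, FREQ_CONTROLS, SAMPLE_CASE_FRACTION, PREVALENCE
)

print(f"population: {population:.3f}, naive: {naive:.3f}, reweighted: {corrected:.3f}")
```

Under these assumptions the naive pooled estimate is roughly twice the population frequency even though the sample is perfectly balanced, while the reweighted estimate recovers it. Real ascertainment is rarely this simple, because selection often depends on more than case status and the sampling probabilities are usually unknown.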
How they differ
| | Case–control imbalance | Ascertainment bias |
|---|---|---|
| Primary issue | Extreme case:control ratio in the analysis dataset. | Systematic non-representativeness of who is sampled or phenotyped. |
| Typical GWAS fix | Tests that stay calibrated at low case counts: saddlepoint approximation (SPA) or Firth-penalized logistic regression, and generalized mixed models such as SAIGE that combine SPA with adjustment for relatedness. | Study design, explicit sampling frames, inverse-probability weighting, sensitivity analyses, or explicit models of how probands/families were recruited. |
| Affects | Calibration of association test statistics (particularly for rare variants) and the effective sample size (N_eff) used in case–control meta-analysis. | Allele frequencies, heritability estimates, generalizability, and sometimes apparent effect sizes, not only the case–control ratio. |
Rule of thumb: Imbalance is about the internal ratio of labels in the regression; ascertainment is about who is missing or overrepresented relative to the population you want to generalize to. A study can be balanced yet highly ascertained (e.g. clinic-only cases), or imbalanced yet population-based.
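To put a number on the cost of imbalance, a commonly used convention defines the effective sample size of a case–control study as N_eff = 4 / (1/N_cases + 1/N_controls), the size of a perfectly balanced study with comparable power. The sketch below uses the algebraically equivalent form 4·N_cases·N_controls / (N_cases + N_controls); the function name and the example counts are illustrative assumptions, not taken from any particular study.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size under the common 4 / (1/n_cases + 1/n_controls) convention.

    Written as 4 * n_cases * n_controls / (n_cases + n_controls), which is
    the same quantity and avoids summing two small reciprocals.
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must both be positive")
    return 4.0 * n_cases * n_controls / (n_cases + n_controls)


# Hypothetical designs: a small balanced study vs. a large imbalanced biobank.
balanced = effective_sample_size(1_000, 1_000)
imbalanced = effective_sample_size(1_000, 99_000)

print(f"balanced   (N = 2,000):   N_eff = {balanced:.0f}")
print(f"imbalanced (N = 100,000): N_eff = {imbalanced:.0f}")
```

With these illustrative counts, adding 98,000 extra controls raises N_eff from 2,000 to just under 4,000, since N_eff can never exceed four times the smaller group. The formula says nothing about ascertainment: a population-based and a clinic-based study with the same counts get the same N_eff.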
Related terms
References
- Zhou W, et al. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet.