Case-control imbalance vs Ascertainment bias

Terms

Definition

Case–control imbalance is a property of a *binary-trait* analysis: the ratio of cases to controls (equivalently, the case fraction) is far from 1:1, which can miscalibrate association tests and reduces the effective sample size. Ascertainment bias is a broader *selection* problem: the people who enter the study (or registry) are not representative of the target population, so estimated frequencies, effect sizes, or prevalences can be distorted. It is most often discussed for family-based designs but is also relevant to volunteer biobanks and clinic-based cohorts.

Topics

Epidemiology, GWAS

How they differ

	Case–control imbalance	Ascertainment bias
Primary issue	Extreme case:control ratio in the analysis dataset.	Systematic non-representativeness of who is sampled or phenotyped.
Typical GWAS fix	Tests that stay calibrated when case counts are low: frequency-aware tests, generalized mixed models with a saddlepoint approximation (e.g. SAIGE), or Firth-style penalized regression for sparse counts.	Study design, sampling frames, inverse probability weighting, sensitivity analyses, or explicit models of how probands/families were recruited.
Affects	Association test statistics, rare-variant test behavior, and effective sample size (N_eff) in case–control meta-analysis.	Allele frequencies, heritability estimates, generalizability, and sometimes apparent effect sizes, not only the case–control ratio.
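
To make the N_eff entry concrete, here is a minimal Python sketch of one widely used convention for the effective sample size of a case–control study, N_eff = 4 / (1/N_cases + 1/N_controls). The formula is an assumption brought in for illustration (the table does not state which definition it has in mind), and the function name and cohort sizes are hypothetical.

```python
def effective_sample_size(n_cases: int, n_controls: int) -> float:
    """Effective sample size of a case-control study.

    Uses N_eff = 4 / (1/n_cases + 1/n_controls), which equals the total
    sample size when the design is exactly balanced (1:1).
    """
    if n_cases <= 0 or n_controls <= 0:
        raise ValueError("n_cases and n_controls must be positive")
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


# Hypothetical cohorts, chosen only to illustrate the effect of imbalance.
balanced = effective_sample_size(5_000, 5_000)      # 1:1 design, N = 10,000
imbalanced = effective_sample_size(1_000, 99_000)   # 1% cases, N = 100,000

print(f"balanced:   N_eff = {balanced:.0f}")
print(f"imbalanced: N_eff = {imbalanced:.0f}")
```

Under this convention the imbalanced cohort, despite having ten times as many participants, has an N_eff of roughly 3,960, well below the 10,000 of the balanced cohort. This is why a meta-analysis usually weights case–control studies by N_eff rather than by total N.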

Rule of thumb: Imbalance is about the internal ratio of labels in the regression; ascertainment is about who is missing or overrepresented relative to the population you want to generalize to. A study can be balanced yet highly ascertained (e.g. clinic-only cases), or imbalanced yet population-based.
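
The "balanced yet ascertained" case can be sketched numerically. The Python example below assumes a deliberately simplified setting in which selection depends only on case status and the population prevalence is known. All numbers and function names are hypothetical. It compares the pooled allele frequency in a 1:1 sample with the population value, then applies inverse probability weighting.

```python
def pooled_frequency(p_cases, p_controls, case_fraction):
    """Unweighted allele frequency in a sample with the given case fraction."""
    return case_fraction * p_cases + (1.0 - case_fraction) * p_controls


def reweighted_frequency(p_cases, p_controls, case_fraction, prevalence):
    """Allele frequency after inverse probability weighting.

    Each group is weighted by (population share / sample share), which is
    proportional to the inverse of its probability of being sampled.
    """
    w_cases = prevalence / case_fraction
    w_controls = (1.0 - prevalence) / (1.0 - case_fraction)
    numerator = (w_cases * case_fraction * p_cases
                 + w_controls * (1.0 - case_fraction) * p_controls)
    denominator = (w_cases * case_fraction
                   + w_controls * (1.0 - case_fraction))
    return numerator / denominator


# Hypothetical values: 1% prevalence, allele more common among cases.
PREVALENCE = 0.01
P_CASES = 0.30
P_CONTROLS = 0.20

# In the population, the case fraction equals the prevalence.
population = pooled_frequency(P_CASES, P_CONTROLS, PREVALENCE)
# A balanced 1:1 sample over-represents cases fifty-fold.
naive = pooled_frequency(P_CASES, P_CONTROLS, 0.5)
reweighted = reweighted_frequency(P_CASES, P_CONTROLS, 0.5, PREVALENCE)

print(f"population: {population:.3f}")
print(f"naive 1:1:  {naive:.3f}")
print(f"reweighted: {reweighted:.3f}")
```

The 1:1 sample has no imbalance problem, yet its pooled allele frequency (0.250) overstates the population value (0.201). Reweighting recovers the population value only because the sampling fractions are assumed known. With clinic-based or volunteer recruitment they usually are not, which is why sensitivity analyses and explicit recruitment models are also listed as remedies.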

References

Zhou W, et al. (2018). Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet.
