LD clumping vs LD pruning

Definition

Both procedures reduce linkage disequilibrium (LD) redundancy among SNPs. Clumping is association-driven: it ranks variants by significance (usually GWAS p-values) and keeps one index variant per group of correlated signals. Pruning is LD-structure-driven: it walks the genome in windows and drops SNPs whose correlation (r²) with a retained anchor SNP exceeds a threshold, without using GWAS significance.
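The greedy logic behind clumping can be sketched in a few lines of Python. This is a minimal illustration on invented toy data, not PLINK's implementation: the names `Snp`, `clump`, `p1`, `r2_max`, and `window_bp`, the rsIDs, and all threshold values are assumptions chosen for the example.

```python
from typing import NamedTuple


class Snp(NamedTuple):
    snp_id: str
    pos: int    # base-pair position (single chromosome assumed)
    p: float    # GWAS p-value


# Toy pairwise r^2 values (hypothetical); unlisted pairs are treated as 0.
R2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs2", "rs3")): 0.6,
    frozenset(("rs1", "rs3")): 0.5,
    frozenset(("rs4", "rs5")): 0.9,
}


def r2(a: str, b: str) -> float:
    return R2.get(frozenset((a, b)), 0.0)


def clump(snps, p1=5e-8, r2_max=0.1, window_bp=250_000):
    """Greedy clumping: map each index SNP to the SNPs it absorbs."""
    remaining = {s.snp_id for s in snps}
    clumps = {}
    # Visit SNPs from most to least significant.
    for index in sorted(snps, key=lambda s: s.p):
        if index.snp_id not in remaining:
            continue  # already absorbed into an earlier clump
        if index.p > p1:
            break     # no further SNP is significant enough to be an index
        remaining.discard(index.snp_id)
        members = []
        for other in snps:
            if other.snp_id not in remaining:
                continue
            near = abs(other.pos - index.pos) <= window_bp
            if near and r2(index.snp_id, other.snp_id) >= r2_max:
                members.append(other.snp_id)
                remaining.discard(other.snp_id)
        clumps[index.snp_id] = members
    return clumps


snps = [
    Snp("rs1", 1_000, 1e-9),
    Snp("rs2", 2_000, 1e-12),   # strongest signal at the first locus
    Snp("rs3", 3_000, 1e-6),
    Snp("rs4", 900_000, 3e-9),  # strongest signal at the second locus
    Snp("rs5", 901_000, 1e-4),
]
clumps = clump(snps)
print(clumps)
```

With these toy inputs, each locus is represented by its most significant SNP (rs2 and rs4), and the correlated neighbours are assigned to those index variants.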

How they differ

| | LD clumping | LD pruning |
|---|---|---|
| Primary input | GWAS summary statistics (p-values) plus an LD reference. | Genotype LD only (e.g. PLINK --indep-pairwise). |
| Goal | One (or a few) representative lead SNPs per association peak; SNP selection for PRS (e.g. clumping and thresholding, C+T). | A sparse, approximately independent SNP set for PCA, GRM, and relatedness estimation. |
| Keeps | The strongest signal per region (by design). | Whichever SNP the windowed rules happen to retain, often simply the "first" in the window; not necessarily the lead GWAS hit. |
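To make the "Keeps" contrast concrete, here is a simplified Python sketch of window-based pruning. It assumes a greedy left-to-right scan in which a SNP is kept only if its r² with every previously kept SNP in the window is at or below the threshold. That is a deliberate simplification and not PLINK's exact --indep-pairwise procedure; the function name `prune`, the parameters `window` and `r2_max`, and the toy r² values are all hypothetical.

```python
# Toy pairwise r^2 values (hypothetical); unlisted pairs are treated as 0.
R2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs2", "rs3")): 0.6,
    frozenset(("rs1", "rs3")): 0.5,
    frozenset(("rs4", "rs5")): 0.9,
}


def r2(a: str, b: str) -> float:
    return R2.get(frozenset((a, b)), 0.0)


def prune(snp_ids, window=50, r2_max=0.2):
    """Greedy LD pruning over SNPs listed in genomic order.

    A SNP is kept only if its r^2 with every already-kept SNP located
    fewer than `window` positions (in SNP count) upstream is <= r2_max.
    No p-values are consulted at any point.
    """
    kept = []  # (index in snp_ids, SNP id) for retained SNPs
    for i, snp in enumerate(snp_ids):
        in_window = (k for j, k in kept if i - j < window)
        if all(r2(snp, k) <= r2_max for k in in_window):
            kept.append((i, snp))
    return [snp for _, snp in kept]


# Same five toy SNPs, in genomic order; suppose rs2 is the lead GWAS hit.
ordered_ids = ["rs1", "rs2", "rs3", "rs4", "rs5"]
pruned = prune(ordered_ids)
print(pruned)
```

In this toy run the retained set is rs1 and rs4: rs2 is dropped because it is correlated with the earlier rs1, even though rs2 would be the index variant under clumping if it carried the smallest p-value.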

Rule of thumb: use clumping for locus lists and many PRS pipelines; use pruning for population-structure analyses such as PCA, where blocks of correlated SNPs would otherwise dominate the distance estimates.

References

  • PLINK 1.9 --clump: https://www.cog-genomics.org/plink/1.9/postproc#clump
  • PLINK 1.9 LD pruning: https://www.cog-genomics.org/plink/1.9/ld#indep