LD clumping vs LD pruning
Terms
Definition
Both procedures reduce linkage disequilibrium (LD) redundancy among SNPs, but they choose which SNPs to keep in different ways. Clumping is association-driven: it ranks variants by association strength (usually GWAS p values) and keeps one index variant per group of correlated signals. Pruning is LD-structure-driven: it walks the genome in windows and drops SNPs that are too correlated with a SNP it has already retained, without using GWAS significance.
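To make the association-driven selection concrete, here is a minimal Python sketch of greedy clumping on toy data. The function name `clump`, the SNP ids, the p values, and the r² lookup are all hypothetical, and the sketch omits the physical-distance window and secondary p-value threshold that real tools such as PLINK also apply.

```python
def clump(pvalues, r2, p1=5e-8, r2_threshold=0.1):
    """Toy greedy clumping (illustrative, not PLINK's implementation).

    pvalues: dict mapping SNP id -> GWAS p value
    r2: dict mapping frozenset({a, b}) -> pairwise r^2 (missing pairs count as 0)
    Returns a dict mapping each index SNP -> list of SNPs clumped under it.
    """
    ranked = sorted(pvalues, key=pvalues.get)  # most significant first
    assigned = set()
    clumps = {}
    for snp in ranked:
        # Skip SNPs already absorbed by a stronger index SNP,
        # and SNPs that are not significant enough to seed a clump.
        if snp in assigned or pvalues[snp] > p1:
            continue
        assigned.add(snp)
        members = [
            other for other in ranked
            if other not in assigned
            and r2.get(frozenset((snp, other)), 0.0) >= r2_threshold
        ]
        assigned.update(members)
        clumps[snp] = members
    return clumps


# Hypothetical toy data: rs1, rs2, rs3 are in LD with each other; rs4 is independent.
pvalues = {"rs1": 1e-12, "rs2": 3e-9, "rs3": 0.2, "rs4": 4e-10}
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.3,
    frozenset(("rs2", "rs3")): 0.4,
}

result = clump(pvalues, r2)
print(result)  # {'rs1': ['rs2', 'rs3'], 'rs4': []}
```

The two index SNPs (rs1 and rs4) are the most significant variants in their respective LD groups, which is the behavior the definition describes.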
How they differ
| | LD clumping | LD pruning |
|---|---|---|
| Primary input | GWAS summary statistics (including p values) plus an LD reference. | Genotype LD only (e.g. PLINK --indep-pairwise). |
| Goal | One (or a few) representative lead SNPs per association peak; also the clumping step of clumping-and-thresholding (C+T) PRS. | A sparse, approximately independent SNP set for PCA, GRM construction, and relatedness estimation. |
| Keeps | The strongest signal per region (by design). | Whichever SNP the window rules happen to retain, which is not necessarily the lead GWAS hit. |
Rule of thumb: Use clumping for locus lists and many PRS pipelines; use pruning for population-structure analyses such as PCA, where blocks of correlated SNPs would otherwise dominate the components or distance estimates.
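The "Keeps" row can be illustrated with a small Python sketch of window-based pruning. This is a simplified forward pass, not PLINK's exact --indep-pairwise algorithm: the function name `prune`, the SNP ids, and the r² values are hypothetical, and the window is counted in SNPs rather than base pairs.

```python
def prune(snps_in_order, r2, window=50, r2_threshold=0.2):
    """Toy LD pruning (illustrative simplification).

    snps_in_order: SNP ids sorted by genomic position
    r2: dict mapping frozenset({a, b}) -> pairwise r^2 (missing pairs count as 0)
    window: how many positions back a retained SNP can still exclude a new one
    Returns the list of retained SNP ids. No p values are used anywhere.
    """
    kept = []  # list of (position index, SNP id) for retained SNPs
    for pos, snp in enumerate(snps_in_order):
        too_correlated = any(
            pos - kept_pos < window
            and r2.get(frozenset((snp, kept_snp)), 0.0) > r2_threshold
            for kept_pos, kept_snp in kept
        )
        if not too_correlated:
            kept.append((pos, snp))
    return [snp for _, snp in kept]


# Hypothetical toy data: rs1 is assumed to be the lead GWAS hit,
# but rs3 comes first along the chromosome.
order = ["rs3", "rs2", "rs1", "rs4"]
r2 = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.3,
    frozenset(("rs2", "rs3")): 0.4,
}

kept = prune(order, r2)
print(kept)  # ['rs3', 'rs4']
```

Because rs3 is encountered first, it is retained and the correlated rs1 and rs2 are dropped, even though rs1 is the variant a clumping run on the same region would be expected to keep.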
Related terms
References
- PLINK 1.9 --clump: https://www.cog-genomics.org/plink/1.9/postproc#clump
- PLINK 1.9 LD pruning: https://www.cog-genomics.org/plink/1.9/ld#indep