LD clumping vs LD pruning

Definition

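Both procedures reduce LD redundancy among SNPs, but they choose which SNPs to keep in different ways. Clumping is association-driven. It ranks SNPs by GWAS p value, keeps the most significant one as an index variant, and assigns nearby SNPs in LD with it to that index variant's clump, so each region of correlated signals is represented by one index variant. Pruning is LD-structure-driven. It slides a window along the genome and drops SNPs until no remaining pair in the window exceeds an LD (r²) threshold, without using GWAS significance.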
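The greedy, p-value-ordered logic of clumping can be sketched in a few lines of Python. This is a minimal illustration, not PLINK's implementation. The `Snp` record, the `clump` function, its parameter names (`p1`, `p2`, `r2_thresh`, `kb`) and default thresholds, and the dictionary of pairwise r² values are all hypothetical. The sketch also assumes a single chromosome with LD already computed from a reference panel.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    snp_id: str
    bp: int   # base-pair position; a single chromosome is assumed
    p: float  # GWAS p value


def clump(snps, ld, p1=5e-8, p2=1e-2, r2_thresh=0.1, kb=250):
    """Greedy, p-value-ordered clumping (illustrative sketch).

    ld maps frozenset({id_a, id_b}) -> r^2 from an LD reference panel;
    pairs missing from ld are treated as r^2 = 0.
    Returns {index_snp_id: [ids of SNPs clumped with it]}.
    """
    ordered = sorted(snps, key=lambda s: s.p)  # most significant first
    assigned = set()  # SNPs already used as an index or clump member
    clumps = {}
    for index in ordered:
        if index.p > p1:
            break  # every remaining SNP is even less significant
        if index.snp_id in assigned:
            continue  # already absorbed into a stronger SNP's clump
        assigned.add(index.snp_id)
        members = []
        for other in ordered:
            if other.snp_id in assigned or other.p > p2:
                continue
            if abs(other.bp - index.bp) > kb * 1000:
                continue  # outside the physical window around the index SNP
            if ld.get(frozenset((index.snp_id, other.snp_id)), 0.0) > r2_thresh:
                members.append(other.snp_id)
                assigned.add(other.snp_id)
        clumps[index.snp_id] = members
    return clumps


snps = [
    Snp("rs1", 1_000_000, 1e-10),
    Snp("rs2", 1_010_000, 1e-6),
    Snp("rs3", 1_020_000, 0.03),
    Snp("rs4", 5_000_000, 1e-9),
]
ld = {
    frozenset(("rs1", "rs2")): 0.8,
    frozenset(("rs1", "rs3")): 0.9,
    frozenset(("rs2", "rs3")): 0.7,
}
print(clump(snps, ld))  # {'rs1': ['rs2'], 'rs4': []}
```

In this toy input, rs2 is absorbed into rs1's clump because it lies inside the window and is in strong LD with rs1. rs3 is ignored because its p value exceeds `p2`. rs4 becomes a separate index variant because it is too far away to be clumped with rs1.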

Topics: Population genetics, GWAS

How they differ

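Aspect	LD clumping	LD pruning
Primary input	GWAS summary statistics (SNP IDs and p values) plus LD from a reference panel or the study genotypes.	Genotypes only; LD is computed from them (e.g. PLINK `--indep-pairwise`).
Goal	One (or a few) representative lead SNPs per association peak; also the SNP-selection step of clumping-and-thresholding (C+T) polygenic risk scores.	A sparse, approximately independent SNP set for PCA, genetic relationship matrices (GRMs) and relatedness estimation.
Keeps	The most significant SNP in each region (by design).	Whichever SNPs survive the window's removal rule, which ignores association; not necessarily the lead GWAS hit.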
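Pruning needs only genotypes, not p values, and can be sketched in the same hedged way. The names `r_squared` and `prune` are hypothetical. The `window`, `step` and `r2_thresh` parameters mirror the three arguments of PLINK's `--indep-pairwise`. However, the rule of dropping the later SNP of a correlated pair is a simplifying assumption and does not reproduce PLINK's exact tie-breaking.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: r^2 is undefined, treat as uncorrelated
    return sxy * sxy / (sxx * syy)


def prune(genotypes, window=50, step=5, r2_thresh=0.2):
    """Greedy sliding-window LD pruning (illustrative sketch).

    genotypes: one dosage vector per SNP, sorted by position.
    Returns the sorted indices of retained SNPs; no p values are used.
    """
    n_snps = len(genotypes)
    kept = set(range(n_snps))
    for start in range(0, n_snps, step):
        in_window = [i for i in range(start, min(start + window, n_snps)) if i in kept]
        for i, j in combinations(in_window, 2):  # i < j by construction
            if i in kept and j in kept and r_squared(genotypes[i], genotypes[j]) > r2_thresh:
                kept.discard(j)  # assumption: drop the later SNP of the pair
    return sorted(kept)


genotypes = [
    [0, 1, 2, 0, 1, 2],  # SNP 0
    [0, 1, 2, 0, 1, 2],  # SNP 1: identical to SNP 0 (r^2 = 1)
    [2, 1, 0, 2, 1, 0],  # SNP 2: mirror image of SNP 0 (r^2 = 1)
    [1, 2, 1, 1, 0, 1],  # SNP 3: uncorrelated with SNPs 0-2 (r^2 = 0)
]
print(prune(genotypes, window=3, step=1))  # [0, 3]
```

SNPs 0, 1 and 2 form one perfectly correlated block, so only SNP 0 survives. If the lead GWAS hit had been SNP 1, pruning would still have discarded it. This is the behavior summarized in the Keeps row.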

Rule of thumb: use clumping for locus lists and many PRS pipelines. Use pruning for population-structure analyses such as PCA, where dense LD blocks would otherwise dominate the genetic distances between samples.

References

PLINK 1.9 `--clump`: https://www.cog-genomics.org/plink/1.9/postproc#clump
PLINK 1.9 LD pruning (`--indep-pairwise`): https://www.cog-genomics.org/plink/1.9/ld#indep
