LDSC in GWASLab (BETA version)

Available since v3.4.39

LD score regression has become one of the most common methods to evaluate the inflation caused by confounding factors and evaluate the genetic correlation across traits in GWAS.

The original LDSC software was implemented in Python2 and was only available for the command line interface.

GWASLab integrates the core functions of LDSC into the gl.Sumstats object, which makes the LD score regression much more convenient to conduct.

The difference between original LDSC and LDSC in GWASLab

GWASLab will automatically extract Hapmap3 SNPs based on CHR:POS and EA and NEA if rsID not available in sumstats
Codes have been adjusted to be compatible with Python3. (map, xrange and so forth)
Sumstats were supplied by GWASLab instead of reading from files.
Log system has been replaced by GWASLab.Log
Integrated munging workflow based on original LDSC munge_sumstats.py
Automatic column name handling (works with both GWASLab and LDSC standard column names)
Fixed minor errors

LICENSE change

Since LDSC was integrated into GWASLab, LICENSE for GWASLab has also been changed from MIT to GPL-3.0 license to be compatible with LDSC's LICENSE.

Munging (Filtering and Harmonization)

Munging workflow

GWASLab implements the LDSC munging workflow based on the original munge_sumstats.py. Munging applies standard filtering and harmonization procedures to prepare summary statistics for LDSC analysis.

The munging process includes:

Column name mapping: Automatically maps GWASLab column names to LDSC standard format:
**EA** → A1 (effect allele)
**NEA** → A2 (non-effect allele)
**EAF** → FRQ (frequency)
**rsID** → SNP (variant ID)
P-value filtering: Removes SNPs with P-values outside (0, 1] with warnings
INFO score filtering: Filters SNPs with INFO < threshold (default 0.9), warns if INFO outside [0, 1.5]
MAF filtering: Converts EAF to MAF (minor allele frequency) and filters by MAF threshold (default 0.01), warns if frequency outside [0, 1]
Allele filtering: Keeps only strand-unambiguous SNPs (A/T, C/G, A/C, A/G, T/C, T/G)
Palindromic SNP removal: Optionally removes palindromic SNPs (default: True)
Sample size filtering: Filters by N using 90th percentile / 1.5 threshold (LDSC default) or user-specified value
P to Z conversion: Creates Z-scores from P-values (prefers BETA/SE if available for more accurate conversion)
Duplicate removal: Removes duplicate SNPs based on SNP ID
Optional exclusions: Can exclude HLA region and sex chromosomes

Munging can be enabled for estimate_h2_by_ldsc() using the munge=True parameter. All LDSC estimation functions are compatible with pre-munged data (they automatically handle both munged and non-munged column formats).

Munging parameter	DataType	Description	Default
`munge`	`bool`	If `True`, apply munging procedures (available for `estimate_h2_by_ldsc()`)	`False`
`munge_kwargs`	`dict`	Additional munging parameters	`None`

Munging compatibility

All LDSC estimation functions (estimate_h2_by_ldsc, estimate_rg_by_ldsc, estimate_h2_cts_by_ldsc, estimate_partitioned_h2_by_ldsc) can work with pre-munged data. The functions automatically detect and handle both GWASLab column names (EA/NEA/rsID) and LDSC standard names (A1/A2/SNP).

Munging parameters

When munge=True, you can customize munging behavior using munge_kwargs:

Parameter	DataType	Description	Default
`info`	`float`	Minimum INFO score threshold	`0.9`
`maf`	`float`	Minimum minor allele frequency threshold	`0.01`
`n`	`float` or `None`	Minimum sample size. If `None`, uses 90th percentile / 1.5	`None`
`nopalindromic`	`bool`	If `True`, remove palindromic SNPs	`True`
`exclude_hla`	`bool`	If `True`, exclude HLA region	`True`
`exclude_sexchr`	`bool`	If `True`, exclude sex chromosomes	`True`

Single variate LD score regression

Bulik-Sullivan, et al. LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies. Nature Genetics, 2015.

mysumstats.estimate_h2_by_ldsc(build=None, verbose=True, match_allele=True, how="right", **kwargs)

`.estimate_h2_by_ldsc()` options	DataType	Description	Default
`ref_ld_chr`	`string`	Required. LD score reference file directory (e.g., `"/path/to/ldscores/"`)	-
`w_ld_chr`	`string`	LD score weight reference file directory. Often the same as `ref_ld_chr`	-
`build`	`str`	Genome build version (e.g., `"19"`, `"38"`). If `None`, uses the build from sumstats metadata	`None`
`verbose`	`bool`	If `True`, print detailed progress messages	`True`
`match_allele`	`bool`	If `True`, match alleles with reference panel	`True`
`how`	`str`	Merge strategy for allele matching: `"left"`, `"right"`, `"inner"`, `"outer"`	`"right"`
`munge`	`bool`	If `True`, apply standard munging procedures (filtering, harmonization, QC)	`False`
`munge_kwargs`	`dict`	Additional parameters for munging (e.g., `info=0.9`, `maf=0.01`)	`None`
`samp_prev`	`float`	Sample prevalence (case proportion) for case-control studies	Auto from metadata
`pop_prev`	`float`	Population prevalence for case-control studies	Auto from metadata
`print_coefficients`	`str`	Print coefficient results. Set to `"ldsc"` to enable	`"ldsc"`

Results (a pd.DataFrame) will be stored in .ldsc_h2. If print_coefficients is enabled, coefficient results will be stored in .ldsc_h2_results.

Basic heritability estimation

mysumstats.basic_check()
mysumstats.estimate_h2_by_ldsc(ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/")
mysumstats.ldsc_h2

With munging and additional options

mysumstats.estimate_h2_by_ldsc(ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/",
                               munge=True,
                               munge_kwargs={"info": 0.9, "maf": 0.01, "exclude_hla": True},
                               print_coefficients="ldsc")
# Access results
mysumstats.ldsc_h2  # Summary results
mysumstats.ldsc_h2_results  # Coefficient results (if print_coefficients is enabled)

Custom munging parameters

# Customize munging with stricter filters
mysumstats.estimate_h2_by_ldsc(ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/",
                               munge=True,
                               munge_kwargs={
                                   "info": 0.95,        # Stricter INFO filter
                                   "maf": 0.05,         # Higher MAF threshold
                                   "n": 5000,           # Minimum sample size
                                   "nopalindromic": True,
                                   "exclude_hla": True,
                                   "exclude_sexchr": True
                               })
mysumstats.ldsc_h2

Case-control study with prevalence

mysumstats.estimate_h2_by_ldsc(ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/",
                               samp_prev=0.5,  # 50% cases in sample
                               pop_prev=0.1)  # 10% prevalence in population
mysumstats.ldsc_h2

For more examples, see LDSC in gwaslab

Cross-trait LD score regression

Bulik-Sullivan, B., et al. An Atlas of Genetic Correlations across Human Diseases and Traits. Nature Genetics, 2015.

mysumstats.estimate_rg_by_ldsc(build=None, verbose=True, match_allele=True, how="right", get_hm3=True, **kwargs)

`.estimate_rg_by_ldsc()` options	DataType	Description	Default
`other_traits`	`list`	Required. A list of `gl.Sumstats` objects for other traits to compare	-
`ref_ld_chr`	`string`	Required. LD score reference file directory (e.g., `"/path/to/ldscores/"`)	-
`w_ld_chr`	`string`	LD score weight reference file directory. Often the same as `ref_ld_chr`	-
`rg`	`string`	Alias for each trait separated by commas (e.g., `"T2D,BMI_female,BMI_male"`). If not provided, uses study names from metadata	Auto
`build`	`str`	Genome build version (e.g., `"19"`, `"38"`). If `None`, uses the build from sumstats metadata	`None`
`verbose`	`bool`	If `True`, print detailed progress messages	`True`
`match_allele`	`bool`	If `True`, match alleles with reference panel	`True`
`how`	`str`	Merge strategy for allele matching: `"left"`, `"right"`, `"inner"`, `"outer"`	`"right"`
`get_hm3`	`bool`	If `True`, filter to HapMap3 SNPs before analysis	`True`
`samp_prev`	`string`	Sample prevalences separated by commas (e.g., `"0.5,0.3,0.4"`). Auto-detected from metadata if available	Auto
`pop_prev`	`string`	Population prevalences separated by commas (e.g., `"0.1,0.2,0.15"`). Auto-detected from metadata if available	Auto

Results (a pd.DataFrame) will be stored in .ldsc_rg.

Basic genetic correlation

# Load other traits as Sumstats objects
bmi_female = gl.Sumstats("bmi_female.txt.gz", fmt="gwaslab")
bmi_male = gl.Sumstats("bmi_male.txt.gz", fmt="gwaslab")

mysumstats.estimate_rg_by_ldsc(other_traits=[bmi_female, bmi_male], 
                               ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/")
mysumstats.ldsc_rg

With custom trait aliases

mysumstats.estimate_rg_by_ldsc(other_traits=[bmi_female, bmi_male], 
                               rg="T2D,BMI_female,BMI_male",  # Custom aliases
                               ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/")
mysumstats.ldsc_rg

Case-control studies with prevalence

mysumstats.estimate_rg_by_ldsc(other_traits=[bmi_female, bmi_male], 
                               ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/",
                               samp_prev="0.5,0.3,0.4",  # Sample prevalences for each trait
                               pop_prev="0.1,0.2,0.15")  # Population prevalences for each trait
mysumstats.ldsc_rg

Without HapMap3 filtering

mysumstats.estimate_rg_by_ldsc(other_traits=[bmi_female, bmi_male], 
                               ref_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/", 
                               w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/",
                               get_hm3=False)  # Use all SNPs, not just HapMap3
mysumstats.ldsc_rg

For more examples, see LDSC in gwaslab

Cell type specific heritability

Finucane, H. K., Reshef, Y. A., Anttila, V., Slowikowski, K., Gusev, A., Byrnes, A., ... & Price, A. L. (2018). Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nature genetics, 50(4), 621-629.

mysumstats.estimate_h2_cts_by_ldsc(build=None, verbose=True, match_allele=True, how="right", **kwargs)

`.estimate_h2_cts_by_ldsc()` options	DataType	Description	Default
`ref_ld_chr_cts`	`string`	Required. LD score reference file directory for cell type specific analysis (e.g., `"/path/to/baseline/baseline."`)	-
`build`	`str`	Genome build version (e.g., `"19"`, `"38"`). If `None`, uses the build from sumstats metadata	`None`
`verbose`	`bool`	If `True`, print detailed progress messages	`True`
`match_allele`	`bool`	If `True`, match alleles with reference panel	`True`
`how`	`str`	Merge strategy for allele matching: `"left"`, `"right"`, `"inner"`, `"outer"`	`"right"`
`print_all_cts`	`bool`	If `True`, print all cell type specific results	`False`

Results (a pd.DataFrame) will be stored in .ldsc_h2_cts.

Cell type specific heritability

mysumstats.estimate_h2_cts_by_ldsc(ref_ld_chr_cts="/home/yunye/tools/ldsc/eas_baseline/baseline1_2/baseline.")
mysumstats.ldsc_h2_cts

Partitioned heritability

Bulik-Sullivan, et al. LD Score Regression Distinguishes Confounding from Polygenicity in Genome-Wide Association Studies. Nature Genetics, 2015.

mysumstats.estimate_partitioned_h2_by_ldsc(build=None, verbose=True, match_allele=True, how="right", **kwargs)

`.estimate_partitioned_h2_by_ldsc()` options	DataType	Description	Default
`ref_ld_chr`	`string`	Required. LD score reference file directory with annotations (e.g., `"/path/to/annotations/"`)	-
`w_ld_chr`	`string`	LD score weight reference file directory. Often the same as `ref_ld_chr`	-
`build`	`str`	Genome build version (e.g., `"19"`, `"38"`). If `None`, uses the build from sumstats metadata	`None`
`verbose`	`bool`	If `True`, print detailed progress messages	`True`
`match_allele`	`bool`	If `True`, match alleles with reference panel	`True`
`how`	`str`	Merge strategy for allele matching: `"left"`, `"right"`, `"inner"`, `"outer"`	`"right"`
`samp_prev`	`float`	Sample prevalence (case proportion) for case-control studies	Auto from metadata
`pop_prev`	`float`	Population prevalence for case-control studies	Auto from metadata

Results will be stored in .ldsc_partitioned_h2_summary and .ldsc_partitioned_h2_results.

Partitioned heritability

mysumstats.estimate_partitioned_h2_by_ldsc(ref_ld_chr="/home/yunye/tools/ldsc/annotations/", 
                                            w_ld_chr="/home/yunye/tools/ldsc/ldscores/eas_ldscores/")
mysumstats.ldsc_partitioned_h2_summary  # Summary results
mysumstats.ldsc_partitioned_h2_results   # Detailed results by annotation