Skip to content

Extract lead variants

GWASLab can extract the lead variants based on MLOG10P values or P values (By default, GWASLab will use MLOG10P first since v3.4.37) from identified significant loci using a sliding window, and return the result as a pandas.DataFrame or gl.Sumstats Object.

.get_lead()

mysumstats.get_lead(
           scaled=False,
           use_p=False,
           windowsizekb=500,
           sig_level=5e-8,
           anno=False,
           build="19",
           source="ensembl",
           verbose=True,
           gls=False)

Options

.get_lead() options DataType Description Default
scaled boolean (deprecated since v3.4.37) If True, use MLOG10P for extraction instead of P values False
use_p boolean (available since v3.4.37)If True, use P for extraction instead of MLOG10P values False
windowsizekb int Specify the sliding window size in kb 500
sig_level float Specify the P value threshold 5e-8
anno boolean If True, annotate the lead variants with nearest gene names. False
source ensembl or refseq When anno=True, annotate variants using gtf files from ensembl or refseq ensembl
build "19" or "38" genome build version "19" or "38". "19"
verbose boolean If True, print logs True
gls boolean If True, return a new gl.Sumstats Object instead of pandas DataFrame False

Note

.get_lead() simply extracts the lead variants of each significant loci. It is different from clumping.

Note

Please try running .basic_check() to standardize the sumstats if there are any errors.

Quote

GWASLab adopted the definition for novel loci from Global Biobank Meta-analysis Initiative flagship paper.

"We defined genome-wide significant loci by iteratively spanning the ±500 kb region around the most significant variant and merging overlapping regions until no genome-wide significant variants were detected within ±1 Mb."

(Details are described in Zhou, W., Kanai, M., Wu, K. H. H., Rasheed, H., Tsuo, K., Hirbo, J. B., ... & Study, C. O. H. (2022). Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell Genomics, 2(10), 100192. )

GWASlab currently iteratively extends ± windowsizekb kb region around the most significant variant and merges overlapping regions until no genome-wide significant variants were detected within ± windowsizekb. (slightly different from the GBMI paper. When windowsizekb=1000, it is equivalent to GBMI's definition.)

Example