Extract Lead Variants
GWASLab can extract the lead variants based on MLOG10P values or P values (by default, GWASLab will use MLOG10P first since v3.4.37) from identified significant loci using a sliding window, and return the result as a pandas.DataFrame or gl.Sumstats Object.
.get_lead()
mysumstats.get_lead(
scaled=False,
use_p=False,
windowsizekb=500,
sig_level=5e-8,
anno=False,
build="19",
source="ensembl",
gtf_path=None,
wc_correction=False,
verbose=True,
gls=False
)
Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
scaled |
boolean |
(deprecated since v3.4.37) If True, use MLOG10P for extraction instead of P values | False |
use_p |
boolean |
(available since v3.4.37) If True, use P for extraction instead of MLOG10P values | False |
windowsizekb |
int |
Specify the sliding window size in kb | 500 |
sig_level |
float |
Specify the P value threshold | 5e-8 |
anno |
boolean |
If True, annotate the lead variants with nearest gene names | False |
source |
"ensembl" or "refseq" |
When anno=True, annotate variants using GTF files from ensembl or refseq |
"ensembl" |
gtf_path |
string |
Path to custom GTF file for annotation. If provided, overrides source |
None |
build |
"19" or "38" |
Genome build version "19" (GRCh37/hg19) or "38" (GRCh38/hg38) | "19" |
wc_correction |
boolean |
If True, apply Winner's Curse correction to effect sizes | False |
verbose |
boolean |
If True, print logs | True |
gls |
boolean |
If True, return a new gl.Sumstats Object instead of pandas DataFrame | False |
Lead Variant Extraction
.get_lead() simply extracts the lead variants of each significant loci. It is different from clumping.
Prerequisites
Please try running .basic_check() to standardize the sumstats if there are any errors.
GBMI Definition
GWASLab adopted the definition for novel loci from Global Biobank Meta-analysis Initiative flagship paper.
"We defined genome-wide significant loci by iteratively spanning the ±500 kb region around the most significant variant and merging overlapping regions until no genome-wide significant variants were detected within ±1 Mb."
(Details are described in Zhou, W., Kanai, M., Wu, K. H. H., Rasheed, H., Tsuo, K., Hirbo, J. B., ... & Study, C. O. H. (2022). Global Biobank Meta-analysis Initiative: Powering genetic discovery across human disease. Cell Genomics, 2(10), 100192.)
GWASLab currently iteratively extends ± windowsizekb kb region around the most significant variant and merges overlapping regions until no genome-wide significant variants were detected within ± windowsizekb. (slightly different from the GBMI paper. When windowsizekb=1000, it is equivalent to GBMI's definition.)
Examples
Basic lead variant extraction
Extract lead variants with gene annotation
Extract lead variants with Winner's Curse correction
Return as Sumstats object
Using custom GTF file
Notes
- By default, GWASLab uses MLOG10P for extraction (since v3.4.37). Use
use_p=Trueto use P values instead. - The
scaledparameter is deprecated but still supported for backward compatibility. - Gene annotation requires internet connection to download GTF files (unless
gtf_pathis provided). - Winner's Curse correction adjusts effect sizes for variants that are significant due to sampling variability.