Per SNP Heritability
GWASLab provides a function to calculate the variance explained by each SNP (per-SNP heritability or R²), which is useful for Mendelian randomization, polygenic risk scores, and understanding the contribution of individual variants to trait variance.
Available since v3.4.20
.get_per_snp_r2()
Calculates the proportion of variance explained by each SNP and optionally computes F-statistics for instrument strength assessment.
Required Columns
- For quantitative traits (
mode="q"):BETAandEAF(effect allele frequency) - For binary traits (
mode="b"):BETA,EAF, plusncase,ncontrol, andprevalenceparameters - For F-statistics:
N(sample size) column
Parameters
General Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
mode |
str |
Trait type: "q" for quantitative, "b" for binary |
"q" |
beta |
str |
Column name for effect size (beta coefficient) | "BETA" |
af |
str |
Column name for effect allele frequency | "EAF" |
n |
str |
Column name for sample size | "N" |
se |
str |
Column name for standard error (used when vary="se") |
"SE" |
adjuested |
bool |
If True, calculate adjusted R² |
False |
verbose |
bool |
Print progress messages | True |
Parameters for Quantitative Traits (mode="q")
| Parameter | DataType | Description | Default |
|---|---|---|---|
vary |
float or "se" |
Variance of the phenotype Y. If "se", Var(Y) is estimated from SE, N, and MAF |
1 |
k |
int or "all" |
Number of parameters for F-statistic calculation. Use "all" to set k = number of SNPs |
1 |
Parameters for Binary Traits (mode="b")
| Parameter | DataType | Description | Default |
|---|---|---|---|
ncase |
int |
Number of cases in the study | Required |
ncontrol |
int |
Number of controls in the study | Required |
prevalence |
float |
Disease prevalence in the general population (0-1) | Required |
Output Columns
The function adds the following columns to the sumstats:
SNPR2: Proportion of variance explained by each SNP (R²)ADJUESTED_SNPR2: Adjusted R² (only ifadjuested=True)F: F-statistic for instrument strength (only if N column is available)_VAR(BETAX): Variance of BETA × X (intermediate calculation for quantitative traits)_SIGMA2: Error variance (intermediate calculation whenvary="se")_POPAF: Population allele frequency (intermediate calculation for binary traits)_VG: Genetic variance (intermediate calculation for binary traits)
Quantitative Traits (mode="q")
For quantitative traits, the variance explained by SNP \(i\) is calculated as:
Where:
- \(Var(\beta_i \times X_i) = 2 \times \beta_i^2 \times **EAF**_i \times (1 - **EAF**_i)\)
-
\(Var(Y)\) depends on the
varyparameter: -
If
varyis a number: \(Var(Y) = \text{vary}\) - If
vary="se": \(Var(Y) = Var(\beta_i \times X_i) + **SE**_i^2 \times 2 \times **N** \times **EAF**_i \times (1 - **EAF**_i)\)
F-statistic (when N is available):
Where \(k\) is the number of parameters (default: 1).
When vary=1 and k=1
The simplified formulas are:
Reference for quantitative traits
Shim, H., Chasman, D. I., Smith, J. D., Mora, S., Ridker, P. M., Nickerson, D. A., ... & Stephens, M. (2015). A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians. PloS one, 10(4), e0120758.
Binary Traits (mode="b")
For binary traits, the function estimates the variance of liability explained by each variant using a liability threshold model.
The calculation involves:
- Estimating population allele frequency from sample allele frequency, case-control ratio, odds ratio, and population prevalence
- Calculating genetic variance: \(V_G = \beta^2 \times p \times (1-p)\) where \(p\) is the population allele frequency
- Calculating R²: \(R^2 = \frac{V_G}{V_G + V_E}\) where \(V_E = \frac{\pi^2}{3} \approx 3.29\) (error variance for logistic regression)
Reference for binary traits
Equation 10 in Lee, S. H., Goddard, M. E., Wray, N. R., & Visscher, P. M. (2012). A better coefficient of determination for genetic profile analysis. Genetic epidemiology, 36(3), 214-224.
Implementation adopted from TwoSampleMR.
Examples
Basic usage for quantitative trait:
import gwaslab as gl
# Load sumstats
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
n=170000)
# Calculate per-SNP R² with default settings (vary=1)
mysumstats.get_per_snp_r2()
# View results
print(mysumstats.data[["SNP", "BETA", "EAF", "SNPR2", "F"]].head())
Quantitative trait with estimated Var(Y):
Quantitative trait with custom Var(Y):
Binary trait:
# For binary traits, provide case/control counts and prevalence
mysumstats.get_per_snp_r2(mode="b",
ncase=8500,
ncontrol=161500,
prevalence=0.10)
With adjusted R²:
Custom k for F-statistic:
# Use k=10 for F-statistic calculation
mysumstats.get_per_snp_r2(k=10)
# Or use all SNPs as k
mysumstats.get_per_snp_r2(k="all")
Use Cases
- Mendelian Randomization: F-statistics > 10 indicate strong instruments
- Polygenic Risk Scores: R² values help assess individual variant contributions
- Power Analysis: Understanding variance explained by each variant
- Quality Control: Identifying variants with unexpectedly high or low R² values