Mendelian randomization
Mendelian randomization introduction
Comparison between RCT and MR
Fundamental assumption: gene-environment equivalence
(cited from George Davey Smith Mendelian Randomization - 25th April 2024)
The fundamental assumption of mendelian randomization (MR) is of gene-environment equivalence. MR reflects the phenocopy/ genocopy dialectic (Goldschmidt, Schmalhausen). The idea here is that all environmental effects can be mimicked by one or several mutations. (Zuckerkandl and Villet, PNAS 1988)
Gene-environment equivalence
- Requires justifying in all situations
- Relates to biological processes that are influenced by genetic variations
If we consider BMI as the outcome, let's think about whether genetic variants related to the following exposures meet the gene-environment equivalence assumption:
- Higher calorie intake: Yes
- Physical activity level: Yes
- Losing a leg (which dramatically affects BMI): No
- Smoking:? Maybe. Complicated.
Methods: Instrumental Variables (IV)
Instrumental variable (IV) can be defined as a variable that is correlated with the exposure X and uncorrelated with the error \(\epsilon\) in the following regression:
- \(Y\) is the outcome
- \(X\) is the exposure
- \(C\) is the confounders
IV Assumptions
Key Assumptions
Assumptions | Description |
---|---|
Relevance | Instrumental variables are strongly associated with the exposure.(IVs are not independent of X) |
Exclusion restriction | Instrumental variables do not affect the outcome except through the exposure.(IV is independent of Y, conditional on X and C) |
Independence | There are no confounders of the instrumental variables and the outcome.(IV is independent of C) |
Monotonicity | Variants affect the exposure in the same direction for all individuals |
No assortative mating | Assortative mating might cause bias in MR |
Two-stage least-squares (2SLS)
Two-sample MR
Two-sample MR refers to the approach that the genetic effects of the instruments on the exposure can be estimated in an independent sample other than that used to estimate effects between instruments on the outcome. As more and more GWAS summary statistics become publicly available, the scope of MR also expands with Two-sample MR methods.
Caveats
For two-sample MR, there is an additional key assumption:
The two samples used for MR are from the same underlying populations. (The effect size of instruments on exposure should be the same in both samples.)
Therefore, for two-sample MR, we usually use datasets from similar non-overlapping populations in terms of not only ancestry but also contextual factors.
IV selection
One of the first things to do when you plan to perform any type of MR is to check the associations of instrumental variables with the exposure to avoid bias caused by weak IVs.
The most commonly used method here is the F-statistic, which tests the association of instrumental variables with the exposure.
Practice
In this tutorial, we will walk you through how to perform a minimal TwoSampleMR analysis. We will use the R package TwoSampleMR, which provides easy-to-use functions for formatting, clumping and harmonizing GWAS summary statistics.
This package integrates a variety of commonly used MR methods for analysis, including:
> mr_method_list()
obj
1 mr_wald_ratio
2 mr_two_sample_ml
3 mr_egger_regression
4 mr_egger_regression_bootstrap
5 mr_simple_median
6 mr_weighted_median
7 mr_penalised_weighted_median
8 mr_ivw
9 mr_ivw_radial
10 mr_ivw_mre
11 mr_ivw_fe
12 mr_simple_mode
13 mr_weighted_mode
14 mr_weighted_mode_nome
15 mr_simple_mode_nome
16 mr_raps
17 mr_sign
18 mr_uwr
name PubmedID
1 Wald ratio
2 Maximum likelihood
3 MR Egger 26050253
4 MR Egger (bootstrap) 26050253
5 Simple median
6 Weighted median
7 Penalised weighted median
8 Inverse variance weighted
9 IVW radial
10 Inverse variance weighted (multiplicative random effects)
11 Inverse variance weighted (fixed effects)
12 Simple mode
13 Weighted mode
14 Weighted mode (NOME)
15 Simple mode (NOME)
16 Robust adjusted profile score (RAPS)
17 Sign concordance test
18 Unweighted regression
Inverse variance weighted (fixed effects)
Assumption: the underlying 'true' effect is fixed across variants
Weight for the effect of ith variant:
Effect size:
SE:
File Preparation
To perform two-sample MR analysis, we need summary statistics for exposure and outcome generated from independent populations with the same ancestry.
In this tutorial, we will use sumstats from Biobank Japan pheweb and KoGES pheweb.
- Type 2 diabetes sumstats from BBJ :
wget -O bbj_t2d.zip https://pheweb.jp/download/T2D
- BMI sumstats from KoGES :
wget -O koges_bmi.txt.gz https://koges.leelabsg.org/download/KoGES_BMI
R package TwoSampleMR
First, to use TwosampleMR, we need R>= 4.1. To install the package, run:
library(remotes)
install_github("MRCIEU/TwoSampleMR")
Loading package
library(TwoSampleMR)
Reading exposure sumstats
#format exposures dataset
exp_raw <- fread("koges_bmi.txt.gz")
Extracting instrumental variables
# select only significant variants
exp_raw <- subset(exp_raw,exp_raw$pval<5e-8)
exp_dat <- format_data( exp_raw,
type = "exposure",
snp_col = "rsids",
beta_col = "beta",
se_col = "sebeta",
effect_allele_col = "alt",
other_allele_col = "ref",
eaf_col = "af",
pval_col = "pval",
)
Clumping exposure variables
clumped_exp <- clump_data(exp_dat,clump_r2=0.01,pop="EAS")
outcome
out_raw <- fread("hum0197.v3.BBJ.T2D.v1/GWASsummary_T2D_Japanese_SakaueKanai2020.auto.txt.gz",
select=c("SNPID","Allele1","Allele2","BETA","SE","p.value"))
out_dat <- format_data( out_raw,
type = "outcome",
snp_col = "SNPID",
beta_col = "BETA",
se_col = "SE",
effect_allele_col = "Allele2",
other_allele_col = "Allele1",
pval_col = "p.value",
)
Harmonizing data
harmonized_data <- harmonise_data(clumped_exp,out_dat,action=1)
Perform MR analysis
res <- mr(harmonized_data)
id.exposure id.outcome outcome exposure method nsnp b se pval
<chr> <chr> <chr> <chr> <chr> <int> <dbl> <dbl> <dbl>
9J8pv4 IyUv6b outcome exposure MR Egger 28 1.3337580 0.69485260 6.596064e-02
9J8pv4 IyUv6b outcome exposure Weighted median 28 0.6298980 0.09401352 2.083081e-11
9J8pv4 IyUv6b outcome exposure Inverse variance weighted 28 0.5598956 0.23225806 1.592361e-02
9J8pv4 IyUv6b outcome exposure Simple mode 28 0.6097842 0.15180476 4.232158e-04
9J8pv4 IyUv6b outcome exposure Weighted mode 28 0.5946778 0.12820220 8.044488e-05
Sensitivity analysis
Heterogeneity
Test if there is heterogeneity among the causal effect of x on y estimated from each variants.
mr_heterogeneity(harmonized_data)
id.exposure id.outcome outcome exposure method Q Q_df Q_pval
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
9J8pv4 IyUv6b outcome exposure MR Egger 670.7022 26 1.000684e-124
9J8pv4 IyUv6b outcome exposure Inverse variance weighted 706.6579 27 1.534239e-131
Horizontal Pleiotropy
Intercept in MR-Egger
mr_pleiotropy_test(harmonized_data)
id.exposure id.outcome outcome exposure egger_intercept se pval
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
9J8pv4 IyUv6b outcome exposure -0.03603697 0.0305241 0.2484472
Single SNP MR and leave-one-out MR
Single SNP MR
res_single <- mr_singlesnp(harmonized_data)
res_single
exposure outcome id.exposure id.outcome samplesize SNP b se p
<chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <dbl> <dbl>
1 exposure outcome 9J8pv4 IyUv6b NA rs10198356 0.6323140 0.2082837 2.398742e-03
2 exposure outcome 9J8pv4 IyUv6b NA rs10209994 0.9477808 0.3225814 3.302164e-03
3 exposure outcome 9J8pv4 IyUv6b NA rs10824329 0.6281765 0.3246214 5.297739e-02
4 exposure outcome 9J8pv4 IyUv6b NA rs10938397 1.2376316 0.2775854 8.251150e-06
5 exposure outcome 9J8pv4 IyUv6b NA rs11066132 0.6024303 0.2232401 6.963693e-03
6 exposure outcome 9J8pv4 IyUv6b NA rs12522139 0.2905201 0.2890240 3.148119e-01
7 exposure outcome 9J8pv4 IyUv6b NA rs12591730 0.8930490 0.3076687 3.700413e-03
8 exposure outcome 9J8pv4 IyUv6b NA rs13013021 1.4867889 0.2207777 1.646925e-11
9 exposure outcome 9J8pv4 IyUv6b NA rs1955337 0.5442640 0.2994146 6.910079e-02
10 exposure outcome 9J8pv4 IyUv6b NA rs2076308 1.1176226 0.2657969 2.613132e-05
11 exposure outcome 9J8pv4 IyUv6b NA rs2278557 0.6238587 0.2968184 3.556906e-02
12 exposure outcome 9J8pv4 IyUv6b NA rs2304608 1.5054682 0.2968905 3.961740e-07
13 exposure outcome 9J8pv4 IyUv6b NA rs2531995 1.3972908 0.3130157 8.045689e-06
14 exposure outcome 9J8pv4 IyUv6b NA rs261967 1.5303384 0.2921192 1.616714e-07
15 exposure outcome 9J8pv4 IyUv6b NA rs35332469 -0.2307314 0.3479219 5.072217e-01
16 exposure outcome 9J8pv4 IyUv6b NA rs35560038 -1.5730870 0.2018968 6.619637e-15
17 exposure outcome 9J8pv4 IyUv6b NA rs3755804 0.5314915 0.2325073 2.225933e-02
18 exposure outcome 9J8pv4 IyUv6b NA rs4470425 0.6948046 0.3079944 2.407689e-02
19 exposure outcome 9J8pv4 IyUv6b NA rs476828 1.1739083 0.1568550 7.207355e-14
20 exposure outcome 9J8pv4 IyUv6b NA rs4883723 0.5479721 0.2855004 5.494141e-02
21 exposure outcome 9J8pv4 IyUv6b NA rs509325 0.5491040 0.1598196 5.908641e-04
22 exposure outcome 9J8pv4 IyUv6b NA rs55872725 1.3501891 0.1259791 8.419325e-27
23 exposure outcome 9J8pv4 IyUv6b NA rs6089309 0.5657525 0.3347009 9.096620e-02
24 exposure outcome 9J8pv4 IyUv6b NA rs6265 0.6457693 0.1901871 6.851804e-04
25 exposure outcome 9J8pv4 IyUv6b NA rs6736712 0.5606962 0.3448784 1.039966e-01
26 exposure outcome 9J8pv4 IyUv6b NA rs7560832 0.6032080 0.2904972 3.785077e-02
27 exposure outcome 9J8pv4 IyUv6b NA rs825486 -0.6152759 0.3500334 7.878772e-02
28 exposure outcome 9J8pv4 IyUv6b NA rs9348441 -4.9786332 0.2572782 1.992909e-83
29 exposure outcome 9J8pv4 IyUv6b NA All - Inverse variance weighted 0.5598956 0.2322581 1.592361e-02
30 exposure outcome 9J8pv4 IyUv6b NA All - MR Egger 1.3337580 0.6948526 6.596064e-02
leave-one-out MR
res_loo <- mr_leaveoneout(harmonized_data)
res_loo
exposure outcome id.exposure id.outcome samplesize SNP b se p
<chr> <chr> <chr> <chr> <lgl> <chr> <dbl> <dbl> <dbl>
1 exposure outcome 9J8pv4 IyUv6b NA rs10198356 0.5562834 0.2424917 2.178871e-02
2 exposure outcome 9J8pv4 IyUv6b NA rs10209994 0.5520576 0.2388122 2.079526e-02
3 exposure outcome 9J8pv4 IyUv6b NA rs10824329 0.5585335 0.2390239 1.945341e-02
4 exposure outcome 9J8pv4 IyUv6b NA rs10938397 0.5412688 0.2388709 2.345460e-02
5 exposure outcome 9J8pv4 IyUv6b NA rs11066132 0.5580606 0.2417275 2.096381e-02
6 exposure outcome 9J8pv4 IyUv6b NA rs12522139 0.5667102 0.2395064 1.797373e-02
7 exposure outcome 9J8pv4 IyUv6b NA rs12591730 0.5524802 0.2390990 2.085075e-02
8 exposure outcome 9J8pv4 IyUv6b NA rs13013021 0.5189715 0.2386808 2.968017e-02
9 exposure outcome 9J8pv4 IyUv6b NA rs1955337 0.5602635 0.2394505 1.929468e-02
10 exposure outcome 9J8pv4 IyUv6b NA rs2076308 0.5431355 0.2394403 2.330758e-02
11 exposure outcome 9J8pv4 IyUv6b NA rs2278557 0.5583634 0.2394924 1.972992e-02
12 exposure outcome 9J8pv4 IyUv6b NA rs2304608 0.5372557 0.2377325 2.382639e-02
13 exposure outcome 9J8pv4 IyUv6b NA rs2531995 0.5419016 0.2379712 2.277590e-02
14 exposure outcome 9J8pv4 IyUv6b NA rs261967 0.5358761 0.2376686 2.415093e-02
15 exposure outcome 9J8pv4 IyUv6b NA rs35332469 0.5735907 0.2378345 1.587739e-02
16 exposure outcome 9J8pv4 IyUv6b NA rs35560038 0.6734906 0.2217804 2.391474e-03
17 exposure outcome 9J8pv4 IyUv6b NA rs3755804 0.5610215 0.2413249 2.008503e-02
18 exposure outcome 9J8pv4 IyUv6b NA rs4470425 0.5568993 0.2392632 1.993549e-02
19 exposure outcome 9J8pv4 IyUv6b NA rs476828 0.5037555 0.2443224 3.922224e-02
20 exposure outcome 9J8pv4 IyUv6b NA rs4883723 0.5602050 0.2397325 1.945000e-02
21 exposure outcome 9J8pv4 IyUv6b NA rs509325 0.5608429 0.2468506 2.308693e-02
22 exposure outcome 9J8pv4 IyUv6b NA rs55872725 0.4419446 0.2454771 7.180543e-02
23 exposure outcome 9J8pv4 IyUv6b NA rs6089309 0.5597859 0.2388902 1.911519e-02
24 exposure outcome 9J8pv4 IyUv6b NA rs6265 0.5547068 0.2436910 2.282978e-02
25 exposure outcome 9J8pv4 IyUv6b NA rs6736712 0.5598815 0.2387602 1.902944e-02
26 exposure outcome 9J8pv4 IyUv6b NA rs7560832 0.5588113 0.2396229 1.969836e-02
27 exposure outcome 9J8pv4 IyUv6b NA rs825486 0.5800026 0.2367545 1.429330e-02
28 exposure outcome 9J8pv4 IyUv6b NA rs9348441 0.7378967 0.1366838 6.717515e-08
29 exposure outcome 9J8pv4 IyUv6b NA All 0.5598956 0.2322581 1.592361e-02
Visualization
Scatter plot
res <- mr(harmonized_data)
p1 <- mr_scatter_plot(res, harmonized_data)
p1[[1]]
Single SNP
res_single <- mr_singlesnp(harmonized_data)
p2 <- mr_forest_plot(res_single)
p2[[1]]
Leave-one-out
res_loo <- mr_leaveoneout(harmonized_data)
p3 <- mr_leaveoneout_plot(res_loo)
p3[[1]]
Funnel plot
res_single <- mr_singlesnp(harmonized_data)
p4 <- mr_funnel_plot(res_single)
p4[[1]]
MR Steiger directionality test
MR Steiger directionality test is a method to test the causal direction.
Steiger test: test whether the SNP-outcome correlation is greater than the SNP-exposure correlation.
harmonized_data$"r.outcome" <- get_r_from_lor(
harmonized_data$"beta.outcome",
harmonized_data$"eaf.outcome",
45383,
132032,
0.26,
model = "logit",
correction = FALSE
)
out <- directionality_test(harmonized_data)
out
id.exposure id.outcome exposure outcome snp_r2.exposure snp_r2.outcome correct_causal_direction steiger_pval
<chr> <chr> <chr> <chr> <dbl> <dbl> <lgl> <dbl>
rvi6Om ETcv15 BMI T2D 0.02125453 0.005496427 TRUE NA
Reference: Hemani, G., Tilling, K., & Davey Smith, G. (2017). Orienting the causal relationship between imprecisely measured traits using GWAS summary data. PLoS genetics, 13(11), e1007081.
MR-Base (web app)
STROBE-MR
Before reporting any MR results, please check the STROBE-MR Checklist first, which consists of 20 things that should be addressed when reporting a mendelian randomization study.
- Skrivankova, V. W., Richmond, R. C., Woolf, B. A., Yarmolinsky, J., Davies, N. M., Swanson, S. A., ... & Richards, J. B. (2021). Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. Jama, 326(16), 1614-1621.
References
- Sanderson, E., Glymour, M. M., Holmes, M. V., Kang, H., Morrison, J., Munafò, M. R., ... & Davey Smith, G. (2022). Mendelian randomization. Nature Reviews Methods Primers, 2(1), 1-21.
- Hemani, G., Zheng, J., Elsworth, B., Wade, K. H., Haberland, V., Baird, D., ... & Haycock, P. C. (2018). The MR-Base platform supports systematic causal inference across the human phenome. elife, 7, e34408.
- Zuckerkandl, E., & Villet, R. (1988). Concentration-affinity equivalence in gene regulation: convergence of genetic and environmental effects. Proceedings of the National Academy of Sciences, 85(13), 4784-4788.
- Skrivankova, V. W., Richmond, R. C., Woolf, B. A., Yarmolinsky, J., Davies, N. M., Swanson, S. A., ... & Richards, J. B. (2021). Strengthening the reporting of observational studies in epidemiology using Mendelian randomization: the STROBE-MR statement. Jama, 326(16), 1614-1621.