Brisbane plot¶
In [1]:
Copied!
import gwaslab as gl
import pandas as pd
import gwaslab as gl
import pandas as pd
Download data¶
Yengo, L., Vedantam, S., Marouli, E., Sidorenko, J., Bartell, E., Sakaue, S., ... & Lee, J. Y. (2022). A saturated map of common genetic variants associated with human height. Nature, 1-16. Fig. 2: Brisbane plot
In [2]:
Copied!
!wget https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-05275-y/MediaObjects/41586_2022_5275_MOESM3_ESM.xlsx
!wget https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-05275-y/MediaObjects/41586_2022_5275_MOESM3_ESM.xlsx
--2024-02-03 19:40:17-- https://static-content.springer.com/esm/art%3A10.1038%2Fs41586-022-05275-y/MediaObjects/41586_2022_5275_MOESM3_ESM.xlsx Resolving static-content.springer.com (static-content.springer.com)... 146.75.112.95, 23.235.32.32, 104.156.80.32, ... Connecting to static-content.springer.com (static-content.springer.com)|146.75.112.95|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 9368771 (8.9M) [application/octet-stream] Saving to: ‘41586_2022_5275_MOESM3_ESM.xlsx.2’ 41586_2022_5275_MOE 100%[===================>] 8.93M 45.4MB/s in 0.2s 2024-02-03 19:40:18 (45.4 MB/s) - ‘41586_2022_5275_MOESM3_ESM.xlsx.2’ saved [9368771/9368771]
Read into pandas dataframe¶
In [3]:
Copied!
data = pd.read_excel("41586_2022_5275_MOESM3_ESM.xlsx",sheet_name=10,skiprows=1)
data
data = pd.read_excel("41586_2022_5275_MOESM3_ESM.xlsx",sheet_name=10,skiprows=1)
data
/home/yunye/anaconda3/lib/python3.9/site-packages/openpyxl/worksheet/_reader.py:329: UserWarning: Unknown extension is not supported and will be removed warn(msg)
Out[3]:
Locus | Chr | SNP | BP (HG19) | BP (HG38) | Effect Allele (A1) | A1 Frequency | Marginal effect size (b) | b SE | b P-value | ... | LD with the next SNP | Distance to the closest gene (kb) | Closest gene | Position relative to the closest gene | Replication (EBB) effect size | Replication (EBB) SE | Replication (EBB) P-value | Replication (EBB) COJO effect size | Replication (EBB) COJO SE | Replication (EBB) COJO P-value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1:METAFE | 1 | rs2710888 | 959842 | 1024462 | T | 0.340777 | 0.011478 | 0.000719 | 2.190000e-57 | ... | 0.383297 | 4.340 | AGRN | Gene-Body | 0.016117 | 0.006618 | 0.014882 | 0.011628 | 0.007169 | 0.104822 |
1 | 1:METAFE | 1 | rs3934834 | 1005806 | 1070426 | T | 0.155679 | 0.009750 | 0.000867 | 2.440000e-29 | ... | 0.032079 | 1.319 | RNF223 | Up-stream | 0.020319 | 0.008240 | 0.013661 | 0.014769 | 0.008928 | 0.098090 |
2 | 2:METAFE | 1 | rs182532 | 1287040 | 1351660 | T | 0.032072 | -0.016253 | 0.001845 | 1.250000e-18 | ... | -0.060743 | 1.028 | MXRA8 | Up-stream | 0.019320 | 0.017633 | 0.273230 | 0.017967 | 0.017684 | 0.309621 |
3 | 2:METAFE | 1 | rs17160669 | 1305561 | 1370181 | T | 0.187613 | 0.010000 | 0.000902 | 1.480000e-28 | ... | 0.017875 | 3.548 | AURKAIP1 | Up-stream | 0.009437 | 0.009642 | 0.327702 | 0.011038 | 0.009670 | 0.253674 |
4 | 3:METAFE | 1 | rs9660106 | 1797947 | 1866508 | G | 0.484076 | 0.004450 | 0.000632 | 1.860000e-12 | ... | 0.027731 | 24.609 | GNB1 | Gene-Body | 0.014751 | 0.006449 | 0.022182 | 0.013992 | 0.006462 | 0.030383 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12106 | 7206:METAFE | 22 | rs9628283 | 50540766 | 50102337 | C | 0.028417 | 0.014488 | 0.001852 | 5.130000e-15 | ... | -0.065326 | 12.332 | MOV10L1 | Gene-Body | 0.014345 | 0.027401 | 0.600616 | 0.008326 | 0.027836 | 0.764848 |
12107 | 7207:METAFE | 22 | rs28642259 | 50785718 | 50347289 | T | 0.323849 | -0.005336 | 0.000719 | 1.140000e-13 | ... | -0.101697 | 3.973 | PPP6R2 | Gene-Body | -0.003071 | 0.007242 | 0.671512 | -0.001554 | 0.007312 | 0.831684 |
12108 | 7208:METAFE | 22 | rs11555194 | 50876662 | 50438233 | G | 0.028294 | 0.015256 | 0.001921 | 2.000000e-15 | ... | 0.106224 | 6.768 | SBF1 | Up-stream | 0.018349 | 0.019176 | 0.338620 | 0.017029 | 0.019396 | 0.379943 |
12109 | 7208:METAFE | 22 | rs762669 | 50943423 | 50504994 | A | 0.175297 | -0.009376 | 0.000820 | 3.000000e-30 | ... | 0.006762 | 2.048 | LMF2 | Gene-Body | -0.001830 | 0.009092 | 0.840476 | -0.002904 | 0.009160 | 0.751260 |
12110 | 7209:METAFE | 22 | rs9628185 | 51109992 | 50671564 | C | 0.457101 | 0.004333 | 0.000628 | 5.430000e-12 | ... | 0.000000 | 3.077 | SHANK3 | Up-stream | 0.000249 | 0.006356 | 0.968802 | 0.000514 | 0.006368 | 0.935627 |
12111 rows × 26 columns
Load into gwaslab Sumstats¶
In [4]:
Copied!
sumstats = gl.Sumstats(data,snpid="SNP",chrom="Chr",pos="BP (HG19)",p="b P-value")
sumstats = gl.Sumstats(data,snpid="SNP",chrom="Chr",pos="BP (HG19)",p="b P-value")
Sat Feb 3 19:40:22 2024 GWASLab v3.4.38 https://cloufield.github.io/gwaslab/ Sat Feb 3 19:40:22 2024 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com Sat Feb 3 19:40:22 2024 Start to initialize gl.Sumstats from pandas DataFrame ... Sat Feb 3 19:40:22 2024 -Reading columns : BP (HG19),b P-value,SNP,Chr Sat Feb 3 19:40:22 2024 -Renaming columns to : POS,P,SNPID,CHR Sat Feb 3 19:40:22 2024 -Current Dataframe shape : 12111 x 4 Sat Feb 3 19:40:22 2024 -Initiating a status column: STATUS ... Sat Feb 3 19:40:22 2024 #WARNING! Version of genomic coordinates is unknown... Sat Feb 3 19:40:22 2024 Start to reorder the columns...v3.4.38 Sat Feb 3 19:40:22 2024 -Current Dataframe shape : 12111 x 5 ; Memory usage: 20.36 MB Sat Feb 3 19:40:22 2024 -Reordering columns to : SNPID,CHR,POS,P,STATUS Sat Feb 3 19:40:22 2024 Finished reordering the columns. Sat Feb 3 19:40:22 2024 -Column : SNPID CHR POS P STATUS Sat Feb 3 19:40:22 2024 -DType : object string int64 float64 category Sat Feb 3 19:40:22 2024 -Verified: T F T T T Sat Feb 3 19:40:22 2024 #WARNING! Columns with possibly incompatable dtypes: CHR Sat Feb 3 19:40:22 2024 -Current Dataframe memory usage: 20.36 MB Sat Feb 3 19:40:22 2024 Finished loading data successfully!
Brisbane plot¶
mode=b
: Brisbane plot
In [5]:
Copied!
sumstats.plot_mqq(mode="b")
sumstats.plot_mqq(mode="b")
Sat Feb 3 19:40:22 2024 Start to create MQQ plot...v3.4.38: Sat Feb 3 19:40:22 2024 -Genomic coordinates version: 99... Sat Feb 3 19:40:22 2024 #WARNING! Genomic coordinates version is unknown. Sat Feb 3 19:40:22 2024 -Genome-wide significance level to plot is set to 5e-08 ... Sat Feb 3 19:40:22 2024 -Raw input contains 12111 variants... Sat Feb 3 19:40:22 2024 -MQQ plot layout mode is : b Sat Feb 3 19:40:22 2024 Finished loading specified columns from the sumstats. Sat Feb 3 19:40:22 2024 Start data conversion and sanity check: Sat Feb 3 19:40:22 2024 -Removed 0 variants with nan in CHR or POS column ... Sat Feb 3 19:40:22 2024 -Removed 0 variants with CHR <=0... Sat Feb 3 19:40:22 2024 -Calculating DENSITY with windowsize of 100 kb Sat Feb 3 19:40:23 2024 -Converting data above cut line... Sat Feb 3 19:40:23 2024 -Maximum DENSITY value is 24 . Sat Feb 3 19:40:23 2024 Finished data conversion and sanity check. Sat Feb 3 19:40:23 2024 Start to create MQQ plot with 12111 variants... Sat Feb 3 19:40:23 2024 -Creating background plot... Sat Feb 3 19:40:23 2024 Finished creating MQQ plot successfully! Sat Feb 3 19:40:23 2024 Start to extract variants for annotation... Sat Feb 3 19:40:23 2024 Finished extracting variants for annotation... Sat Feb 3 19:40:23 2024 Start to process figure arts. Sat Feb 3 19:40:23 2024 -Processing X ticks... Sat Feb 3 19:40:23 2024 -Processing X labels... Sat Feb 3 19:40:23 2024 -Processing Y labels... Sat Feb 3 19:40:23 2024 -Processing Y tick lables... Sat Feb 3 19:40:23 2024 -Processing Y labels... Sat Feb 3 19:40:23 2024 -Processing lines... Sat Feb 3 19:40:23 2024 Finished processing figure arts. Sat Feb 3 19:40:23 2024 Start to annotate variants... Sat Feb 3 19:40:23 2024 -Skip annotating Sat Feb 3 19:40:23 2024 Finished annotating variants. Sat Feb 3 19:40:23 2024 Start to save figure... Sat Feb 3 19:40:23 2024 -Skip saving figure! Sat Feb 3 19:40:23 2024 Finished saving figure... Sat Feb 3 19:40:23 2024 Finished creating plot successfully!
Out[5]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.g_Log.Log at 0x7fe18a6e6df0>)
Some customizations¶
In [6]:
Copied!
sumstats.plot_mqq(mode="b",anno=True,sig_line_color="red", verbose=False)
sumstats.plot_mqq(mode="b",anno=True,sig_line_color="red", verbose=False)
Sat Feb 3 19:40:23 2024 #WARNING! Genomic coordinates version is unknown.
Out[6]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.g_Log.Log at 0x7fe18a6e6df0>)
Annotate with gene names¶
In [7]:
Copied!
sumstats.plot_mqq(mode="b",anno="GENENAME",sig_line_color="red",build="19", verbose=False)
sumstats.plot_mqq(mode="b",anno="GENENAME",sig_line_color="red",build="19", verbose=False)
Out[7]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.g_Log.Log at 0x7fe18a6e6df0>)