Regional plot¶
load gwaslab¶
In [1]:
Copied!
import sys
sys.path.insert(0,"/home/yunye/work/gwaslab/src")
import gwaslab as gl
import sys
sys.path.insert(0,"/home/yunye/work/gwaslab/src")
import gwaslab as gl
download sample data¶
In [2]:
Copied!
#!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
#!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
load sumstats into gwaslab.Sumstats¶
In [3]:
Copied!
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
p="P",
direction="Dir",
build="19",
n="N")
mysumstats.filter_value('CHR =="7"',inplace=True)
mysumstats.basic_check(verbose = False)
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
p="P",
direction="Dir",
build="19",
n="N")
mysumstats.filter_value('CHR =="7"',inplace=True)
mysumstats.basic_check(verbose = False)
2024/02/15 18:52:34 GWASLab v3.4.40 https://cloufield.github.io/gwaslab/ 2024/02/15 18:52:34 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com 2024/02/15 18:52:34 Start to initialize gl.Sumstats from file :t2d_bbj.txt.gz 2024/02/15 18:52:50 -Reading columns : P,BETA,Frq,SE,N,POS,REF,Dir,SNP,ALT,CHR 2024/02/15 18:52:50 -Renaming columns to : P,BETA,EAF,SE,N,POS,NEA,DIRECTION,SNPID,EA,CHR 2024/02/15 18:52:50 -Current Dataframe shape : 12557761 x 11 2024/02/15 18:52:50 -Initiating a status column: STATUS ... 2024/02/15 18:52:50 -Genomic coordinates are based on GRCh37/hg19... 2024/02/15 18:52:52 -NEAF is specified... 2024/02/15 18:52:52 -Checking if 0<= NEAF <=1 ... 2024/02/15 18:52:53 -Converted NEAF to EAF. 2024/02/15 18:52:53 -Removed 0 variants with bad NEAF. 2024/02/15 18:52:53 Start to reorder the columns...v3.4.40 2024/02/15 18:52:53 -Current Dataframe shape : 12557761 x 12 ; Memory usage: 1100.88 MB 2024/02/15 18:52:53 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS 2024/02/15 18:52:53 Finished reordering the columns. 2024/02/15 18:52:53 -Column : SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS 2024/02/15 18:52:53 -DType : object string int64 category category float64 float64 float64 float64 int64 object category 2024/02/15 18:52:53 -Verified: T F T T T T T T T T T T 2024/02/15 18:52:53 #WARNING! Columns with possibly incompatible dtypes: CHR 2024/02/15 18:52:54 -Current Dataframe memory usage: 1100.88 MB 2024/02/15 18:52:54 Finished loading data successfully! 2024/02/15 18:52:54 Start filtering values by condition: CHR =="7" 2024/02/15 18:52:54 -Removing 11849981 variants not meeting the conditions: CHR =="7" 2024/02/15 18:52:54 Finished filtering values.
Create Manhattan plot with sumstats on a single chromosome¶
In [4]:
Copied!
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl")
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl")
2024/02/15 18:53:01 Start to create MQQ plot...v3.4.40: 2024/02/15 18:53:01 -Genomic coordinates version: 19... 2024/02/15 18:53:01 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:53:01 -Raw input contains 707780 variants... 2024/02/15 18:53:01 -MQQ plot layout mode is : m 2024/02/15 18:53:01 Finished loading specified columns from the sumstats. 2024/02/15 18:53:01 Start data conversion and sanity check: 2024/02/15 18:53:01 -Removed 0 variants with nan in CHR or POS column ... 2024/02/15 18:53:01 -Removed 0 variants with CHR <=0... 2024/02/15 18:53:01 -Removed 0 variants with nan in P column ... 2024/02/15 18:53:01 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:53:02 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:53:02 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:53:02 -Converting data above cut line... 2024/02/15 18:53:02 -Maximum -log10(P) value is 73.38711023071251 . 2024/02/15 18:53:02 Finished data conversion and sanity check. 2024/02/15 18:53:02 Start to create MQQ plot with 707780 variants... 2024/02/15 18:53:02 -Creating background plot... 2024/02/15 18:53:03 Finished creating MQQ plot successfully! 2024/02/15 18:53:03 Start to extract variants for annotation... 2024/02/15 18:53:03 -Found 7 significant variants with a sliding window size of 500 kb... 2024/02/15 18:53:03 Start to annotate variants with nearest gene name(s)... 2024/02/15 18:53:03 -Assigning Gene name using ensembl_hg19_gtf for protein coding genes 2024/02/15 18:53:03 Finished annotating variants with nearest gene name(s) successfully! 2024/02/15 18:53:03 Finished extracting variants for annotation... 2024/02/15 18:53:03 Start to process figure arts. 2024/02/15 18:53:03 -Processing X ticks... 2024/02/15 18:53:03 -Processing X labels... 2024/02/15 18:53:03 -Processing Y labels... 2024/02/15 18:53:03 -Processing Y tick lables... 2024/02/15 18:53:03 -Processing Y labels... 2024/02/15 18:53:03 -Processing lines... 2024/02/15 18:53:03 Finished processing figure arts. 2024/02/15 18:53:03 Start to annotate variants... 2024/02/15 18:53:03 -Annotating using column GENENAME... 2024/02/15 18:53:03 -Adjusting text positions with repel_force=0.03... 2024/02/15 18:53:03 Finished annotating variants. 2024/02/15 18:53:03 Start to save figure... 2024/02/15 18:53:03 -Skip saving figure! 2024/02/15 18:53:03 Finished saving figure... 2024/02/15 18:53:03 Finished creating plot successfully!
Out[4]:
(<Figure size 3000x1000 with 1 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)
Check lead variants¶
In [5]:
Copied!
mysumstats.get_lead()
mysumstats.get_lead()
2024/02/15 18:53:07 Start to extract lead variants...v3.4.40 2024/02/15 18:53:07 -Current Dataframe shape : 707780 x 12 ; Memory usage: 73.87 MB 2024/02/15 18:53:07 -Processing 707780 variants... 2024/02/15 18:53:07 -Significance threshold : 5e-08 2024/02/15 18:53:07 -Sliding window size: 500 kb 2024/02/15 18:53:07 -Using P for extracting lead variants... 2024/02/15 18:53:07 -Found 1077 significant variants in total... 2024/02/15 18:53:08 -Identified 7 lead variants! 2024/02/15 18:53:08 Finished extracting lead variants.
Out[5]:
SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
77576 | 7:13888699_G_C | 7 | 13888699 | G | C | 0.5680 | 0.0562 | 0.0094 | 2.507000e-09 | 191764 | ++++ | 1960099 |
83154 | 7:14898282_C_T | 7 | 14898282 | C | T | 0.6012 | 0.0617 | 0.0088 | 2.336000e-12 | 191764 | ++++ | 1960099 |
229433 | 7:44174857_T_G | 7 | 44174857 | G | T | 0.5985 | -0.0640 | 0.0093 | 5.325000e-12 | 191764 | ---- | 1960099 |
335366 | 7:69406661_A_T | 7 | 69406661 | T | A | 0.1981 | -0.0900 | 0.0111 | 4.871000e-16 | 191764 | ---- | 1960099 |
568451 | 7:127253550_C_T | 7 | 127253550 | C | T | 0.9081 | 0.2761 | 0.0152 | 4.101000e-74 | 191764 | ++++ | 1960099 |
579917 | 7:130025713_G_A | 7 | 130025713 | G | A | 0.9530 | -0.1365 | 0.0230 | 3.068000e-09 | 191764 | ---- | 1960099 |
695434 | 7:157038803_A_G | 7 | 157038803 | G | A | 0.4626 | -0.0502 | 0.0088 | 1.127000e-08 | 191764 | ---- | 1960099 |
Create a regional plot with no additional information¶
In [6]:
Copied!
mysumstats.plot_mqq(region=(7,156538803,157538803))
mysumstats.plot_mqq(region=(7,156538803,157538803))
2024/02/15 18:53:08 Start to create MQQ plot...v3.4.40: 2024/02/15 18:53:08 -Genomic coordinates version: 19... 2024/02/15 18:53:08 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:53:08 -Raw input contains 707780 variants... 2024/02/15 18:53:08 -MQQ plot layout mode is : mqq 2024/02/15 18:53:08 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:53:08 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:53:08 -Extract SNPs in specified regions: 5831 2024/02/15 18:53:08 Finished loading specified columns from the sumstats. 2024/02/15 18:53:08 Start data conversion and sanity check: 2024/02/15 18:53:08 -Removed 0 variants with nan in CHR or POS column ... 2024/02/15 18:53:08 -Removed 0 variants with CHR <=0... 2024/02/15 18:53:08 -Removed 0 variants with nan in P column ... 2024/02/15 18:53:08 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:53:08 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:53:08 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:53:08 -Converting data above cut line... 2024/02/15 18:53:08 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:53:08 Finished data conversion and sanity check. 2024/02/15 18:53:08 Start to create MQQ plot with 5831 variants... 2024/02/15 18:53:08 -Creating background plot... 2024/02/15 18:53:08 Finished creating MQQ plot successfully! 2024/02/15 18:53:08 Start to extract variants for annotation... 2024/02/15 18:53:08 -Found 1 significant variants with a sliding window size of 500 kb... 2024/02/15 18:53:08 Finished extracting variants for annotation... 2024/02/15 18:53:08 Start to process figure arts. 2024/02/15 18:53:08 -Processing X labels... 2024/02/15 18:53:08 -Processing Y labels... 2024/02/15 18:53:08 -Processing Y tick lables... 2024/02/15 18:53:08 -Processing Y labels... 2024/02/15 18:53:08 -Processing lines... 2024/02/15 18:53:08 Finished processing figure arts. 2024/02/15 18:53:08 Start to annotate variants... 2024/02/15 18:53:08 -Skip annotating 2024/02/15 18:53:08 Finished annotating variants. 2024/02/15 18:53:08 Start to create QQ plot with 5831 variants: 2024/02/15 18:53:08 -Plotting all variants... 2024/02/15 18:53:08 -Expected range of P: (0,1.0) 2024/02/15 18:53:08 -Lambda GC (MLOG10P mode) at 0.5 is 1.71914 2024/02/15 18:53:08 -Processing Y tick lables... 2024/02/15 18:53:08 Finished creating QQ plot successfully! 2024/02/15 18:53:08 Start to save figure... 2024/02/15 18:53:08 -Skip saving figure! 2024/02/15 18:53:08 Finished saving figure... 2024/02/15 18:53:08 Finished creating plot successfully!
Out[6]:
(<Figure size 3000x1000 with 2 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)
Create a regional plot with gene track¶
In [7]:
Copied!
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803))
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803))
2024/02/15 18:53:08 Start to create MQQ plot...v3.4.40: 2024/02/15 18:53:08 -Genomic coordinates version: 19... 2024/02/15 18:53:08 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:53:08 -Raw input contains 707780 variants... 2024/02/15 18:53:08 -MQQ plot layout mode is : r 2024/02/15 18:53:08 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:53:08 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:53:08 -Extract SNPs in specified regions: 5831 2024/02/15 18:53:08 Finished loading specified columns from the sumstats. 2024/02/15 18:53:08 Start data conversion and sanity check: 2024/02/15 18:53:08 -Removed 0 variants with nan in CHR or POS column ... 2024/02/15 18:53:08 -Removed 0 variants with CHR <=0... 2024/02/15 18:53:08 -Removed 0 variants with nan in P column ... 2024/02/15 18:53:08 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:53:08 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:53:08 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:53:08 -Converting data above cut line... 2024/02/15 18:53:08 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:53:08 Finished data conversion and sanity check. 2024/02/15 18:53:08 Start to create MQQ plot with 5831 variants... 2024/02/15 18:53:08 -Creating background plot... 2024/02/15 18:53:08 -Extracting lead variant... 2024/02/15 18:53:09 -Loading gtf files from:default
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']
2024/02/15 18:53:37 -plotting gene track.. 2024/02/15 18:53:37 -Finished plotting gene track.. 2024/02/15 18:53:37 Finished creating MQQ plot successfully! 2024/02/15 18:53:37 Start to extract variants for annotation... 2024/02/15 18:53:37 -Found 1 significant variants with a sliding window size of 500 kb... 2024/02/15 18:53:37 Finished extracting variants for annotation... 2024/02/15 18:53:37 Start to process figure arts. 2024/02/15 18:53:37 -Processing X labels... 2024/02/15 18:53:37 -Processing Y labels... 2024/02/15 18:53:37 -Processing Y tick lables... 2024/02/15 18:53:37 -Processing Y labels... 2024/02/15 18:53:37 -Processing lines... 2024/02/15 18:53:37 Finished processing figure arts. 2024/02/15 18:53:37 Start to annotate variants... 2024/02/15 18:53:37 -Skip annotating 2024/02/15 18:53:37 Finished annotating variants. 2024/02/15 18:53:37 Start to save figure... 2024/02/15 18:53:37 -Skip saving figure! 2024/02/15 18:53:37 Finished saving figure... 2024/02/15 18:53:37 Finished creating plot successfully!
Out[7]:
(<Figure size 3000x2000 with 3 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)
Create regional plot with gene track and LD information¶
In [8]:
Copied!
mysumstats.plot_mqq(mode="r",
region=(7,156538803,157538803),
vcf_path=gl.get_path("1kg_eas_hg19")
)
mysumstats.plot_mqq(mode="r",
region=(7,156538803,157538803),
vcf_path=gl.get_path("1kg_eas_hg19")
)
2024/02/15 18:53:37 Start to create MQQ plot...v3.4.40: 2024/02/15 18:53:37 -Genomic coordinates version: 19... 2024/02/15 18:53:37 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:53:37 -Raw input contains 707780 variants... 2024/02/15 18:53:37 -MQQ plot layout mode is : r 2024/02/15 18:53:37 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:53:37 -Checking prefix for chromosomes in vcf files... 2024/02/15 18:53:38 -No prefix for chromosomes in the VCF files. 2024/02/15 18:53:38 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:53:38 -Extract SNPs in specified regions: 5831 2024/02/15 18:53:38 Finished loading specified columns from the sumstats. 2024/02/15 18:53:38 Start data conversion and sanity check: 2024/02/15 18:53:38 -Removed 0 variants with nan in CHR or POS column ... 2024/02/15 18:53:38 -Removed 0 variants with CHR <=0... 2024/02/15 18:53:38 -Removed 0 variants with nan in P column ... 2024/02/15 18:53:38 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:53:38 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:53:38 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:53:38 -Converting data above cut line... 2024/02/15 18:53:38 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:53:38 Finished data conversion and sanity check. 2024/02/15 18:53:38 Start to create MQQ plot with 5831 variants... 2024/02/15 18:53:38 -tabix will be used: /home/yunye/tools/bin/tabix 2024/02/15 18:53:38 Start to load reference genotype... 2024/02/15 18:53:38 -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz 2024/02/15 18:53:40 -Retrieving index... 2024/02/15 18:53:40 -Ref variants in the region: 35278 2024/02/15 18:53:40 -Matching variants using POS, NEA, EA ... 2024/02/15 18:53:40 -Calculating Rsq... 2024/02/15 18:53:40 Finished loading reference genotype successfully! 2024/02/15 18:53:40 -Creating background plot... 2024/02/15 18:53:40 -Extracting lead variant... 2024/02/15 18:53:40 -Loading gtf files from:default
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']
2024/02/15 18:54:09 -plotting gene track.. 2024/02/15 18:54:09 -Finished plotting gene track.. 2024/02/15 18:54:09 Finished creating MQQ plot successfully! 2024/02/15 18:54:09 Start to extract variants for annotation... 2024/02/15 18:54:09 -Found 1 significant variants with a sliding window size of 500 kb... 2024/02/15 18:54:09 Finished extracting variants for annotation... 2024/02/15 18:54:09 Start to process figure arts. 2024/02/15 18:54:09 -Processing X labels... 2024/02/15 18:54:09 -Processing Y labels... 2024/02/15 18:54:09 -Processing Y tick lables... 2024/02/15 18:54:09 -Processing Y labels... 2024/02/15 18:54:09 -Processing color bar... 2024/02/15 18:54:09 -Processing lines... 2024/02/15 18:54:09 Finished processing figure arts. 2024/02/15 18:54:09 Start to annotate variants... 2024/02/15 18:54:09 -Skip annotating 2024/02/15 18:54:09 Finished annotating variants. 2024/02/15 18:54:09 Start to save figure... 2024/02/15 18:54:09 -Skip saving figure! 2024/02/15 18:54:09 Finished saving figure... 2024/02/15 18:54:09 Finished creating plot successfully!
Out[8]:
(<Figure size 3000x2000 with 4 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)
Create regional plot with two reference variants¶
In [9]:
Copied!
mysumstats.filter_flanking_by_chrpos([(7,156738803)],windowsizekb=100).get_lead(sig_level=1e-5)
mysumstats.filter_flanking_by_chrpos([(7,156738803)],windowsizekb=100).get_lead(sig_level=1e-5)
2024/02/15 18:54:10 Start to extract variants in the flanking regions using CHR and POS... 2024/02/15 18:54:10 - Central positions: [(7, 156738803)] 2024/02/15 18:54:10 - Flanking windowsize in kb: 100 2024/02/15 18:54:10 - Variants in flanking region 7:156638803-156838803 : 1119 2024/02/15 18:54:10 - Extracted 1119 variants in the regions. 2024/02/15 18:54:10 Finished extracting variants in the flanking regions. 2024/02/15 18:54:10 Start to extract lead variants...v3.4.40 2024/02/15 18:54:10 -Current Dataframe shape : 1119 x 12 ; Memory usage: 20.64 MB 2024/02/15 18:54:10 -Processing 1119 variants... 2024/02/15 18:54:10 -Significance threshold : 1e-05 2024/02/15 18:54:10 -Sliding window size: 500 kb 2024/02/15 18:54:10 -Using P for extracting lead variants... 2024/02/15 18:54:10 -Found 4 significant variants in total... 2024/02/15 18:54:10 -Identified 1 lead variants! 2024/02/15 18:54:10 Finished extracting lead variants.
Out[9]:
SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
694190 | 7:156793450_G_GA | 7 | 156793450 | GA | G | 0.1168 | 0.0838 | 0.0167 | 5.686000e-07 | 191764 | ++++ | 1960399 |
In [11]:
Copied!
mysumstats.plot_mqq(mode="r",
region=(7,156538803,157538803),
region_ref2 = "7:156793450_G_GA",
vcf_path=gl.get_path("1kg_eas_hg19"),
anno=True,
anno_set=["7:156793450_G_GA","7:157038803_A_G"]
)
mysumstats.plot_mqq(mode="r",
region=(7,156538803,157538803),
region_ref2 = "7:156793450_G_GA",
vcf_path=gl.get_path("1kg_eas_hg19"),
anno=True,
anno_set=["7:156793450_G_GA","7:157038803_A_G"]
)
2024/02/15 18:54:42 Start to create MQQ plot...v3.4.40: 2024/02/15 18:54:42 -Genomic coordinates version: 19... 2024/02/15 18:54:42 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:54:42 -Raw input contains 707780 variants... 2024/02/15 18:54:42 -MQQ plot layout mode is : r 2024/02/15 18:54:42 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:54:42 -Checking prefix for chromosomes in vcf files... 2024/02/15 18:54:42 -No prefix for chromosomes in the VCF files. 2024/02/15 18:54:42 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:54:42 -Extract SNPs in specified regions: 5831 2024/02/15 18:54:42 Finished loading specified columns from the sumstats. 2024/02/15 18:54:42 Start data conversion and sanity check: 2024/02/15 18:54:42 -Removed 0 variants with nan in CHR or POS column ... 2024/02/15 18:54:42 -Removed 0 variants with CHR <=0... 2024/02/15 18:54:42 -Removed 0 variants with nan in P column ... 2024/02/15 18:54:42 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:54:42 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:54:42 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:54:42 -Converting data above cut line... 2024/02/15 18:54:42 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:54:42 Finished data conversion and sanity check. 2024/02/15 18:54:42 Start to create MQQ plot with 5831 variants... 2024/02/15 18:54:42 -tabix will be used: /home/yunye/tools/bin/tabix 2024/02/15 18:54:42 Start to load reference genotype... 2024/02/15 18:54:42 -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz 2024/02/15 18:54:44 -Retrieving index... 2024/02/15 18:54:44 -Ref variants in the region: 35278 2024/02/15 18:54:44 -Matching variants using POS, NEA, EA ... 2024/02/15 18:54:44 -Calculating Rsq... 2024/02/15 18:54:44 -Reference variant ID: 7:156793450_G_GA - 694190 2024/02/15 18:54:44 -Calculating Rsq... 2024/02/15 18:54:44 Finished loading reference genotype successfully! 2024/02/15 18:54:44 -Creating background plot... 2024/02/15 18:54:45 -Extracting lead variant... 2024/02/15 18:54:45 -Reference variant ID: 7:156793450_G_GA - 1364 2024/02/15 18:54:45 -Loading gtf files from:default
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']
2024/02/15 18:55:13 -plotting gene track.. 2024/02/15 18:55:13 -Finished plotting gene track.. 2024/02/15 18:55:13 Finished creating MQQ plot successfully! 2024/02/15 18:55:13 Start to extract variants for annotation... 2024/02/15 18:55:13 -Found 2 specified variants to annotate... 2024/02/15 18:55:13 Finished extracting variants for annotation... 2024/02/15 18:55:13 Start to process figure arts. 2024/02/15 18:55:13 -Processing X labels... 2024/02/15 18:55:13 -Processing Y labels... 2024/02/15 18:55:13 -Processing Y tick lables... 2024/02/15 18:55:13 -Processing Y labels... 2024/02/15 18:55:13 -Processing color bar... 2024/02/15 18:55:13 -Processing lines... 2024/02/15 18:55:13 Finished processing figure arts. 2024/02/15 18:55:13 Start to annotate variants... 2024/02/15 18:55:13 -Annotating using column CHR:POS... 2024/02/15 18:55:13 -Adjusting text positions with repel_force=0.03... 2024/02/15 18:55:13 Finished annotating variants. 2024/02/15 18:55:13 Start to save figure... 2024/02/15 18:55:13 -Skip saving figure! 2024/02/15 18:55:13 Finished saving figure... 2024/02/15 18:55:14 Finished creating plot successfully!
Out[11]:
(<Figure size 3000x2000 with 5 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)
Create stacked regional plot¶
In [13]:
Copied!
gl.plot_stacked_mqq(objects=[mysumstats,mysumstats],
vcfs=[gl.get_path("1kg_eas_hg19"),gl.get_path("1kg_eur_hg19")],
region=(7,156538803,157538803),
mode="r",
build="19",
anno=True,
anno_style="tight",
anno_set=["7:156793450_G_GA","7:157038803_A_G"],
titles=["EAS","EUR"],
title_args={"size":20},
anno_args={"rotation":0})
gl.plot_stacked_mqq(objects=[mysumstats,mysumstats],
vcfs=[gl.get_path("1kg_eas_hg19"),gl.get_path("1kg_eur_hg19")],
region=(7,156538803,157538803),
mode="r",
build="19",
anno=True,
anno_style="tight",
anno_set=["7:156793450_G_GA","7:157038803_A_G"],
titles=["EAS","EUR"],
title_args={"size":20},
anno_args={"rotation":0})
2024/02/15 18:55:14 Start to create stacked mqq plot by iteratively calling plot_mqq: 2024/02/15 18:55:14 Start to create MQQ plot...v3.4.40: 2024/02/15 18:55:14 -Genomic coordinates version: 19... 2024/02/15 18:55:14 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:55:14 -Raw input contains 707780 variants... 2024/02/15 18:55:14 -MQQ plot layout mode is : r 2024/02/15 18:55:14 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:55:14 -Checking prefix for chromosomes in vcf files... 2024/02/15 18:55:14 -No prefix for chromosomes in the VCF files. 2024/02/15 18:55:14 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:55:14 -Extract SNPs in specified regions: 5831 2024/02/15 18:55:14 Finished loading specified columns from the sumstats. 2024/02/15 18:55:14 Start data conversion and sanity check: 2024/02/15 18:55:14 -Sanity check will be skipped. 2024/02/15 18:55:14 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:55:14 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:55:14 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:55:14 -Converting data above cut line... 2024/02/15 18:55:14 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:55:14 Finished data conversion and sanity check. 2024/02/15 18:55:14 Start to create MQQ plot with 5831 variants... 2024/02/15 18:55:14 -tabix will be used: /home/yunye/tools/bin/tabix 2024/02/15 18:55:14 Start to load reference genotype... 2024/02/15 18:55:14 -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz 2024/02/15 18:55:16 -Retrieving index... 2024/02/15 18:55:16 -Ref variants in the region: 35278 2024/02/15 18:55:16 -Matching variants using POS, NEA, EA ... 2024/02/15 18:55:16 -Calculating Rsq... 2024/02/15 18:55:16 Finished loading reference genotype successfully! 2024/02/15 18:55:16 -Creating background plot... 2024/02/15 18:55:17 -Extracting lead variant... 2024/02/15 18:55:17 -Loading gtf files from:default
INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']
2024/02/15 18:55:45 -plotting gene track.. 2024/02/15 18:55:46 -Finished plotting gene track.. 2024/02/15 18:55:46 Finished creating MQQ plot successfully! 2024/02/15 18:55:46 Start to extract variants for annotation... 2024/02/15 18:55:46 -Found 2 specified variants to annotate... 2024/02/15 18:55:46 Finished extracting variants for annotation... 2024/02/15 18:55:46 Start to process figure arts. 2024/02/15 18:55:46 -Processing X labels... 2024/02/15 18:55:46 -Processing Y labels... 2024/02/15 18:55:46 -Processing Y tick lables... 2024/02/15 18:55:46 -Processing Y labels... 2024/02/15 18:55:46 -Processing color bar... 2024/02/15 18:55:46 -Processing lines... 2024/02/15 18:55:46 Finished processing figure arts. 2024/02/15 18:55:46 Start to annotate variants... 2024/02/15 18:55:46 -Annotating using column CHR:POS... 2024/02/15 18:55:46 -Adjusting text positions with repel_force=0.03... 2024/02/15 18:55:46 -Auto-adjusting text positions... 2024/02/15 18:55:46 Finished annotating variants. 2024/02/15 18:55:46 Start to save figure... 2024/02/15 18:55:46 -Skip saving figure! 2024/02/15 18:55:46 Finished saving figure... 2024/02/15 18:55:46 Start to create MQQ plot...v3.4.40: 2024/02/15 18:55:46 -Genomic coordinates version: 19... 2024/02/15 18:55:46 -Genome-wide significance level to plot is set to 5e-08 ... 2024/02/15 18:55:46 -Raw input contains 707780 variants... 2024/02/15 18:55:46 -MQQ plot layout mode is : r 2024/02/15 18:55:46 -Region to plot : chr7:156538803-157538803. 2024/02/15 18:55:46 -Checking prefix for chromosomes in vcf files... 2024/02/15 18:55:46 -No prefix for chromosomes in the VCF files. 2024/02/15 18:55:46 -Extract SNPs in region : chr7:156538803-157538803... 2024/02/15 18:55:46 -Extract SNPs in specified regions: 5831 2024/02/15 18:55:46 Finished loading specified columns from the sumstats. 2024/02/15 18:55:46 Start data conversion and sanity check: 2024/02/15 18:55:46 -Sanity check will be skipped. 2024/02/15 18:55:46 -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed... 2024/02/15 18:55:46 -Sumstats P values are being converted to -log10(P)... 2024/02/15 18:55:46 -Sanity check: 0 na/inf/-inf variants will be removed... 2024/02/15 18:55:46 -Converting data above cut line... 2024/02/15 18:55:46 -Maximum -log10(P) value is 7.948076083953893 . 2024/02/15 18:55:46 Finished data conversion and sanity check. 2024/02/15 18:55:46 Start to create MQQ plot with 5831 variants... 2024/02/15 18:55:46 -tabix will be used: /home/yunye/tools/bin/tabix 2024/02/15 18:55:46 Start to load reference genotype... 2024/02/15 18:55:46 -reference vcf path : /home/yunye/.gwaslab/EUR.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz 2024/02/15 18:55:48 -Retrieving index... 2024/02/15 18:55:48 -Ref variants in the region: 35278 2024/02/15 18:55:48 -Matching variants using POS, NEA, EA ... 2024/02/15 18:55:49 -Calculating Rsq... 2024/02/15 18:55:49 Finished loading reference genotype successfully! 2024/02/15 18:55:49 -Creating background plot... 2024/02/15 18:55:49 -Extracting lead variant... 2024/02/15 18:55:49 Finished creating MQQ plot successfully! 2024/02/15 18:55:49 Start to extract variants for annotation... 2024/02/15 18:55:49 -Found 2 specified variants to annotate... 2024/02/15 18:55:49 Finished extracting variants for annotation... 2024/02/15 18:55:49 Start to process figure arts. 2024/02/15 18:55:49 -Processing X labels... 2024/02/15 18:55:49 -Processing Y labels... 2024/02/15 18:55:49 -Processing Y tick lables... 2024/02/15 18:55:49 -Processing Y labels... 2024/02/15 18:55:49 -Processing lines... 2024/02/15 18:55:49 Finished processing figure arts. 2024/02/15 18:55:49 Start to annotate variants... 2024/02/15 18:55:49 -Annotating using column CHR:POS... 2024/02/15 18:55:49 -Adjusting text positions with repel_force=0.03... 2024/02/15 18:55:49 -Auto-adjusting text positions... 2024/02/15 18:55:49 Finished annotating variants. 2024/02/15 18:55:49 Start to save figure... 2024/02/15 18:55:49 -Skip saving figure! 2024/02/15 18:55:49 Finished saving figure... 2024/02/15 18:55:49 Start to save figure... 2024/02/15 18:55:49 -Skip saving figure! 2024/02/15 18:55:49 Finished saving figure... 2024/02/15 18:55:49 Finished creating stacked mqq plot by iteratively calling plot_mqq.
Out[13]:
(<Figure size 3200x2400 with 6 Axes>, <gwaslab.g_Log.Log at 0x7f09e31fe3d0>)