Regional plot¶

load gwaslab¶

In [1]:

            
                Copied!
                
import sys
sys.path.insert(0,"/home/yunye/work/gwaslab/src")
import gwaslab as gl
import sys
sys.path.insert(0,"/home/yunye/work/gwaslab/src")
import gwaslab as gl

download sample data¶

In [2]:

            
                Copied!
                
#!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/
#!wget -O t2d_bbj.txt.gz http://jenger.riken.jp/14/

load sumstats into gwaslab.Sumstats¶

In [3]:

            
                Copied!
                
                    
                    
                
                

        
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
             snpid="SNP",
             chrom="CHR",
             pos="POS",
             ea="ALT",
             nea="REF",
             neaf="Frq",
             beta="BETA",
             se="SE",
             p="P",
             direction="Dir",
             build="19",
             n="N")
mysumstats.filter_value('CHR =="7"',inplace=True)
mysumstats.basic_check(verbose = False)
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
             snpid="SNP",
             chrom="CHR",
             pos="POS",
             ea="ALT",
             nea="REF",
             neaf="Frq",
             beta="BETA",
             se="SE",
             p="P",
             direction="Dir",
             build="19",
             n="N")
mysumstats.filter_value('CHR =="7"',inplace=True)
mysumstats.basic_check(verbose = False)

2024/02/15 18:52:34 GWASLab v3.4.40 https://cloufield.github.io/gwaslab/
2024/02/15 18:52:34 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
2024/02/15 18:52:34 Start to initialize gl.Sumstats from file :t2d_bbj.txt.gz
2024/02/15 18:52:50  -Reading columns          : P,BETA,Frq,SE,N,POS,REF,Dir,SNP,ALT,CHR
2024/02/15 18:52:50  -Renaming columns to      : P,BETA,EAF,SE,N,POS,NEA,DIRECTION,SNPID,EA,CHR
2024/02/15 18:52:50  -Current Dataframe shape : 12557761  x  11
2024/02/15 18:52:50  -Initiating a status column: STATUS ...
2024/02/15 18:52:50  -Genomic coordinates are based on GRCh37/hg19...
2024/02/15 18:52:52  -NEAF is specified...
2024/02/15 18:52:52  -Checking if 0<= NEAF <=1 ...
2024/02/15 18:52:53  -Converted NEAF to EAF.
2024/02/15 18:52:53  -Removed 0 variants with bad NEAF.
2024/02/15 18:52:53 Start to reorder the columns...v3.4.40
2024/02/15 18:52:53  -Current Dataframe shape : 12557761 x 12 ; Memory usage: 1100.88 MB
2024/02/15 18:52:53  -Reordering columns to    : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
2024/02/15 18:52:53 Finished reordering the columns.
2024/02/15 18:52:53  -Column  : SNPID  CHR    POS   EA       NEA      EAF     BETA    SE      P       N     DIRECTION STATUS  
2024/02/15 18:52:53  -DType   : object string int64 category category float64 float64 float64 float64 int64 object    category
2024/02/15 18:52:53  -Verified: T      F      T     T        T        T       T       T       T       T     T         T       
2024/02/15 18:52:53  #WARNING! Columns with possibly incompatible dtypes: CHR
2024/02/15 18:52:54  -Current Dataframe memory usage: 1100.88 MB
2024/02/15 18:52:54 Finished loading data successfully!
2024/02/15 18:52:54 Start filtering values by condition: CHR =="7"
2024/02/15 18:52:54  -Removing 11849981 variants not meeting the conditions: CHR =="7"
2024/02/15 18:52:54 Finished filtering values.

Create Manhattan plot with sumstats on a single chromosome¶

In [4]:

            
                Copied!
                
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl")
mysumstats.plot_mqq(mode="m",anno="GENENAME",anno_source="ensembl")

2024/02/15 18:53:01 Start to create MQQ plot...v3.4.40:
2024/02/15 18:53:01  -Genomic coordinates version: 19...
2024/02/15 18:53:01  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:53:01  -Raw input contains 707780 variants...
2024/02/15 18:53:01  -MQQ plot layout mode is : m
2024/02/15 18:53:01 Finished loading specified columns from the sumstats.
2024/02/15 18:53:01 Start data conversion and sanity check:
2024/02/15 18:53:01  -Removed 0 variants with nan in CHR or POS column ...
2024/02/15 18:53:01  -Removed 0 variants with CHR <=0...
2024/02/15 18:53:01  -Removed 0 variants with nan in P column ...
2024/02/15 18:53:01  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:53:02  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:53:02  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:53:02  -Converting data above cut line...
2024/02/15 18:53:02  -Maximum -log10(P) value is 73.38711023071251 .
2024/02/15 18:53:02 Finished data conversion and sanity check.
2024/02/15 18:53:02 Start to create MQQ plot with 707780 variants...
2024/02/15 18:53:02  -Creating background plot...
2024/02/15 18:53:03 Finished creating MQQ plot successfully!
2024/02/15 18:53:03 Start to extract variants for annotation...
2024/02/15 18:53:03  -Found 7 significant variants with a sliding window size of 500 kb...
2024/02/15 18:53:03 Start to annotate variants with nearest gene name(s)...
2024/02/15 18:53:03  -Assigning Gene name using ensembl_hg19_gtf for protein coding genes
2024/02/15 18:53:03 Finished annotating variants with nearest gene name(s) successfully!
2024/02/15 18:53:03 Finished extracting variants for annotation...
2024/02/15 18:53:03 Start to process figure arts.
2024/02/15 18:53:03  -Processing X ticks...
2024/02/15 18:53:03  -Processing X labels...
2024/02/15 18:53:03  -Processing Y labels...
2024/02/15 18:53:03  -Processing Y tick lables...
2024/02/15 18:53:03  -Processing Y labels...
2024/02/15 18:53:03  -Processing lines...
2024/02/15 18:53:03 Finished processing figure arts.
2024/02/15 18:53:03 Start to annotate variants...
2024/02/15 18:53:03  -Annotating using column GENENAME...
2024/02/15 18:53:03  -Adjusting text positions with repel_force=0.03...
2024/02/15 18:53:03 Finished annotating variants.
2024/02/15 18:53:03 Start to save figure...
2024/02/15 18:53:03  -Skip saving figure!
2024/02/15 18:53:03 Finished saving figure...
2024/02/15 18:53:03 Finished creating plot successfully!

Out[4]:

(<Figure size 3000x1000 with 1 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)

Check lead variants¶

In [5]:

            
                Copied!
                
mysumstats.get_lead()
mysumstats.get_lead()

2024/02/15 18:53:07 Start to extract lead variants...v3.4.40
2024/02/15 18:53:07  -Current Dataframe shape : 707780 x 12 ; Memory usage: 73.87 MB
2024/02/15 18:53:07  -Processing 707780 variants...
2024/02/15 18:53:07  -Significance threshold : 5e-08
2024/02/15 18:53:07  -Sliding window size: 500  kb
2024/02/15 18:53:07  -Using P for extracting lead variants...
2024/02/15 18:53:07  -Found 1077 significant variants in total...
2024/02/15 18:53:08  -Identified 7 lead variants!
2024/02/15 18:53:08 Finished extracting lead variants.

Out[5]:

	SNPID	CHR	POS	EA	NEA	EAF	BETA	SE	P	N	DIRECTION	STATUS
77576	7:13888699_G_C	7	13888699	G	C	0.5680	0.0562	0.0094	2.507000e-09	191764	++++	1960099
83154	7:14898282_C_T	7	14898282	C	T	0.6012	0.0617	0.0088	2.336000e-12	191764	++++	1960099
229433	7:44174857_T_G	7	44174857	G	T	0.5985	-0.0640	0.0093	5.325000e-12	191764	----	1960099
335366	7:69406661_A_T	7	69406661	T	A	0.1981	-0.0900	0.0111	4.871000e-16	191764	----	1960099
568451	7:127253550_C_T	7	127253550	C	T	0.9081	0.2761	0.0152	4.101000e-74	191764	++++	1960099
579917	7:130025713_G_A	7	130025713	G	A	0.9530	-0.1365	0.0230	3.068000e-09	191764	----	1960099
695434	7:157038803_A_G	7	157038803	G	A	0.4626	-0.0502	0.0088	1.127000e-08	191764	----	1960099

Create a regional plot with no additional information¶

In [6]:

            
                Copied!
                
mysumstats.plot_mqq(region=(7,156538803,157538803))
mysumstats.plot_mqq(region=(7,156538803,157538803))

2024/02/15 18:53:08 Start to create MQQ plot...v3.4.40:
2024/02/15 18:53:08  -Genomic coordinates version: 19...
2024/02/15 18:53:08  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:53:08  -Raw input contains 707780 variants...
2024/02/15 18:53:08  -MQQ plot layout mode is : mqq
2024/02/15 18:53:08  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:53:08  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:53:08  -Extract SNPs in specified regions: 5831
2024/02/15 18:53:08 Finished loading specified columns from the sumstats.
2024/02/15 18:53:08 Start data conversion and sanity check:
2024/02/15 18:53:08  -Removed 0 variants with nan in CHR or POS column ...
2024/02/15 18:53:08  -Removed 0 variants with CHR <=0...
2024/02/15 18:53:08  -Removed 0 variants with nan in P column ...
2024/02/15 18:53:08  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:53:08  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:53:08  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:53:08  -Converting data above cut line...
2024/02/15 18:53:08  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:53:08 Finished data conversion and sanity check.
2024/02/15 18:53:08 Start to create MQQ plot with 5831 variants...
2024/02/15 18:53:08  -Creating background plot...
2024/02/15 18:53:08 Finished creating MQQ plot successfully!
2024/02/15 18:53:08 Start to extract variants for annotation...
2024/02/15 18:53:08  -Found 1 significant variants with a sliding window size of 500 kb...
2024/02/15 18:53:08 Finished extracting variants for annotation...
2024/02/15 18:53:08 Start to process figure arts.
2024/02/15 18:53:08  -Processing X labels...
2024/02/15 18:53:08  -Processing Y labels...
2024/02/15 18:53:08  -Processing Y tick lables...
2024/02/15 18:53:08  -Processing Y labels...
2024/02/15 18:53:08  -Processing lines...
2024/02/15 18:53:08 Finished processing figure arts.
2024/02/15 18:53:08 Start to annotate variants...
2024/02/15 18:53:08  -Skip annotating
2024/02/15 18:53:08 Finished annotating variants.
2024/02/15 18:53:08 Start to create QQ plot with 5831 variants:
2024/02/15 18:53:08  -Plotting all variants...
2024/02/15 18:53:08  -Expected range of P: (0,1.0)
2024/02/15 18:53:08  -Lambda GC (MLOG10P mode) at 0.5 is   1.71914
2024/02/15 18:53:08  -Processing Y tick lables...
2024/02/15 18:53:08 Finished creating QQ plot successfully!
2024/02/15 18:53:08 Start to save figure...
2024/02/15 18:53:08  -Skip saving figure!
2024/02/15 18:53:08 Finished saving figure...
2024/02/15 18:53:08 Finished creating plot successfully!

Out[6]:

(<Figure size 3000x1000 with 2 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)

Create a regional plot with gene track¶

In [7]:

            
                Copied!
                
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803))
mysumstats.plot_mqq(mode="r",region=(7,156538803,157538803))

2024/02/15 18:53:08 Start to create MQQ plot...v3.4.40:
2024/02/15 18:53:08  -Genomic coordinates version: 19...
2024/02/15 18:53:08  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:53:08  -Raw input contains 707780 variants...
2024/02/15 18:53:08  -MQQ plot layout mode is : r
2024/02/15 18:53:08  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:53:08  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:53:08  -Extract SNPs in specified regions: 5831
2024/02/15 18:53:08 Finished loading specified columns from the sumstats.
2024/02/15 18:53:08 Start data conversion and sanity check:
2024/02/15 18:53:08  -Removed 0 variants with nan in CHR or POS column ...
2024/02/15 18:53:08  -Removed 0 variants with CHR <=0...
2024/02/15 18:53:08  -Removed 0 variants with nan in P column ...
2024/02/15 18:53:08  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:53:08  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:53:08  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:53:08  -Converting data above cut line...
2024/02/15 18:53:08  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:53:08 Finished data conversion and sanity check.
2024/02/15 18:53:08 Start to create MQQ plot with 5831 variants...
2024/02/15 18:53:08  -Creating background plot...
2024/02/15 18:53:08  -Extracting lead variant...
2024/02/15 18:53:09  -Loading gtf files from:default

INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']

2024/02/15 18:53:37  -plotting gene track..
2024/02/15 18:53:37  -Finished plotting gene track..
2024/02/15 18:53:37 Finished creating MQQ plot successfully!
2024/02/15 18:53:37 Start to extract variants for annotation...
2024/02/15 18:53:37  -Found 1 significant variants with a sliding window size of 500 kb...
2024/02/15 18:53:37 Finished extracting variants for annotation...
2024/02/15 18:53:37 Start to process figure arts.
2024/02/15 18:53:37  -Processing X labels...
2024/02/15 18:53:37  -Processing Y labels...
2024/02/15 18:53:37  -Processing Y tick lables...
2024/02/15 18:53:37  -Processing Y labels...
2024/02/15 18:53:37  -Processing lines...
2024/02/15 18:53:37 Finished processing figure arts.
2024/02/15 18:53:37 Start to annotate variants...
2024/02/15 18:53:37  -Skip annotating
2024/02/15 18:53:37 Finished annotating variants.
2024/02/15 18:53:37 Start to save figure...
2024/02/15 18:53:37  -Skip saving figure!
2024/02/15 18:53:37 Finished saving figure...
2024/02/15 18:53:37 Finished creating plot successfully!

Out[7]:

(<Figure size 3000x2000 with 3 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)

Create regional plot with gene track and LD information¶

In [8]:

            
                Copied!
                
mysumstats.plot_mqq(mode="r",
                    region=(7,156538803,157538803),                    
                    vcf_path=gl.get_path("1kg_eas_hg19")
                   )
mysumstats.plot_mqq(mode="r",
                    region=(7,156538803,157538803),                    
                    vcf_path=gl.get_path("1kg_eas_hg19")
                   )

2024/02/15 18:53:37 Start to create MQQ plot...v3.4.40:
2024/02/15 18:53:37  -Genomic coordinates version: 19...
2024/02/15 18:53:37  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:53:37  -Raw input contains 707780 variants...
2024/02/15 18:53:37  -MQQ plot layout mode is : r
2024/02/15 18:53:37  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:53:37  -Checking prefix for chromosomes in vcf files...
2024/02/15 18:53:38  -No prefix for chromosomes in the VCF files.
2024/02/15 18:53:38  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:53:38  -Extract SNPs in specified regions: 5831
2024/02/15 18:53:38 Finished loading specified columns from the sumstats.
2024/02/15 18:53:38 Start data conversion and sanity check:
2024/02/15 18:53:38  -Removed 0 variants with nan in CHR or POS column ...
2024/02/15 18:53:38  -Removed 0 variants with CHR <=0...
2024/02/15 18:53:38  -Removed 0 variants with nan in P column ...
2024/02/15 18:53:38  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:53:38  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:53:38  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:53:38  -Converting data above cut line...
2024/02/15 18:53:38  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:53:38 Finished data conversion and sanity check.
2024/02/15 18:53:38 Start to create MQQ plot with 5831 variants...
2024/02/15 18:53:38  -tabix will be used: /home/yunye/tools/bin/tabix
2024/02/15 18:53:38 Start to load reference genotype...
2024/02/15 18:53:38  -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
2024/02/15 18:53:40  -Retrieving index...
2024/02/15 18:53:40  -Ref variants in the region: 35278
2024/02/15 18:53:40  -Matching variants using POS, NEA, EA ...
2024/02/15 18:53:40  -Calculating Rsq...
2024/02/15 18:53:40 Finished loading reference genotype successfully!
2024/02/15 18:53:40  -Creating background plot...
2024/02/15 18:53:40  -Extracting lead variant...
2024/02/15 18:53:40  -Loading gtf files from:default

INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']

2024/02/15 18:54:09  -plotting gene track..
2024/02/15 18:54:09  -Finished plotting gene track..
2024/02/15 18:54:09 Finished creating MQQ plot successfully!
2024/02/15 18:54:09 Start to extract variants for annotation...
2024/02/15 18:54:09  -Found 1 significant variants with a sliding window size of 500 kb...
2024/02/15 18:54:09 Finished extracting variants for annotation...
2024/02/15 18:54:09 Start to process figure arts.
2024/02/15 18:54:09  -Processing X labels...
2024/02/15 18:54:09  -Processing Y labels...
2024/02/15 18:54:09  -Processing Y tick lables...
2024/02/15 18:54:09  -Processing Y labels...
2024/02/15 18:54:09  -Processing color bar...
2024/02/15 18:54:09  -Processing lines...
2024/02/15 18:54:09 Finished processing figure arts.
2024/02/15 18:54:09 Start to annotate variants...
2024/02/15 18:54:09  -Skip annotating
2024/02/15 18:54:09 Finished annotating variants.
2024/02/15 18:54:09 Start to save figure...
2024/02/15 18:54:09  -Skip saving figure!
2024/02/15 18:54:09 Finished saving figure...
2024/02/15 18:54:09 Finished creating plot successfully!

Out[8]:

(<Figure size 3000x2000 with 4 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)

Create regional plot with two reference variants¶

In [9]:

            
                Copied!
                
mysumstats.filter_flanking_by_chrpos([(7,156738803)],windowsizekb=100).get_lead(sig_level=1e-5)
mysumstats.filter_flanking_by_chrpos([(7,156738803)],windowsizekb=100).get_lead(sig_level=1e-5)

2024/02/15 18:54:10 Start to extract variants in the flanking regions using CHR and POS...
2024/02/15 18:54:10  - Central positions: [(7, 156738803)]
2024/02/15 18:54:10  - Flanking windowsize in kb: 100
2024/02/15 18:54:10  - Variants in flanking region 7:156638803-156838803 : 1119
2024/02/15 18:54:10  - Extracted 1119 variants in the regions.
2024/02/15 18:54:10 Finished extracting variants in the flanking regions.
2024/02/15 18:54:10 Start to extract lead variants...v3.4.40
2024/02/15 18:54:10  -Current Dataframe shape : 1119 x 12 ; Memory usage: 20.64 MB
2024/02/15 18:54:10  -Processing 1119 variants...
2024/02/15 18:54:10  -Significance threshold : 1e-05
2024/02/15 18:54:10  -Sliding window size: 500  kb
2024/02/15 18:54:10  -Using P for extracting lead variants...
2024/02/15 18:54:10  -Found 4 significant variants in total...
2024/02/15 18:54:10  -Identified 1 lead variants!
2024/02/15 18:54:10 Finished extracting lead variants.

Out[9]:

	SNPID	CHR	POS	EA	NEA	EAF	BETA	SE	P	N	DIRECTION	STATUS
694190	7:156793450_G_GA	7	156793450	GA	G	0.1168	0.0838	0.0167	5.686000e-07	191764	++++	1960399

In [11]:

            
                Copied!
                
                    
                    
                
                

        
mysumstats.plot_mqq(mode="r",
                    region=(7,156538803,157538803),      
                    region_ref2 = "7:156793450_G_GA", 
                    vcf_path=gl.get_path("1kg_eas_hg19"),
                    anno=True,
                    anno_set=["7:156793450_G_GA","7:157038803_A_G"]
                   )
mysumstats.plot_mqq(mode="r",
                    region=(7,156538803,157538803),      
                    region_ref2 = "7:156793450_G_GA", 
                    vcf_path=gl.get_path("1kg_eas_hg19"),
                    anno=True,
                    anno_set=["7:156793450_G_GA","7:157038803_A_G"]
                   )

2024/02/15 18:54:42 Start to create MQQ plot...v3.4.40:
2024/02/15 18:54:42  -Genomic coordinates version: 19...
2024/02/15 18:54:42  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:54:42  -Raw input contains 707780 variants...
2024/02/15 18:54:42  -MQQ plot layout mode is : r
2024/02/15 18:54:42  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:54:42  -Checking prefix for chromosomes in vcf files...
2024/02/15 18:54:42  -No prefix for chromosomes in the VCF files.
2024/02/15 18:54:42  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:54:42  -Extract SNPs in specified regions: 5831
2024/02/15 18:54:42 Finished loading specified columns from the sumstats.
2024/02/15 18:54:42 Start data conversion and sanity check:
2024/02/15 18:54:42  -Removed 0 variants with nan in CHR or POS column ...
2024/02/15 18:54:42  -Removed 0 variants with CHR <=0...
2024/02/15 18:54:42  -Removed 0 variants with nan in P column ...
2024/02/15 18:54:42  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:54:42  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:54:42  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:54:42  -Converting data above cut line...
2024/02/15 18:54:42  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:54:42 Finished data conversion and sanity check.
2024/02/15 18:54:42 Start to create MQQ plot with 5831 variants...
2024/02/15 18:54:42  -tabix will be used: /home/yunye/tools/bin/tabix
2024/02/15 18:54:42 Start to load reference genotype...
2024/02/15 18:54:42  -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
2024/02/15 18:54:44  -Retrieving index...
2024/02/15 18:54:44  -Ref variants in the region: 35278
2024/02/15 18:54:44  -Matching variants using POS, NEA, EA ...
2024/02/15 18:54:44  -Calculating Rsq...
2024/02/15 18:54:44  -Reference variant ID: 7:156793450_G_GA - 694190
2024/02/15 18:54:44  -Calculating Rsq...
2024/02/15 18:54:44 Finished loading reference genotype successfully!
2024/02/15 18:54:44  -Creating background plot...
2024/02/15 18:54:45  -Extracting lead variant...
2024/02/15 18:54:45  -Reference variant ID: 7:156793450_G_GA - 1364
2024/02/15 18:54:45  -Loading gtf files from:default

INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']

2024/02/15 18:55:13  -plotting gene track..
2024/02/15 18:55:13  -Finished plotting gene track..
2024/02/15 18:55:13 Finished creating MQQ plot successfully!
2024/02/15 18:55:13 Start to extract variants for annotation...
2024/02/15 18:55:13  -Found 2 specified variants to annotate...
2024/02/15 18:55:13 Finished extracting variants for annotation...
2024/02/15 18:55:13 Start to process figure arts.
2024/02/15 18:55:13  -Processing X labels...
2024/02/15 18:55:13  -Processing Y labels...
2024/02/15 18:55:13  -Processing Y tick lables...
2024/02/15 18:55:13  -Processing Y labels...
2024/02/15 18:55:13  -Processing color bar...
2024/02/15 18:55:13  -Processing lines...
2024/02/15 18:55:13 Finished processing figure arts.
2024/02/15 18:55:13 Start to annotate variants...
2024/02/15 18:55:13  -Annotating using column CHR:POS...
2024/02/15 18:55:13  -Adjusting text positions with repel_force=0.03...
2024/02/15 18:55:13 Finished annotating variants.
2024/02/15 18:55:13 Start to save figure...
2024/02/15 18:55:13  -Skip saving figure!
2024/02/15 18:55:13 Finished saving figure...
2024/02/15 18:55:14 Finished creating plot successfully!

Out[11]:

(<Figure size 3000x2000 with 5 Axes>, <gwaslab.g_Log.Log at 0x7f09e6a8f6a0>)

Create stacked regional plot¶

In [13]:

            
                Copied!
                
                    
                    
                
                

        
gl.plot_stacked_mqq(objects=[mysumstats,mysumstats],
                    vcfs=[gl.get_path("1kg_eas_hg19"),gl.get_path("1kg_eur_hg19")],
                    region=(7,156538803,157538803), 
                    mode="r",
                    build="19",
                    anno=True,
                    anno_style="tight",
                    anno_set=["7:156793450_G_GA","7:157038803_A_G"],
                    titles=["EAS","EUR"],
                    title_args={"size":20},
                    anno_args={"rotation":0})
gl.plot_stacked_mqq(objects=[mysumstats,mysumstats],
                    vcfs=[gl.get_path("1kg_eas_hg19"),gl.get_path("1kg_eur_hg19")],
                    region=(7,156538803,157538803), 
                    mode="r",
                    build="19",
                    anno=True,
                    anno_style="tight",
                    anno_set=["7:156793450_G_GA","7:157038803_A_G"],
                    titles=["EAS","EUR"],
                    title_args={"size":20},
                    anno_args={"rotation":0})

2024/02/15 18:55:14 Start to create stacked mqq plot by iteratively calling plot_mqq:
2024/02/15 18:55:14 Start to create MQQ plot...v3.4.40:
2024/02/15 18:55:14  -Genomic coordinates version: 19...
2024/02/15 18:55:14  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:55:14  -Raw input contains 707780 variants...
2024/02/15 18:55:14  -MQQ plot layout mode is : r
2024/02/15 18:55:14  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:55:14  -Checking prefix for chromosomes in vcf files...
2024/02/15 18:55:14  -No prefix for chromosomes in the VCF files.
2024/02/15 18:55:14  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:55:14  -Extract SNPs in specified regions: 5831
2024/02/15 18:55:14 Finished loading specified columns from the sumstats.
2024/02/15 18:55:14 Start data conversion and sanity check:
2024/02/15 18:55:14  -Sanity check will be skipped.
2024/02/15 18:55:14  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:55:14  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:55:14  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:55:14  -Converting data above cut line...
2024/02/15 18:55:14  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:55:14 Finished data conversion and sanity check.
2024/02/15 18:55:14 Start to create MQQ plot with 5831 variants...
2024/02/15 18:55:14  -tabix will be used: /home/yunye/tools/bin/tabix
2024/02/15 18:55:14 Start to load reference genotype...
2024/02/15 18:55:14  -reference vcf path : /home/yunye/.gwaslab/EAS.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
2024/02/15 18:55:16  -Retrieving index...
2024/02/15 18:55:16  -Ref variants in the region: 35278
2024/02/15 18:55:16  -Matching variants using POS, NEA, EA ...
2024/02/15 18:55:16  -Calculating Rsq...
2024/02/15 18:55:16 Finished loading reference genotype successfully!
2024/02/15 18:55:16  -Creating background plot...
2024/02/15 18:55:17  -Extracting lead variant...
2024/02/15 18:55:17  -Loading gtf files from:default

INFO:root:Extracted GTF attributes: ['gene_id', 'gene_name', 'gene_biotype']

2024/02/15 18:55:45  -plotting gene track..
2024/02/15 18:55:46  -Finished plotting gene track..
2024/02/15 18:55:46 Finished creating MQQ plot successfully!
2024/02/15 18:55:46 Start to extract variants for annotation...
2024/02/15 18:55:46  -Found 2 specified variants to annotate...
2024/02/15 18:55:46 Finished extracting variants for annotation...
2024/02/15 18:55:46 Start to process figure arts.
2024/02/15 18:55:46  -Processing X labels...
2024/02/15 18:55:46  -Processing Y labels...
2024/02/15 18:55:46  -Processing Y tick lables...
2024/02/15 18:55:46  -Processing Y labels...
2024/02/15 18:55:46  -Processing color bar...
2024/02/15 18:55:46  -Processing lines...
2024/02/15 18:55:46 Finished processing figure arts.
2024/02/15 18:55:46 Start to annotate variants...
2024/02/15 18:55:46  -Annotating using column CHR:POS...
2024/02/15 18:55:46  -Adjusting text positions with repel_force=0.03...
2024/02/15 18:55:46  -Auto-adjusting text positions...
2024/02/15 18:55:46 Finished annotating variants.
2024/02/15 18:55:46 Start to save figure...
2024/02/15 18:55:46  -Skip saving figure!
2024/02/15 18:55:46 Finished saving figure...
2024/02/15 18:55:46 Start to create MQQ plot...v3.4.40:
2024/02/15 18:55:46  -Genomic coordinates version: 19...
2024/02/15 18:55:46  -Genome-wide significance level to plot is set to 5e-08 ...
2024/02/15 18:55:46  -Raw input contains 707780 variants...
2024/02/15 18:55:46  -MQQ plot layout mode is : r
2024/02/15 18:55:46  -Region to plot : chr7:156538803-157538803.
2024/02/15 18:55:46  -Checking prefix for chromosomes in vcf files...
2024/02/15 18:55:46  -No prefix for chromosomes in the VCF files.
2024/02/15 18:55:46  -Extract SNPs in region : chr7:156538803-157538803...
2024/02/15 18:55:46  -Extract SNPs in specified regions: 5831
2024/02/15 18:55:46 Finished loading specified columns from the sumstats.
2024/02/15 18:55:46 Start data conversion and sanity check:
2024/02/15 18:55:46  -Sanity check will be skipped.
2024/02/15 18:55:46  -Sanity check after conversion: 0 variants with P value outside of (0,1] will be removed...
2024/02/15 18:55:46  -Sumstats P values are being converted to -log10(P)...
2024/02/15 18:55:46  -Sanity check: 0 na/inf/-inf variants will be removed...
2024/02/15 18:55:46  -Converting data above cut line...
2024/02/15 18:55:46  -Maximum -log10(P) value is 7.948076083953893 .
2024/02/15 18:55:46 Finished data conversion and sanity check.
2024/02/15 18:55:46 Start to create MQQ plot with 5831 variants...
2024/02/15 18:55:46  -tabix will be used: /home/yunye/tools/bin/tabix
2024/02/15 18:55:46 Start to load reference genotype...
2024/02/15 18:55:46  -reference vcf path : /home/yunye/.gwaslab/EUR.ALL.split_norm_af.1kgp3v5.hg19.vcf.gz
2024/02/15 18:55:48  -Retrieving index...
2024/02/15 18:55:48  -Ref variants in the region: 35278
2024/02/15 18:55:48  -Matching variants using POS, NEA, EA ...
2024/02/15 18:55:49  -Calculating Rsq...
2024/02/15 18:55:49 Finished loading reference genotype successfully!
2024/02/15 18:55:49  -Creating background plot...
2024/02/15 18:55:49  -Extracting lead variant...
2024/02/15 18:55:49 Finished creating MQQ plot successfully!
2024/02/15 18:55:49 Start to extract variants for annotation...
2024/02/15 18:55:49  -Found 2 specified variants to annotate...
2024/02/15 18:55:49 Finished extracting variants for annotation...
2024/02/15 18:55:49 Start to process figure arts.
2024/02/15 18:55:49  -Processing X labels...
2024/02/15 18:55:49  -Processing Y labels...
2024/02/15 18:55:49  -Processing Y tick lables...
2024/02/15 18:55:49  -Processing Y labels...
2024/02/15 18:55:49  -Processing lines...
2024/02/15 18:55:49 Finished processing figure arts.
2024/02/15 18:55:49 Start to annotate variants...
2024/02/15 18:55:49  -Annotating using column CHR:POS...
2024/02/15 18:55:49  -Adjusting text positions with repel_force=0.03...
2024/02/15 18:55:49  -Auto-adjusting text positions...
2024/02/15 18:55:49 Finished annotating variants.
2024/02/15 18:55:49 Start to save figure...
2024/02/15 18:55:49  -Skip saving figure!
2024/02/15 18:55:49 Finished saving figure...
2024/02/15 18:55:49 Start to save figure...
2024/02/15 18:55:49  -Skip saving figure!
2024/02/15 18:55:49 Finished saving figure...
2024/02/15 18:55:49 Finished creating stacked mqq plot by iteratively calling plot_mqq.

Out[13]:

(<Figure size 3200x2400 with 6 Axes>, <gwaslab.g_Log.Log at 0x7f09e31fe3d0>)