Manhattan plot and QQ plot
GWASLab provides a customizable plotting function for Manhattan and Q-Q plots.
.plot_mqq()
A simple example
See other examples here.
Options
- Using P or MLOG10P
- Adjusting x axis
- Adjusting y axis
- Changing layout
- Annotation
- Adding lines
- Highlight loci and Pinpoint variants
- Colors and fonts
- MAF-stratified QQ plot
- Changing titles
- Saving figures
By setting the options, you can create highly customized Manhattan plots and Q-Q plots.
A customized Manhattan and QQ plot
Plot layout
Option | DataType | Description | Default |
---|---|---|---|
mode | "mqq", "qqm", "qq", "m" | Layout of the Manhattan plot and Q-Q plot. "mqq": Manhattan plot on the left, Q-Q plot on the right; "qqm": Q-Q plot on the left, Manhattan plot on the right; "m": Manhattan plot only; "qq": Q-Q plot only | "mqq" |
mqqratio | float | Width ratio of the Manhattan plot to the Q-Q plot | 3 |
Layout
Use MLOG10P for extreme P values
Option | DataType | Description | Default |
---|---|---|---|
scaled | boolean | By default, GWASLab uses P values for the mqq plot. Set scaled=True to plot MLOG10P instead. | False |
Variant with extreme P values
To plot variants with extreme P values (P < 1e-300), set scaled=True to create the plot from MLOG10P instead of raw P values. To calculate MLOG10P for extreme P values from BETA/SE or Z scores, use mysumstats.fill_data(to_fill=["MLOG10P"], extreme=True). For details, please refer to the "Extreme P values" section in https://cloufield.github.io/gwaslab/Conversion/.
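To illustrate why MLOG10P is needed, the sketch below computes -log10(P) directly from a Z score without ever forming P, which would underflow to 0 in double precision for very large |Z|. This is not GWASLab's implementation: the function name `mlog10p_from_z`, the switch point of |Z| = 30, and the asymptotic tail expansion are assumptions made for illustration only.

```python
import math


def mlog10p_from_z(z):
    """Two-sided -log10(P) for a standard normal Z score (illustrative sketch)."""
    a = abs(z)
    if a < 30:
        # Two-sided P = erfc(|z| / sqrt(2)); still representable as a float here.
        p = math.erfc(a / math.sqrt(2))
        return -math.log10(p)
    # For very large |z|, work with the log of the upper-tail probability using
    # the asymptotic expansion sf(a) ~ pdf(a) / a * (1 - 1/a^2 + 3/a^4).
    log_sf = (
        -0.5 * a * a
        - math.log(a)
        - 0.5 * math.log(2 * math.pi)
        + math.log1p(-1 / a**2 + 3 / a**4)
    )
    # Two-sided: add log(2), then convert natural log to -log10.
    return -(log_sf + math.log(2)) / math.log(10)


moderate = mlog10p_from_z(5)    # P is still representable
extreme = mlog10p_from_z(40)    # P itself would underflow to 0.0
```

In this sketch, `extreme` stays finite even though the corresponding P value cannot be stored as a float, which is the situation that scaled=True is meant to handle.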
X axis: Physical position or rank
Option | DataType | Description | Default |
---|---|---|---|
use_rank | boolean | If True, GWASLab will use the rank of positions instead of the physical base-pair positions for the x axis. | False |
Note
If ranks are used, there will be no gaps in the plot. If base-pair positions are used, certain regions of the chromosome, such as heterochromatin, may show up as gaps in the plot.
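The difference between the two x axes can be sketched with a toy example. The helper `position_ranks` below is hypothetical and only illustrates the idea of replacing positions by their rank; it is not taken from GWASLab.

```python
def position_ranks(positions):
    """Return the 0-based rank of each base-pair position (illustrative helper)."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    ranks = [0] * len(positions)
    for rank, i in enumerate(order):
        ranks[i] = rank
    return ranks


# Four variants on one chromosome, with a large uncovered region in the middle.
positions = [10_000, 12_500, 58_000_000, 58_000_300]
ranks = position_ranks(positions)

# Spacing between neighbouring variants on each kind of x axis.
gaps_bp = [b - a for a, b in zip(positions, positions[1:])]
gaps_rank = [b - a for a, b in zip(ranks, ranks[1:])]
```

With base-pair positions the uncovered region is preserved as a wide gap in `gaps_bp`, while every entry of `gaps_rank` is 1, so the variants are evenly spaced.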
Y axis: Skip "low" and shrink "high"
Option | DataType | Description | Default |
---|---|---|---|
skip | float | It is often unnecessary to plot all variants, so variants with low -log10(P) values can be skipped. For example, skip=3 omits variants with -log10(P) lower than 3 from the plot. The calculation of lambda GC is not affected. | None |
cut | float | Loci with extremely large -log10(P) values are very likely to dwarf other significant loci, so -log10(P) is scaled down for variants above this threshold. | None |
cutfactor | float | Shrinkage factor | 10 |
cut_line_color | string | The color of the line above which the y axis is rescaled. | "#ebebeb" |
sig_level | float | Genome-wide significance threshold | 5e-8 |
Auxiliary lines
Note
lambda GC calculation for QQ plot will not be affected by skip and cut. The calculation is conducted using all variants in the original dataset.
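The interaction between skip, cut, and lambda GC can be sketched as follows. This is a simplified illustration, not GWASLab's code: the function names are hypothetical, and the linear shrinkage `cut + (y - cut) / cutfactor` is an assumption about how values above cut are rescaled. Lambda GC is computed in the usual way, as the median observed 1-df chi-square statistic divided by its expected median.

```python
import math
from statistics import NormalDist, median


def lambda_gc(pvalues):
    """Median observed chi-square (1 df) divided by the expected median."""
    nd = NormalDist()
    # A two-sided P value corresponds to chi-square = z^2 with z = inv_cdf(p / 2).
    chisq = [nd.inv_cdf(p / 2) ** 2 for p in pvalues]
    expected_median = nd.inv_cdf(0.75) ** 2  # about 0.455
    return median(chisq) / expected_median


def display_values(pvalues, skip=None, cut=None, cutfactor=10):
    """Return the -log10(P) values that would be drawn after skip and cut."""
    shown = []
    for p in pvalues:
        y = -math.log10(p)
        if skip is not None and y < skip:
            continue  # omitted from the plot only
        if cut is not None and y > cut:
            y = cut + (y - cut) / cutfactor  # assumed linear shrinkage above cut
        shown.append(y)
    return shown


pvalues = [0.5, 0.2, 1e-4, 1e-8, 1e-120]
lam = lambda_gc(pvalues)  # computed from all five P values
ys = display_values(pvalues, skip=3, cut=20, cutfactor=10)  # only three are drawn
```

Because `lambda_gc` receives the full list of P values, dropping or shrinking points for display does not change it.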
Annotation
Option | DataType | Description | Default |
---|---|---|---|
anno | boolean or string or "GENENAME" | If anno=True, variants will be annotated with chr:pos. If a string, it is the name of the column used for annotation. If "GENENAME", the nearest gene names are annotated automatically using pyensembl (remember to specify build; the default is build="19"). | False |
anno_set | list | To annotate only a few specific variants, provide a list of SNPIDs or rsIDs. If None, the variants to annotate will be selected automatically using a sliding window with windowsize=500 kb. | None |
repel_force | float | When annotations overlap with each other, try increasing repel_force to increase the padding between annotations. | 0.01 |
anno_alias | dict | snpid:text dictionary for customized annotation | None |
Repel force
Skip variants with -log10P<3 and annotate the lead variants with chr:pos
Skip variants with -log10P<3 and annotate the lead variants with GENENAME
Skip variants with -log10P<3 and annotate the variants in anno_set
Skip variants with -log10P<3 and annotate the variants in anno_set with alias in anno_alias
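When anno_set is None, the variants to annotate are chosen automatically with a sliding window. The sketch below shows one simple greedy interpretation of that idea. It is not necessarily the algorithm GWASLab uses, and the function `select_leads` and its tuple layout are hypothetical.

```python
def select_leads(variants, sig_level=5e-8, windowsizekb=500):
    """Pick one lead variant per window (greedy, illustrative).

    variants: iterable of (snpid, chrom, pos, p) tuples.
    """
    window_bp = windowsizekb * 1000
    # Consider significant variants only, most significant first.
    significant = sorted(
        (v for v in variants if v[3] < sig_level), key=lambda v: v[3]
    )
    leads = []
    for snpid, chrom, pos, p in significant:
        # Skip a variant if a stronger lead already sits within the window.
        near_existing = any(
            c == chrom and abs(pos - q) <= window_bp for _, c, q, _ in leads
        )
        if not near_existing:
            leads.append((snpid, chrom, pos, p))
    # Return leads in genomic order.
    return sorted(leads, key=lambda v: (v[1], v[2]))


variants = [
    ("rs1", 1, 1_000_000, 1e-10),
    ("rs2", 1, 1_200_000, 1e-20),
    ("rs3", 1, 3_000_000, 1e-9),
    ("rs4", 2, 1_100_000, 3e-8),
    ("rs5", 2, 5_000_000, 1e-3),
]
lead_ids = [v[0] for v in select_leads(variants)]
```

Here rs1 is dropped because the stronger rs2 lies within 500 kb on the same chromosome, and rs5 is dropped because it does not reach the significance level.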
Annotation style
GWASLab now supports three annotation styles:
expand
right
tight
anno_style="expand"
anno_style="right"
anno_style="tight"
Adjust arm positions
Option | DataType | Description | Default |
---|---|---|---|
anno_d | dict | Key is the index of the arm, starting from 0; value is the direction you want the arm to shift towards. For example, anno_d={4:"r"} shifts the arm with index 4 to the right. | None |
arm_offset | float | Distance in points | 500 |
arm_scale | float | Factor to adjust the height of all arms | 1.0 |
arm_scale_d | dict | Factors to adjust the height of specific arms. Key is the index of the arm, starting from 0; value is the factor by which the arm height is multiplied. | None |
Adjust the direction of the first arm to the left and the third to the right
Adjust the length of arm for each variant
Highlight loci
Highlight specified loci (color all variants in a region by specifying variants and the length of flanking regions).
Highlighting Option | DataType | Description | Default |
---|---|---|---|
highlight | list | A list of SNPIDs or rsIDs; these loci (all variants within the positions of the specified variants +/- highlight_windowkb) will be highlighted in highlight_color | True |
highlight_windowkb | int | Span of the highlighted region in kb | 500 |
highlight_color | string | Color for highlighting loci | "#CB132D" |
Pinpoint variants
Pinpoint certain variants in the Manhattan plot.
Pinpointing Option | DataType | Description | Default |
---|---|---|---|
pinpoint | list | A list of SNPIDs or rsIDs; these variants will be highlighted in pinpoint_color | True |
pinpoint_color | string | Color for pinpointing variants | "red" |
Highlight loci and pinpoint variants
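The difference between highlighting and pinpointing is that highlighting colors every variant in a window around the listed variants, whereas pinpointing marks only the listed variants themselves. A minimal sketch of the window logic, assuming the window extends highlight_windowkb on each side as described above (the helper `variants_to_highlight` is hypothetical and not part of GWASLab):

```python
def variants_to_highlight(variants, highlight, highlight_windowkb=500):
    """Return IDs of variants within +/- highlight_windowkb of any listed variant.

    variants: list of (snpid, chrom, pos) tuples.
    """
    half_width = highlight_windowkb * 1000
    targets = set(highlight)
    # Window centers are the positions of the listed variants.
    centers = [(chrom, pos) for snpid, chrom, pos in variants if snpid in targets]
    selected = []
    for snpid, chrom, pos in variants:
        if any(c == chrom and abs(pos - q) <= half_width for c, q in centers):
            selected.append(snpid)
    return selected


variants = [
    ("rs1", 1, 1_000_000),
    ("rs2", 1, 1_400_000),
    ("rs3", 1, 1_600_000),
    ("rs4", 2, 1_000_000),
]
highlighted = variants_to_highlight(variants, highlight=["rs1"])
pinpointed = [snpid for snpid, _, _ in variants if snpid in {"rs1"}]
```

In this toy data, highlighting rs1 also colors rs2 (400 kb away), but not rs3 (600 kb away) or rs4 (different chromosome), while pinpointing rs1 marks rs1 alone.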
Lines
Line Option | DataType | Description | Default |
---|---|---|---|
sig_line | boolean | If True, plot the significance threshold line | True |
sig_level | float | The significance threshold | 5e-8 |
sig_level_lead | float | The significance threshold for extracting lead variants to annotate | 5e-8 |
sig_line_color | string | Color of the significance threshold line | "grey" |
suggestive_sig_line | boolean | If True, plot the suggestive threshold line | True |
suggestive_sig_level | float | The suggestive threshold | 5e-6 |
suggestive_sig_line_color | string | Color of the suggestive threshold line | "grey" |
additional_line | list | List of P values used to plot additional lines | None |
additional_line_color | list | List of colors for the additional lines | None |
cut_line_color | string | Color of the cut line | "#ebebeb" |
Plot lines
```python
mysumstats.plot_mqq(skip=3,
                    build="19",
                    anno="GENENAME",
                    windowsizekb=1000000,
                    cut=20,
                    cut_line_color="purple",
                    sig_level=5e-8,
                    sig_level_lead=1e-6,
                    sig_line_color="grey",
                    suggestive_sig_line=True,
                    suggestive_sig_level=1e-6,
                    suggestive_sig_line_color="blue",
                    additional_line=[1e-40, 1e-60],
                    additional_line_color=["yellow", "green"])
```
MAF-stratified QQ plot
QQ plot Option | DataType | Description | Default |
---|---|---|---|
stratified | boolean | If True, plot a MAF-stratified Q-Q plot. Requires EAF in the sumstats. | False |
maf_bins | list | MAF bins for stratification | [(0, 0.01), (0.01, 0.05), (0.05, 0.25), (0.25, 0.5)] |
maf_bin_colors | list | Colors used for each MAF bin | ["#f0ad4e", "#5cb85c", "#5bc0de", "#000042"] |
MAF-stratified Q-Q plot
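Stratification requires EAF because the minor allele frequency is derived from it as MAF = min(EAF, 1 - EAF), and each variant is then assigned to one of the maf_bins. The sketch below illustrates that assignment. The helper `maf_bin_index` is hypothetical, and treating each bin as the interval (lower, upper] is an assumption, since the boundary handling is not stated here.

```python
def maf_bin_index(eaf, maf_bins):
    """Return the index of the MAF bin for a variant, or None if it fits no bin."""
    maf = min(eaf, 1 - eaf)  # minor allele frequency derived from EAF
    for i, (lower, upper) in enumerate(maf_bins):
        if lower < maf <= upper:  # assumed boundary convention: (lower, upper]
            return i
    return None


maf_bins = [(0, 0.01), (0.01, 0.05), (0.05, 0.25), (0.25, 0.5)]
eafs = [0.005, 0.97, 0.9, 0.4, 0.5]
bin_indices = [maf_bin_index(eaf, maf_bins) for eaf in eafs]

# Group variants by bin so that each bin can get its own Q-Q curve and color.
groups = {}
for eaf, idx in zip(eafs, bin_indices):
    groups.setdefault(idx, []).append(eaf)
```

Each group would then be drawn as a separate Q-Q curve in the matching entry of maf_bin_colors.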
Colors and Fontsizes
```python
mysumstats.plot_mqq(
    colors=["#597FBD", "#74BAD3"],
    cut_line_color="#ebebeb",
    sig_line_color="grey",
    highlight_color="#CB132D",
    pinpoint_color="red",
    maf_bin_colors=["#f0ad4e", "#5cb85c", "#5bc0de", "#000042"],
    fontsize=10,
    anno_fontsize=10,
    title_fontsize=13,
    marker_size=(5, 25)
)
```
Color-related options
Color Option | DataType | Description | Default |
---|---|---|---|
colors | list | A list of colors for chromosomes in the Manhattan plot; the colors are cycled through repeatedly. | ["#597FBD", "#74BAD3"] |
cut_line_color | string | Color for the cut line | "#EBEBEB" |
sig_line_color | string | Color for the significance threshold line | "grey" |
highlight_color | string | Color for highlighting loci | "#CB132D" |
pinpoint_color | string | Color for pinpointing variants | "red" |
maf_bin_colors | list | A list of colors for the MAF-stratified Q-Q plot | ["#f0ad4e", "#5cb85c", "#5bc0de", "#000042"] |
Font-related options
Font Option | DataType | Description | Default |
---|---|---|---|
fontsize | float | Font size for tick labels | 9 |
title_fontsize | float | Font size for the title | 13 |
anno_fontsize | float | Font size for annotations | 9 |
font_family | string | Font family | "Arial" |
Example
Titles
Title Option | DataType | Description | Default |
---|---|---|---|
title | string | Title for the figure | `` |
mtitle | string | Title for the Manhattan plot | `` |
qtitle | string | Title for the Q-Q plot | `` |
title_pad | float | Padding for the title | 1.08 |
Figure settings
Figure Option | DataType | Description | Default |
---|---|---|---|
figargs | dict | Key-value pairs that are passed to matplotlib plt.subplots() | {"figsize":(15,5),"dpi":200} |
Commonly used ones:
- figsize: figure size
- dpi: dots per inch. For publications, dpi>=300 is one of the common criteria.
Saving plots
Two options for saving plots in .plot_mqq
Saving Option | DataType | Description | Default |
---|---|---|---|
save | string or boolean | If a string, the plot will be saved to the specified path; if True, it will be saved to the default path | True |
save_args | dict | Other parameters passed to the matplotlib savefig function | {"dpi":300,"facecolor":"white"} |
Example
- save as png:
mysumstats.plot_mqq(save="mymqqplots.png",save_args={"dpi":300})
- save as PDF:
mysumstats.plot_mqq(save="mymqqplots.pdf",save_args={"dpi":300})