Regional plots

Color issue
- gwaslab<=3.4.39 : the color assigned to each variant is actually the color for the lower LD r2 category. For example, variants with LD>0.8 will be colored with the color for 0.8>LD>0.6.
- gwaslab v3.4.40 : the color for region_ref_second was assigned based on region_ref LD.
- Solution: update gwaslab to v3.4.41 or later.
GWASLab provides functions for creating regional plots.
.plot_mqq(mode="r")
GWASLab's regional plot function is based on plot_mqq(), so most options are the same as for the Manhattan plot.
Options
| Option | DataType | Description | Default |
|---|---|---|---|
| mode | r | specify regional plot mode | - |
| region | tuple | a three-element tuple (chr, start, end); for example, (7,156538803,157538803) | - |
| vcf_path | string | path to the LD reference in VCF format; if None, LD information will not be plotted | None |
| region_ref | list | the SNPID or rsID of the reference variants; if None, lead variants will be selected; supports up to 7 reference markers (since v3.4.47) | [None] |
| region_grid | boolean | If True, plot the grid line | False |
| region_grid_line | dict | parameters for the grid line | {"linewidth": 2,"linestyle":"--"} |
| region_lead_grid | boolean | If True, plot a line to show the reference variants | True |
| region_lead_grid_line | dict | parameters for the line to show the reference variants | {"alpha":0.5,"linewidth" : 2,"linestyle":"--","color":"#FF0000"} |
| region_ld_threshold | list | LD r2 categories | [0.2,0.4,0.6,0.8] |
| region_ld_colors | list | colors for the LD r2 categories when a single reference marker is used | ["#E4E4E4","#020080","#86CEF9","#24FF02","#FDA400","#FF0000","#FF0000"] |
| region_ld_colors_m | list | list of colors used for multiple reference markers (since v3.4.47) | ["#E51819","#367EB7","green","#F07818","#AD5691","yellow","purple"] |
| region_marker_shapes | list | list of shapes used for multiple reference markers (since v3.4.47) | ['o', 's','^','D','*','P','X','h','8'] |
| region_chromatin_files | list | list of paths to Roadmap 15_coreMarks_mnemonics.bed.gz files | [] |
| region_chromatin_labels | list | list of labels for region_chromatin_files | [] |
| region_hspace | float | the space between the scatter plot and the gene track | 0.02 |
| region_step | int | number of X axis ticks | 21 |
| region_recombination | boolean | If True, plot the recombination rate | True |
| tabix | string | path to tabix; if None, GWASLab will search the PATH environment variable; plotting is much faster when tabix is available | None |
| taf | list | a five-element list: number of gene track lanes, offset for gene track, font_ratio, exon_ratio, text_offset | [4,0,0.95,1,1] |
| build | 19 or 38 | reference genome build; 99 for unknown | 99 |
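To make the LD coloring concrete, the following minimal sketch bins an r2 value with the default region_ld_threshold and picks a color from the default region_ld_colors. It is not GWASLab's implementation: ld_category() and ld_color() are hypothetical helpers, the handling of values that fall exactly on a threshold is an assumption, and so is the reading of the color list (first entry for variants without LD information, last entry for the reference variant).

```python
from bisect import bisect_left

# Defaults copied from the options table.
REGION_LD_THRESHOLD = [0.2, 0.4, 0.6, 0.8]
REGION_LD_COLORS = ["#E4E4E4", "#020080", "#86CEF9", "#24FF02",
                    "#FDA400", "#FF0000", "#FF0000"]


def ld_category(r2, thresholds=REGION_LD_THRESHOLD):
    """Return the 0-based index of the LD bin that contains r2.

    Assumption: a value exactly on a threshold falls into the lower bin,
    so only r2 > 0.8 reaches the top bin with the default thresholds.
    """
    return bisect_left(thresholds, r2)


def ld_color(r2, thresholds=REGION_LD_THRESHOLD, colors=REGION_LD_COLORS):
    """Pick a color for one variant (hypothetical helper).

    Assumption: colors[0] is used when no LD information is available
    and colors[1:] follow the LD bins in increasing order of r2.
    """
    if r2 is None:
        return colors[0]
    return colors[1 + ld_category(r2, thresholds)]


for value in (None, 0.1, 0.7, 0.85):
    print(value, ld_color(value))
```

Under these assumptions, the behavior described under Color issue for gwaslab<=3.4.39 corresponds to picking the color one position lower than this mapping.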
Calculation of LD r2
The calculation is based on the Rogers and Huff r implemented in scikit-allel. Variants in the reference VCF file should be biallelic. Unphased data is acceptable. AF information is not needed. Variant IDs are not required. Missing genotypes are allowed.
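The Rogers and Huff estimator is, in essence, the Pearson correlation between per-sample alternate-allele counts at two variants, which is why phase is not needed. The sketch below illustrates that idea in plain Python; it is not the scikit-allel code, and rogers_huff_r2() is a hypothetical helper name. Samples with a missing genotype at either variant are skipped, and None is returned when one variant shows no variation.

```python
from math import sqrt


def rogers_huff_r2(gn_a, gn_b):
    """Squared genotype correlation between two biallelic variants.

    gn_a, gn_b: per-sample alternate-allele counts (0, 1 or 2),
    with None marking a missing genotype.
    Returns None when r is undefined (no complete pairs, or one
    variant is mono-allelic among the complete pairs).
    """
    # Keep only samples genotyped at both variants.
    pairs = [(a, b) for a, b in zip(gn_a, gn_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return None

    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)

    # A variant with zero variance has no defined correlation.
    if var_a == 0 or var_b == 0:
        return None

    r = cov / sqrt(var_a * var_b)
    return r * r


reference = [0, 1, 2, 1, 0, None]
neighbor = [0, 1, 2, 1, 0, 2]
print(rogers_huff_r2(reference, neighbor))
```

The None branch mirrors the case where LD cannot be computed because the reference variant is mono-allelic in the panel.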
Examples
Example
See Regional plot
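As a quick orientation, the options from the table can be collected into a dictionary and passed to plot_mqq(). The sketch below only builds and checks the arguments, so it runs without data; check_region() is a hypothetical helper, the VCF path is a placeholder, and mysumstats stands for an already loaded Sumstats object.

```python
def check_region(region):
    """Validate a (chr, start, end) tuple before plotting (hypothetical helper)."""
    if len(region) != 3:
        raise ValueError("region must be a three-element tuple (chr, start, end)")
    chrom, start, end = region
    if start >= end:
        raise ValueError("region start must be smaller than region end")
    return chrom, start, end


# Option names and the region come from the options table;
# the VCF path is a placeholder for your own LD reference.
regional_kwargs = {
    "mode": "r",
    "region": check_region((7, 156538803, 157538803)),
    "vcf_path": "ld_ref.vcf.gz",  # placeholder; None skips the LD coloring
    "region_grid": True,
    "build": "19",
}

# With gwaslab installed and summary statistics loaded as mysumstats:
# mysumstats.plot_mqq(**regional_kwargs)
print(regional_kwargs)
```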
gl.plot_stacked_mqq()
Creates stacked Manhattan-QQ plots or regional plots for multiple GWAS datasets, allowing direct comparison of multiple studies or traits in one figure.
Parameters
Required Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| objects | list | List of gl.Sumstats objects or pandas DataFrames containing GWAS summary statistics | Required |
Plot Mode Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| mode | str | Plot mode: "r" for regional plots, "m" for Manhattan plots, "mqq" for Manhattan-QQ plots | "r" |
| pm | list | List of panel modes for each object: "m" for Manhattan, "pip" for PIP/credible sets | None (auto-detected) |
| region | tuple | For regional plots: three-element tuple (chr, start, end), e.g., (7, 156538803, 157538803) | None |
| vcfs | list | List of VCF file paths for LD reference. If a single VCF is provided, it will be used for all panels. For regional plots, the list must match the number of objects or have length 1 | [] |
Layout Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| titles | list | List of titles for each panel | None |
| title_pos | str or tuple | Position of titles | None |
| title_kwargs | dict | Keyword arguments for title styling | None |
| subplot_height | float | Height of each subplot in inches | 4 |
| region_hspace | float | Space between subplots | 0.07 |
| mqqratio | float | Width ratio of Manhattan to QQ plot when mode="mqq" | 3 |
| mqq_height | float | Height ratio for Manhattan plot panels | 1 |
| cs_height | float | Height ratio for credible set (PIP) panels | 0.5 |
| gene_track_height | float | Height ratio for gene track (regional plots only) | 0.5 |
| region_chromatin_height | float | Height ratio for chromatin track (regional plots only) | 0.1 |
| fig_kwargs | dict | Keyword arguments for matplotlib figure creation | None |
Regional Plot Specific Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| region_chromatin_files | list | List of paths to Roadmap 15_coreMarks_mnemonics.bed.gz files | [] |
| region_chromatin_labels | list | List of labels for chromatin tracks | [] |
| region_lead_grids | list | List of panel indices to show lead variant grid lines | None (all panels) |
| region_ld_legends | list | List of panel indices to show LD legend | [0] |
| gtf | str | Path to GTF file for gene annotation. Use "default" for built-in GTF | None |
| build | str | Reference genome build: "19", "38", or "99" for unknown | "99" |
Styling Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| fontsize | float | Font size for labels and text | 9 |
| font_family | str | Font family name | "Arial" |
| common_ylabel | bool | If True, use common y-axis label for all panels | True |
Output Parameters
| Parameter | DataType | Description | Default |
|---|---|---|---|
| save | str | Path to save the figure | None |
| save_kwargs | dict | Keyword arguments for saving (dpi, bbox_inches, etc.) | None |
| verbose | bool | Print progress messages | True |
Additional Parameters
All parameters from plot_mqq() can be passed via **mqq_kwargs to customize individual panels. These include:
- highlight, anno_set, pinpoint for variant annotation
- colors, scatter_kwargs for styling
- sig_line, suggestive_sig_line for significance lines
- region_ld_threshold, region_ld_colors for LD coloring
- And many more (see Manhattan plot documentation)
Plot Modes
Regional Mode (mode="r")
Creates stacked regional plots with LD information and gene tracks. Each panel shows a regional plot for a specific genomic region. Requires:
- region parameter specifying the genomic region
- vcfs parameter for LD reference (optional but recommended)
Manhattan Mode (mode="m")
Creates stacked Manhattan plots across the genome. All panels share the same x-axis (genomic position). Useful for comparing multiple traits or studies genome-wide.
Manhattan-QQ Mode (mode="mqq")
Creates stacked panels with Manhattan plot on the left and QQ plot on the right for each dataset. Useful for quality control and comparing multiple studies.
Panel Types
The function automatically detects panel types based on input data:
- Manhattan panels (pm="m"): For DataFrames with P or MLOG10P columns
- Credible set panels (pm="pip"): For DataFrames with PIP column (e.g., from fine-mapping)
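The detection rule can be pictured as a simple column lookup. The sketch below is an illustration only, not the library's code: detect_panel_mode() is a hypothetical helper, and the choice to check PIP before P/MLOG10P when both are present is an assumption.

```python
def detect_panel_mode(columns):
    """Guess the panel mode from column names (hypothetical helper).

    Assumption: a PIP column takes precedence, since fine-mapping
    output may also carry a P column.
    """
    cols = set(columns)
    if "PIP" in cols:
        return "pip"  # credible set panel
    if "P" in cols or "MLOG10P" in cols:
        return "m"    # Manhattan panel
    raise ValueError("expected a P, MLOG10P or PIP column")


print(detect_panel_mode(["SNPID", "CHR", "POS", "P"]))
print(detect_panel_mode(["SNPID", "CHR", "POS", "PIP"]))
```

Passing pm explicitly avoids relying on any such guess.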
Examples
Stacked regional plots for multiple studies:
import gwaslab as gl

# Load multiple sumstats
sumstats1 = gl.Sumstats("study1.txt.gz")
sumstats2 = gl.Sumstats("study2.txt.gz")

# Create stacked regional plot
gl.plot_stacked_mqq(
    objects=[sumstats1, sumstats2],
    mode="r",
    region=(7, 156538803, 157538803),
    vcfs=["ld_ref.vcf.gz"],  # Single VCF used for all panels
    titles=["Study 1", "Study 2"],
    build="38"
)
Stacked Manhattan plots:
# Compare multiple traits genome-wide
gl.plot_stacked_mqq(
    objects=[trait1_sumstats, trait2_sumstats, trait3_sumstats],
    mode="m",
    titles=["Trait 1", "Trait 2", "Trait 3"],
    colors=["#1f77b4", "#ff7f0e", "#2ca02c"],
    sig_line=True,
    sig_level_plot=5e-8
)
Stacked Manhattan-QQ plots:
# Quality control comparison
gl.plot_stacked_mqq(
    objects=[study1, study2],
    mode="mqq",
    titles=["Study 1", "Study 2"],
    mqqratio=3
)
Regional plot with credible sets:
import gwaslab as gl
import pandas as pd

# Include fine-mapping results
mysumstats = gl.Sumstats("gwas.txt.gz")
finemap_results = pd.read_csv("finemap_results.txt")  # Contains PIP column

gl.plot_stacked_mqq(
    objects=[mysumstats, finemap_results],
    mode="r",
    region=(7, 156538803, 157538803),
    vcfs=["ld_ref.vcf.gz"],
    titles=["GWAS", "Fine-mapping"],
    build="38"
)
Custom styling per panel:
# Different colors and styles for each panel
gl.plot_stacked_mqq(
    objects=[sumstats1, sumstats2],
    mode="m",
    titles=["Panel 1", "Panel 2"],
    colors=[["#1f77b4"], ["#ff7f0e"]],  # Different colors per panel
    scatter_kwargs=[{"s": 10}, {"s": 20}]  # Different sizes per panel
)
Notes
- When mode="r" (regional), the number of VCF files must match the number of objects, or a single VCF can be provided which will be used for all panels.
- For credible set panels (pm="pip"), VCF files are automatically set to "NA" as LD information is not applicable.
- The function automatically detects panel types based on column names (P/MLOG10P for Manhattan, PIP for credible sets).
- All parameters from plot_mqq() can be passed to customize individual panels. Use lists to specify different values for each panel.
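The VCF rules in these notes can be sketched as a small helper. This is an illustration under stated assumptions rather than the function's actual code: resolve_vcfs() is a hypothetical name, and the sketch covers only the two cases mentioned in the notes (one VCF for all panels, or one VCF per panel).

```python
def resolve_vcfs(panel_modes, vcfs):
    """Assign one LD reference per panel (hypothetical helper).

    panel_modes: list such as ["m", "pip"], one entry per object
    vcfs: list of VCF paths, of length 1 or len(panel_modes)
    """
    n_panels = len(panel_modes)
    if len(vcfs) == 1:
        # A single VCF is reused for every panel.
        vcfs = vcfs * n_panels
    elif len(vcfs) != n_panels:
        raise ValueError("vcfs must have length 1 or match the number of objects")
    # Credible set panels do not use LD information.
    return ["NA" if mode == "pip" else vcf
            for mode, vcf in zip(panel_modes, vcfs)]


print(resolve_vcfs(["m", "pip"], ["ld_ref.vcf.gz"]))
```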
Notes and Troubleshooting
Gene track limitations:
The gene track only displays protein-coding genes from the reference GTF files. Non-coding genes, pseudogenes, and other gene types are excluded from the visualization.
Missing exons in gene track:
Very short exons may not appear in the gene track if they are too small to render at the current resolution. To display these exons, you can either increase the figure DPI (dots per inch) when saving the plot or reduce the length of the genomic region being plotted.
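A rough back-of-the-envelope calculation shows why both remedies help. The sketch below estimates how many pixels an exon spans, assuming the plotting area uses the full figure width (a simplification that ignores margins); exon_width_px() is a hypothetical helper and the numbers are illustrative.

```python
def exon_width_px(exon_bp, region_bp, fig_width_in, dpi):
    """Approximate on-screen width of an exon in pixels.

    Simplifying assumption: the x-axis spans the whole figure width.
    """
    return exon_bp / region_bp * fig_width_in * dpi


# A 50 bp exon in a 1 Mb region on a 10 inch wide figure.
print(exon_width_px(50, 1_000_000, 10, 100))  # baseline DPI
print(exon_width_px(50, 1_000_000, 10, 400))  # higher DPI
print(exon_width_px(50, 100_000, 10, 100))    # shorter region
```

Exons that come out well below one pixel are the ones most likely to disappear; raising the DPI or narrowing the region scales the width proportionally.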
LD calculation errors:
LD (linkage disequilibrium) cannot be calculated when the reference variant in the VCF file is mono-allelic (i.e., has only one allele present in the reference panel). This will result in an error even if both variants are present in the reference VCF file. Ensure your reference VCF contains biallelic variants for accurate LD calculation.