Skip to content

Comparing effect sizes

Scatter plot : effect size comparison

Available since v3.4.17

gl.compare_effect() will plot effect size comparison plot using two sets of sumstats. Alleles will be aligned to effect alleles in sumstats1.

gl.compare_effect()

gl.compare_effect (path1,
                   path2,
                   cols_name_list_1=None, effect_cols_list_1=None,
                   cols_name_list_2=None, effect_cols_list_2=None,
                   eaf=[],
                   maf_level=None,
                   label=None,
                   snplist=None,
                   mode="beta",
                   anno=False,
                   anno_het=False,
                   anno_min=0,
                   anno_min1=0,
                   anno_min2=0,
                   anno_diff=0,
                   scaled=False,
                   scaled1=False,
                   scaled2=False,
                   wc_correction=False, 
                   null_beta=0,
                   is_q=False,
                   include_all=True,
                   q_level=0.05,
                   sig_level=5e-8,
                   drop=False,
                   wc_sig_level=5e-8,
                   # reg
                   reg_box=None,
                   is_reg=True,
                   fdr=False,
                   allele_match=False,
                   r_se=False,
                   is_45_helper_line=True,
                   legend_mode="full",
                   legend_title=r'$ P < 5 x 10^{-8}$ in:',
                   legend_title2=r'Heterogeneity test:',
                   legend_pos='upper left',
                   scatterargs=None,
                   plt_args=None,
                   xylabel_prefix="Per-allele effect size in ",
                   helper_line_args=None,
                   fontargs=None,
                   errargs=None,
                   legend_args=None,
                   sep=["\t","\t"],
                   log = Log(),
                   save=False,
                   save_args=None,
                   verbose=False)

Options

Path and column

  • path1 and path2 : the paths to the sumstats. Can also be gl.Sumstats Object or pd.DataFrame (from v3.4.17). If gl.Sumstats Objects are provided, there is no need to set cols_name_list and effect_cols_list.
  • cols_name_list_1 and cols_name_list_2 : list of column names for variants basic information
  • effect_cols_list_1 and effect_cols_list_2 : list of column names for effect size-related columns
  • mode : use beta or OR
  • cols_name_list_x and effect_cols_list_x examples:
    • [snpid,p,ea,nea] ,[effect,se]
    • [snpid,p,ea,nea,chr,pos],[effect,se]
    • [snpid,p,ea,nea,chr,pos],[OR,OR_l,OR_h]
Option Type Description Default
path1 string,gl.Sumstats, or pd.DataFrame path to the sumstats file, or gwaslab Sumstats object or pandas Dataframe None
path2 string,gl.Sumstats, or pd.DataFrame path to the sumstats file, or gwaslab Sumstats object or pandas Dataframe None
cols_name_list_1 list "[snpid,p,ea,nea]" or "[snpid,p,ea,nea,chr,pos]" None
cols_name_list_2 list "[snpid,p,ea,nea]" or "[snpid,p,ea,nea,chr,pos]" None
effect_cols_list_1 list "[effect,se]" or "[OR,OR_95l,OR_95h]" None
effect_cols_list_2 list "[effect,se]" or "[OR,OR_95l,OR_95h]" None
mode beta or OR plot beta or OR beta

Note

from v3.4.17, you need to specify the parameters using keywords instead of using them as positional arguments for cols_name_list_1 and cols_name_list_2, effect_cols_list_1 and effect_cols_list_1 .

Save figures

Option Type Description Default
save string path to the saved file ./Sumstats1_Sumstats2_effect_comparison_plot.png
save_args dict parametrs for plt.savefig() {"dpi":300,"facecolor":"white"}

SNP List

Option Type Description Default
snplist list optional, specify the variants you want to compare. If None, GWASLab will automatically extract lead variants from both sumstats. None

Filter by MAF

Option Type Description Default
eaf list optional, a list column names for effect allele frequency, in the order of [sumstats1_eaf, sumstats2_eaf]. It is required when you need to filter by maf using maf_level. None
maf_level float the maf filter for variants. Vairants with maf < maf_level will be removed from comparison. None

Label

Option Type Description Default
label list a list of labels for the legend , in the order of ["Sumstats_1","Sumstats_2","Both","None"] ["Sumstats_1","Sumstats_2","Both","None"]
sig_level float the significance level for auto-extracting lead variants. If snplist is provided, the auto-extraction will be skipped. 5e-8
legend_title string legend title '$ P < 5 x 10^{-8}$ in:'
legend_pos string legend position upper left
xylabel_prefix string - "Per-allele effect size in "

Annotation

Option Type Description Default
is_reg boolean if true, draw regression line. True
is_45_helper_line boolean if true, draw 45 degree line. True
anno boolean if true, annotate the variants with ID. False
anno_diff float threshold of effect size difference for annotation. 0
anno_min1 float threshold of sumstats1 minimum absolute effect size for annotation. 0
anno_min2 float threshold of sumstats2 minimum absolute effect size for annotation. 0

Heterogeneity test

Option Type Description Default
is_q boolean if true, apply the heterogeneity tests by Cochran's Q test. True
q_level float the significance threshold for Cochran's Q test (raw p value). 0.05
anno_het annotate only variants with Phet < q_level False

R SE

Option Type Description Default
r_se boolean If True, SE for r will be estimated using the jackknife method. (Note: available from v3.4.17) False
\[ s.e.(\hat{r}_{jack}) = \sqrt{ {{n-1}\over{n}} \sum_{i=1}^n(\hat{r_i} -\bar{r}_{jack} )^2 } \]

Examples

Use gl.Sumstats object

gl1 = gl.Sumstats("bbj_bmi_female.txt.gz",fmt="gwaslab",snpid="SNP",ea="REF",nea="ALT",sep="\t")
gl2 = gl.Sumstats("bbj_bmi_male.txt.gz",fmt="gwaslab",snpid="SNP",ea="REF",nea="ALT",sep="\t")

# auto extract lead SNPs and compare the effect sizes
a = gl.compare_effect(path1= gl1,
                      path2= gl2
)
image

Use pandas DataFrame

pd1 = pd.read_table("bbj_bmi_female.txt.gz",sep="\t")
pd2 = pd.read_table("bbj_bmi_male.txt.gz",sep="\t")

# cols_name_list should be SNPID, P, Effect Allele, Non-Effect allele, Chromosome and Position
# effect_cols_list should be BETA,SE

a = gl.compare_effect(path1 = pd1,
                    cols_name_list_1 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1= ["BETA","SE"],
                    path2 = pd2,
                    cols_name_list_2 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2= ["BETA","SE"]
                    )

Heterogeneity test

pd1 = pd.read_table("bbj_bmi_female.txt.gz",sep="\t")
pd2 = pd.read_table("bbj_bmi_male.txt.gz",sep="\t")

# cols_name_list should be SNPID, P, Effect Allele, Non-Effect allele, Chromosome and Position
# effect_cols_list should be BETA,SE
# is_q : if true, apply the heterogeneity tests by Cochran's Q test.
# q_level` : `float`, the significance threshold for Cochran's Q test (raw p value).

a = gl.compare_effect(path1 = pd1,
                      cols_name_list_1 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1= ["BETA","SE"],
                      path2 = pd2,
                      cols_name_list_2 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2= ["BETA","SE"],
                      is_q=True,
                      q_level=0.05,
                      legend_mode="full"
                      )
image

Annotation

a = gl.compare_effect(path1="bbj_bmi_female.txt.gz",
                  cols_name_list_1=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1=["BETA","SE"],
                  path2="bbj_bmi_male.txt.gz",
                  cols_name_list_2=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2=["BETA","SE"],
                  label=["Female","Male","Both","None"],
                  xylabel_prefix="Per-allele effect size for ",
                  r_se=True, 
                  is_q=True, 
                  anno=True, 
                  anno_het=True, # only annotate variants with significant heterogeneity
                  anno_diff=0.015,    # only annotate variants with a difference in effect size > 0.015
                  sig_level=5e-8,
                  legend_title=r'$ P < 5 x 10^{-8}$ in:',
                  verbose=True)
image

Male-specific and female-specific BMI Sumstats from JENGER

!wget -O bbj_bmi_male.txt.gz http://jenger.riken.jp/2analysisresult_qtl_download/
!wget -O bbj_bmi_female.txt.gz http://jenger.riken.jp/4analysisresult_qtl_download/

Headers of the files : # SNP    CHR POS REF ALT Frq Rsq BETA    SE  P

# GWASLab will automatically extract significant variants from both sumstats. 
a = gl.compare_effect(path1="bbj_bmi_female.txt.gz",
                  cols_name_list_1=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1=["BETA","SE"],
                  path2="bbj_bmi_male.txt.gz",
                  cols_name_list_2=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2=["BETA","SE"],
                  label=["Female","Male","Both","None"],
                  xylabel_prefix="Per-allele effect size for ",
                  anno=True,
                  anno_het=True,
                  anno_diff=0.015,
                  sig_level=5e-8,
                  legend_title=r'$ P < 5 x 10^{-8}$ in:',
                  verbose=True,
                  save = "myplot.png",
                  save_args= {"dpi":300,"facecolor":"white"}
)
image

Reference: Akiyama, M., Okada, Y., Kanai, M., Takahashi, A., Momozawa, Y., Ikeda, M., ... & Kamatani, Y. (2017). Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nature genetics, 49(10), 1458-1467.