Comparing effect sizes
Scatter plot : effect size comparison
Available since v3.4.17
gl.compare_effect()
will plot effect size comparison plot using two sets of sumstats. Alleles will be aligned to effect alleles in sumstats1.
gl.compare_effect()
gl.compare_effect (path1,
path2,
cols_name_list_1=None, effect_cols_list_1=None,
cols_name_list_2=None, effect_cols_list_2=None,
eaf=[],
maf_level=None,
label=None,
snplist=None,
mode="beta",
anno=False,
anno_het=False,
anno_min=0,
anno_min1=0,
anno_min2=0,
anno_diff=0,
scaled=False,
scaled1=False,
scaled2=False,
wc_correction=False,
null_beta=0,
is_q=False,
include_all=True,
q_level=0.05,
sig_level=5e-8,
drop=False,
wc_sig_level=5e-8,
# reg
reg_box=None,
is_reg=True,
fdr=False,
allele_match=False,
r_se=False,
is_45_helper_line=True,
legend_mode="full",
legend_title=r'$ P < 5 x 10^{-8}$ in:',
legend_title2=r'Heterogeneity test:',
legend_pos='upper left',
scatterargs=None,
plt_args=None,
xylabel_prefix="Per-allele effect size in ",
helper_line_args=None,
fontargs=None,
errargs=None,
legend_args=None,
sep=["\t","\t"],
log = Log(),
save=False,
save_args=None,
verbose=False)
Options
Path and column
path1
andpath2
: the paths to the sumstats. Can also begl.Sumstats
Object orpd.DataFrame
(from v3.4.17). Ifgl.Sumstats
Objects are provided, there is no need to set cols_name_list and effect_cols_list.cols_name_list_1
andcols_name_list_2
: list of column names for variants basic informationeffect_cols_list_1
andeffect_cols_list_2
: list of column names for effect size-related columnsmode
: use beta or ORcols_name_list_x
andeffect_cols_list_x
examples:[snpid,p,ea,nea]
,[effect,se]
[snpid,p,ea,nea,chr,pos]
,[effect,se]
[snpid,p,ea,nea,chr,pos]
,[OR,OR_l,OR_h]
Option | Type | Description | Default |
---|---|---|---|
path1 |
string ,gl.Sumstats , or pd.DataFrame |
path to the sumstats file, or gwaslab Sumstats object or pandas Dataframe | None |
path2 |
string ,gl.Sumstats , or pd.DataFrame |
path to the sumstats file, or gwaslab Sumstats object or pandas Dataframe | None |
cols_name_list_1 |
list |
"[snpid,p,ea,nea]" or "[snpid,p,ea,nea,chr,pos]" | None |
cols_name_list_2 |
list |
"[snpid,p,ea,nea]" or "[snpid,p,ea,nea,chr,pos]" | None |
effect_cols_list_1 |
list |
"[effect,se]" or "[OR,OR_95l,OR_95h]" | None |
effect_cols_list_2 |
list |
"[effect,se]" or "[OR,OR_95l,OR_95h]" | None |
mode |
beta or OR |
plot beta or OR | beta |
Note
from v3.4.17, you need to specify the parameters using keywords instead of using them as positional arguments for cols_name_list_1
and cols_name_list_2
, effect_cols_list_1
and effect_cols_list_1
.
Save figures
Option | Type | Description | Default |
---|---|---|---|
save |
string |
path to the saved file | ./Sumstats1_Sumstats2_effect_comparison_plot.png |
save_args |
dict |
parametrs for plt.savefig() | {"dpi":300,"facecolor":"white"} |
SNP List
Option | Type | Description | Default |
---|---|---|---|
snplist |
list |
optional, specify the variants you want to compare. If None, GWASLab will automatically extract lead variants from both sumstats. | None |
Filter by MAF
Option | Type | Description | Default |
---|---|---|---|
eaf |
list |
optional, a list column names for effect allele frequency, in the order of [sumstats1_eaf, sumstats2_eaf]. It is required when you need to filter by maf using maf_level . |
None |
maf_level |
float |
the maf filter for variants. Vairants with maf < maf_level will be removed from comparison. | None |
Label
Option | Type | Description | Default |
---|---|---|---|
label |
list |
a list of labels for the legend , in the order of ["Sumstats_1","Sumstats_2","Both","None"] | ["Sumstats_1","Sumstats_2","Both","None"] |
sig_level |
float |
the significance level for auto-extracting lead variants. If snplist is provided, the auto-extraction will be skipped. |
5e-8 |
legend_title |
string |
legend title | '$ P < 5 x 10^{-8}$ in:' |
legend_pos |
string |
legend position | upper left |
xylabel_prefix |
string |
- | "Per-allele effect size in " |
Annotation
Option | Type | Description | Default |
---|---|---|---|
is_reg |
boolean |
if true, draw regression line. | True |
is_45_helper_line |
boolean |
if true, draw 45 degree line. | True |
anno |
boolean |
if true, annotate the variants with ID. | False |
anno_diff |
float |
threshold of effect size difference for annotation. | 0 |
anno_min1 |
float |
threshold of sumstats1 minimum absolute effect size for annotation. | 0 |
anno_min2 |
float |
threshold of sumstats2 minimum absolute effect size for annotation. | 0 |
Heterogeneity test
Option | Type | Description | Default |
---|---|---|---|
is_q |
boolean |
if true, apply the heterogeneity tests by Cochran's Q test. | True |
q_level |
float |
the significance threshold for Cochran's Q test (raw p value). | 0.05 |
anno_het |
annotate only variants with Phet < q_level |
False |
R SE
Option | Type | Description | Default |
---|---|---|---|
r_se |
boolean |
If True, SE for r will be estimated using the jackknife method. (Note: available from v3.4.17) | False |
\[ s.e.(\hat{r}_{jack}) = \sqrt{ {{n-1}\over{n}} \sum_{i=1}^n(\hat{r_i} -\bar{r}_{jack} )^2 } \]
Examples
Use gl.Sumstats object
Use pandas DataFrame
pd1 = pd.read_table("bbj_bmi_female.txt.gz",sep="\t")
pd2 = pd.read_table("bbj_bmi_male.txt.gz",sep="\t")
# cols_name_list should be SNPID, P, Effect Allele, Non-Effect allele, Chromosome and Position
# effect_cols_list should be BETA,SE
a = gl.compare_effect(path1 = pd1,
cols_name_list_1 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1= ["BETA","SE"],
path2 = pd2,
cols_name_list_2 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2= ["BETA","SE"]
)
Heterogeneity test
pd1 = pd.read_table("bbj_bmi_female.txt.gz",sep="\t")
pd2 = pd.read_table("bbj_bmi_male.txt.gz",sep="\t")
# cols_name_list should be SNPID, P, Effect Allele, Non-Effect allele, Chromosome and Position
# effect_cols_list should be BETA,SE
# is_q : if true, apply the heterogeneity tests by Cochran's Q test.
# q_level` : `float`, the significance threshold for Cochran's Q test (raw p value).
a = gl.compare_effect(path1 = pd1,
cols_name_list_1 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1= ["BETA","SE"],
path2 = pd2,
cols_name_list_2 = ["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2= ["BETA","SE"],
is_q=True,
q_level=0.05,
legend_mode="full"
)
Annotation
a = gl.compare_effect(path1="bbj_bmi_female.txt.gz",
cols_name_list_1=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1=["BETA","SE"],
path2="bbj_bmi_male.txt.gz",
cols_name_list_2=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2=["BETA","SE"],
label=["Female","Male","Both","None"],
xylabel_prefix="Per-allele effect size for ",
r_se=True,
is_q=True,
anno=True,
anno_het=True, # only annotate variants with significant heterogeneity
anno_diff=0.015, # only annotate variants with a difference in effect size > 0.015
sig_level=5e-8,
legend_title=r'$ P < 5 x 10^{-8}$ in:',
verbose=True)
Male-specific and female-specific BMI Sumstats from JENGER
!wget -O bbj_bmi_male.txt.gz http://jenger.riken.jp/2analysisresult_qtl_download/
!wget -O bbj_bmi_female.txt.gz http://jenger.riken.jp/4analysisresult_qtl_download/
Headers of the files : # SNP CHR POS REF ALT Frq Rsq BETA SE P
# GWASLab will automatically extract significant variants from both sumstats.
a = gl.compare_effect(path1="bbj_bmi_female.txt.gz",
cols_name_list_1=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_1=["BETA","SE"],
path2="bbj_bmi_male.txt.gz",
cols_name_list_2=["SNP","P","REF","ALT","CHR","POS"],effect_cols_list_2=["BETA","SE"],
label=["Female","Male","Both","None"],
xylabel_prefix="Per-allele effect size for ",
anno=True,
anno_het=True,
anno_diff=0.015,
sig_level=5e-8,
legend_title=r'$ P < 5 x 10^{-8}$ in:',
verbose=True,
save = "myplot.png",
save_args= {"dpi":300,"facecolor":"white"}
)
Reference: Akiyama, M., Okada, Y., Kanai, M., Takahashi, A., Momozawa, Y., Ikeda, M., ... & Kamatani, Y. (2017). Genome-wide association study identifies 112 new loci for body mass index in the Japanese population. Nature genetics, 49(10), 1458-1467.