Statistics conversion
GWASLab can convert equivalent statistics, including:
Target stats | Original stats | Implementation |
---|---|---|
MLOG10P | P | sumstats["MLOG10P"] = -np.log10(sumstats["P"]) |
P | MLOG10P | sumstats["P"] = np.power(10,-sumstats["MLOG10P"]) |
P | Z | sumstats["P"] = ss.norm.sf(np.abs(sumstats["Z"])) * 2 |
P | CHISQ | sumstats["P"] = ss.chi2.sf(sumstats["CHISQ"], 1) |
OR, OR_95L, OR_95U | BETA, SE | sumstats["OR"] = np.exp(sumstats["BETA"]) , sumstats["OR_95L"] = np.exp(sumstats["BETA"]-ss.norm.ppf(0.975)*sumstats["SE"]) , sumstats["OR_95U"] = np.exp(sumstats["BETA"]+ss.norm.ppf(0.975)*sumstats["SE"]) |
BETA, SE | OR, OR_95L, OR_95U | sumstats["BETA"] = np.log(sumstats["OR"]) , sumstats["SE"]=(np.log(sumstats["OR"]) - np.log(sumstats["OR_95L"]))/ss.norm.ppf(0.975) , sumstats["SE"]=(np.log(sumstats["OR_95U"]) - np.log(sumstats["OR"]))/ss.norm.ppf(0.975) |
Z | BETA/SE | sumstats["Z"] = sumstats["BETA"]/sumstats["SE"] |
CHISQ | P | sumstats["CHISQ"] = ss.chi2.isf(sumstats["P"], 1) |
CHISQ | Z | sumstats["CHISQ"] = (sumstats["Z"])**2 |
MAF | EAF | sumstats["MAF"] = sumstats["EAF"].apply(lambda x: min(x,1-x) if pd.notnull(x) else np.nan) |
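The conversions in the table can be tried out on a toy table. The following is a minimal sketch that assumes a plain pandas DataFrame named `sumstats` holding made-up values; it mirrors the expressions in the table and is not GWASLab's internal code.

```python
import numpy as np
import pandas as pd
import scipy.stats as ss

# Toy summary statistics (illustrative values only)
sumstats = pd.DataFrame({
    "BETA": [-0.0737, 0.0737, 0.0490],
    "SE":   [0.1394, 0.1394, 0.1231],
    "EAF":  [0.9960, 0.0040, 0.0051],
})

# Z from BETA/SE, then CHISQ, P and MLOG10P from Z
sumstats["Z"] = sumstats["BETA"] / sumstats["SE"]
sumstats["CHISQ"] = sumstats["Z"] ** 2
sumstats["P"] = ss.norm.sf(np.abs(sumstats["Z"])) * 2
sumstats["MLOG10P"] = -np.log10(sumstats["P"])

# OR and its 95% confidence interval from BETA/SE
z975 = ss.norm.ppf(0.975)
sumstats["OR"] = np.exp(sumstats["BETA"])
sumstats["OR_95L"] = np.exp(sumstats["BETA"] - z975 * sumstats["SE"])
sumstats["OR_95U"] = np.exp(sumstats["BETA"] + z975 * sumstats["SE"])

# MAF is the smaller of EAF and 1 - EAF (NaN stays NaN)
sumstats["MAF"] = np.minimum(sumstats["EAF"], 1 - sumstats["EAF"])

# Round trips: P from CHISQ, and BETA/SE back from OR and its lower bound
p_from_chisq = ss.chi2.sf(sumstats["CHISQ"], 1)
beta_back = np.log(sumstats["OR"])
se_back = (np.log(sumstats["OR"]) - np.log(sumstats["OR_95L"])) / z975

print(sumstats.round(4))
```

Because each statistic is derived from the same BETA and SE, the round-trip values agree with the originals up to floating-point error.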
Extreme P values
For extreme P values (P < 1e-308), add extreme=True to work around the numerical limit on how small a P value can be stored. MLOG10P will be calculated using the methods described here:
mysumstats.fill_data(to_fill=["MLOG10P"], extreme=True)
Z scores (or BETA and SE) will be used to calculate MLOG10P, and two additional columns, P_MANTISSA and P_EXPONENT, will be added to represent the P values.
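To illustrate why a Z-based route helps, the following sketch computes MLOG10P in log space with `scipy.stats.norm.logsf`, which stays finite where the P value itself underflows to 0. The helper name `mlog10p_from_z` and the way the mantissa and exponent are split are illustrative assumptions, not necessarily how GWASLab implements `extreme=True`.

```python
import numpy as np
import scipy.stats as ss

def mlog10p_from_z(z):
    """Two-sided -log10(P) from Z scores, computed in log space (hypothetical helper)."""
    z = np.asarray(z, dtype=float)
    # ln(P) = ln(2) + ln(sf(|z|)); logsf avoids underflow for large |z|
    ln_p = np.log(2) + ss.norm.logsf(np.abs(z))
    return -ln_p / np.log(10)

z = np.array([0.5, 5.0, 40.0])
mlog10p = mlog10p_from_z(z)

# Split P = mantissa * 10**exponent without ever forming P itself
exponent = np.floor(-mlog10p).astype(int)
mantissa = np.power(10.0, -mlog10p - exponent)

# Direct calculation for comparison: underflows to 0 for |Z| = 40
p_direct = ss.norm.sf(np.abs(z)) * 2

print(mlog10p, mantissa, exponent, p_direct)
```

For moderate Z scores the log-space result matches `-np.log10(P)`, while for very large |Z| only the log-space route still carries information.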
Note
The conversion is implemented using scipy and numpy.
- ss : import scipy.stats as ss
- np : import numpy as np
See examples here.
fill_data()
Options
- to_fill : the columns to fill. ["OR","OR_95L","OR_95U","BETA","SE","P","MLOG10P","Z","CHISQ"]
- df : column name for degrees of freedom
- overwrite : whether to overwrite the specified columns if they already exist
- only_sig : fill the data only for significant variants
Priority
- For P : using MLOG10P, Z, CHISQ
- For MLOG10P : using P, Z, CHISQ
- For BETA/SE : using OR/OR_95L/OR_95U
- For OR/OR_95L/OR_95U : using BETA/SE
- For Z : using BETA/SE
- For CHISQ : using Z, P
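One way to read these lists is "use the first listed source that is present in the data". A minimal sketch of that lookup follows; the `PRIORITY` mapping and the `pick_source` helper are hypothetical names for illustration, and the first-match rule is an assumption about how the order is applied, not GWASLab's actual logic.

```python
# Candidate sources per target, in the order given in the list above.
# Each source is a tuple of columns that must all be present.
PRIORITY = {
    "P": [("MLOG10P",), ("Z",), ("CHISQ",)],
    "Z": [("BETA", "SE")],
    "CHISQ": [("Z",), ("P",)],
}

def pick_source(target, available):
    """Return the first source whose columns are all in `available`, else None."""
    for source in PRIORITY.get(target, []):
        if all(col in available for col in source):
            return source
    return None

columns = ["SNPID", "CHR", "POS", "EA", "NEA", "EAF", "BETA", "SE", "P", "STATUS"]
print(pick_source("Z", columns))
print(pick_source("CHISQ", columns))
print(pick_source("P", ["CHISQ", "Z"]))
```

Keeping the order in a single mapping makes it easy to see which source wins when several equivalent statistics are available.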
Example
# raw data
#SNPID CHR POS EA NEA EAF BETA SE P STATUS
#1:725932_G_A 1 725932 G A 0.9960 -0.0737 0.1394 0.5970 9999999
#1:725933_A_G 1 725933 G A 0.0040 0.0737 0.1394 0.5973 9999999
#1:737801_T_C 1 737801 C T 0.0051 0.0490 0.1231 0.6908 9999999
# let's fill "MLOG10P","Z","OR","OR_95L","OR_95U"
# gwaslab will automatically search for equivalent statistics
mysumstats.fill_data(to_fill=["MLOG10P","Z","OR","OR_95L","OR_95U"])
Wed Oct 19 10:13:30 2022 Start filling data using existing columns...
Wed Oct 19 10:13:30 2022 -Raw input columns: ['SNPID', 'CHR', 'POS', 'EA', 'NEA', 'EAF', 'BETA', 'SE', 'P', 'STATUS']
Wed Oct 19 10:13:30 2022 -Overwrite mode: False
Wed Oct 19 10:13:30 2022 - Skipping columns: []
Wed Oct 19 10:13:30 2022 Filling columns: ['MLOG10P', 'OR', 'OR_95L', 'OR_95U']
Wed Oct 19 10:13:30 2022 - Filling OR using BETA column...
Wed Oct 19 10:13:31 2022 - Filling OR_95L/OR_95U using BETA/SE columns...
Wed Oct 19 10:13:32 2022 - Filling MLOG10P using P column...
Wed Oct 19 10:13:38 2022 Finished filling data using existing columns.