Statistics conversion
GWASLab can convert equvalent statistics, including:
| Target stats | Original stats | Implementation | 
|---|---|---|
| MLOG10P | P | sumstats["MLOG10P"] = -np.log10(sumstats["P"]) | 
| P | MLOG10P | sumstats["P"] = np.power(10,-sumstats["MLOG10P"]) | 
| P | Z | sumstats["P"] = ss.norm.sf(np.abs(sumstats["Z"])) * 2 | 
| P | CHISQ | sumstats["P"] = ss.chi2.sf(sumstats["CHISQ"], 1) | 
| OR OR_95L OR_95U | BETA SE | sumstats["OR"]   = np.exp(sumstats["BETA"]),sumstats["OR_95L"] = np.exp(sumstats["BETA"]-ss.norm.ppf(0.975)*sumstats["SE"]),sumstats["OR_95U"] = np.exp(sumstats["BETA"]+ss.norm.ppf(0.975)*sumstats["SE"]) | 
| BETA SE | OR OR_95L OR_95U | sumstats["BETA"]  = np.log(sumstats["OR"]),sumstats["SE"]=(np.log(sumstats["OR"]) - np.log(sumstats["OR_95L"]))/ss.norm.ppf(0.975),sumstats["SE"]=(np.log(sumstats["OR_95U"]) - np.log(sumstats["OR"]))/ss.norm.ppf(0.975) | 
| Z | BETA/SE | sumstats["Z"] = sumstats["BETA"]/sumstats["SE"] | 
| CHISQ | P | sumstats["CHISQ"] = ss.chi2.isf(sumstats["P"], 1) | 
| CHISQ | Z | sumstats["CHISQ"] = (sumstats["Z"])**2 | 
| MAF | EAF | sumstats["MAF"] =  sumstats["EAF"].apply(lambda x: min(x,1-x) if pd.notnull(x) else np.nan) | 
Extreme P values
For extreme P, extreme=True can be added to overcome the limitation of extreme P values (P<1e-308). MLOG10P will be calculated using the methods described here:
mysumstats.fill_data(to_fill=["MLOG10P"], extreme=True)
Z socres (or BETA and SE) will be used to calculate MLOG10P, two additional columns P_MANTISSA and P_EXPONENT will be added to present p values. 
Note
The conversion is implemented using scipy and numpy.
- ss : import scipy.stats as ss
- np : import numpy as np
See examples here.
fill_data()
Options
- to_fill: the columns to fill. ["OR","OR_95L","OR_95U","BETA","SE","P","MLOG10P","Z","CHISQ"]
- df: columns name for degree of freedom
- overwrite: if overwrite when the specified column existed
- only_sig: fill the data only for significant variants
Priority
- For P : using MLOG10P, Z, CHISQ
- For MLOG10P : using P, MLOG10P, Z, CHISQ
- For BETA/SE : using OR/OR_95L/OR_95U
- For OR/OR_95L/OR_95U : using BETA/SE
- For Z : using BETA/SE
- For CHISQ : using Z, P
Example
Example
# raw data
#SNPID  CHR POS EA  NEA EAF BETA    SE  P   STATUS
#1:725932_G_A   1   725932  G   A   0.9960  -0.0737 0.1394  0.5970  9999999
#1:725933_A_G   1   725933  G   A   0.0040  0.0737  0.1394  0.5973  9999999
#1:737801_T_C   1   737801  C   T   0.0051  0.0490  0.1231  0.6908  9999999
# let's fill "MLOG10P","Z","OR","OR_95L","OR_95U"
# gwaslab will automatically search for equivalent statistics
mysumstats.fill_data(to_fill=["MLOG10P","Z","OR","OR_95L","OR_95U"])
Wed Oct 19 10:13:30 2022 Start filling data using existing columns...
Wed Oct 19 10:13:30 2022  -Raw input columns:  ['SNPID', 'CHR', 'POS', 'EA', 'NEA', 'EAF', 'BETA', 'SE', 'P', 'STATUS']
Wed Oct 19 10:13:30 2022  -Overwrite mode:  False
Wed Oct 19 10:13:30 2022   - Skipping columns:  []
Wed Oct 19 10:13:30 2022 Filling columns:  ['MLOG10P', 'OR', 'OR_95L', 'OR_95U']
Wed Oct 19 10:13:30 2022   - Filling OR using BETA column...
Wed Oct 19 10:13:31 2022   - Filling OR_95L/OR_95U using BETA/SE columns...
Wed Oct 19 10:13:32 2022   - Filling MLOG10P using P column...
Wed Oct 19 10:13:38 2022 Finished filling data using existing columns.