Input and output sumstats
In [1]:
import gwaslab as gl
In [2]:
gl.show_version()
2024/12/20 12:45:28 GWASLab v3.5.4 https://cloufield.github.io/gwaslab/
2024/12/20 12:45:28 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
Input
Loading data
In [3]:
mysumstats = gl.Sumstats("../0_sample_data/t2d_bbj.txt.gz",
                         snpid="SNP",
                         chrom="CHR",
                         pos="POS",
                         ea="ALT",
                         nea="REF",
                         neaf="Frq",   # Frq is read as the non-effect allele frequency
                         beta="BETA",
                         se="SE",
                         p="P",
                         direction="Dir",
                         build="19",
                         n="N", verbose=False)
# randomly select 1000 variants for this example
mysumstats.random_variants(n=1000, inplace=True, random_state=123, verbose=False)
# standardize and check the sumstats
mysumstats.basic_check(verbose=False)
In [4]:
mysumstats.data
Out[4]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | DIRECTION | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1:2005486_C_T | 1 | 2005486 | C | T | 0.9863 | -0.0969 | 0.0471 | 0.039820 | 191764 | +--- | 1960099 |
1 | 1:2247939_AAGG_A | 1 | 2247939 | AAGG | A | 0.9966 | 0.0330 | 0.1249 | 0.791900 | 191764 | ++-- | 1960399 |
2 | 1:3741853_G_A | 1 | 3741853 | G | A | 0.8849 | -0.0375 | 0.0142 | 0.008282 | 191764 | ---- | 1960099 |
3 | 1:5017526_G_A | 1 | 5017526 | G | A | 0.9822 | 0.0126 | 0.0373 | 0.736200 | 191764 | +-++ | 1960099 |
4 | 1:5843475_C_T | 1 | 5843475 | C | T | 0.9857 | -0.0011 | 0.0433 | 0.980100 | 191764 | --++ | 1960099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | X:139318378_A_G | 23 | 139318378 | G | A | 0.5252 | -0.0132 | 0.0071 | 0.061670 | 191764 | +--- | 1960099 |
996 | X:144540038_G_T | 23 | 144540038 | G | T | 0.9866 | 0.0379 | 0.0363 | 0.295800 | 191764 | ++++ | 1960099 |
997 | X:145299627_C_T | 23 | 145299627 | C | T | 0.9984 | -0.0663 | 0.1671 | 0.691400 | 191764 | ---+ | 1960099 |
998 | X:146441317_G_A | 23 | 146441317 | G | A | 0.7345 | 0.0037 | 0.0078 | 0.635100 | 191764 | ++-+ | 1960099 |
999 | X:152025052_A_G | 23 | 152025052 | G | A | 0.2417 | 0.0041 | 0.0082 | 0.622200 | 191764 | -++- | 1960099 |
1000 rows × 12 columns
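Loading and filtering by pattern
Filtering by SNPID pattern
snpid_pat takes a regular-expression pattern that is matched against the SNPID column (SNP in the raw file) while the file is read; only matching variants are loaded.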
In [5]:
mysumstats_snp2 = gl.Sumstats("../0_sample_data/t2d_bbj.txt.gz", fmt="auto",
                              snpid_pat="^2:123",
                              build="19",
                              n="N", verbose=True)
mysumstats_snp2.data
2024/12/20 12:45:56 GWASLab v3.5.4 https://cloufield.github.io/gwaslab/
2024/12/20 12:45:56 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
2024/12/20 12:45:56 Start to load format from formatbook....
2024/12/20 12:45:56 -auto format meta info:
2024/12/20 12:45:56 - format_name : auto
2024/12/20 12:45:56 - format_separator : \t
2024/12/20 12:45:56 - format_na : #NA
2024/12/20 12:45:56 - format_version : 20230328
2024/12/20 12:45:56 - Auto-detection mode. Note: auto-detection assumes A1=EA; Alt=EA and Frq=EAF...
2024/12/20 12:45:56 - Header conversion source: https://github.com/Cloufield/formatbook/blob/main/formats/auto.json
2024/12/20 12:45:56 Start to initialize gl.Sumstats from file :../0_sample_data/t2d_bbj.txt.gz
2024/12/20 12:45:56 -Columns used to filter variants: SNP
2024/12/20 12:45:56 -Loading only variants with pattern : ^2:123 ...
2024/12/20 12:46:09 -Loaded 5722 variants with pattern : ^2:123 ...
2024/12/20 12:46:09 -Reading columns : BETA,POS,ALT,SNP,N,REF,CHR,Frq,P,SE
2024/12/20 12:46:09 -Renaming columns to : BETA,POS,EA,SNPID,N,NEA,CHR,EAF,P,SE
2024/12/20 12:46:09 -Current Dataframe shape : 5722 x 10
2024/12/20 12:46:09 -Initiating a status column: STATUS ...
2024/12/20 12:46:09 -Genomic coordinates are based on GRCh37/hg19...
2024/12/20 12:46:10 Start to reorder the columns...v3.5.4
2024/12/20 12:46:10 -Current Dataframe shape : 5722 x 11 ; Memory usage: 21.90 MB
2024/12/20 12:46:10 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,STATUS
2024/12/20 12:46:10 Finished reordering the columns.
2024/12/20 12:46:10 -Column : SNPID CHR POS EA NEA EAF BETA SE P N STATUS
2024/12/20 12:46:10 -DType : object string int64 category category float64 float64 float64 float64 int64 category
2024/12/20 12:46:10 -Verified: T F T T T T T T T T T
2024/12/20 12:46:10 #WARNING! Columns with possibly incompatible dtypes: CHR
2024/12/20 12:46:10 -Current Dataframe memory usage: 21.90 MB
2024/12/20 12:46:10 Finished loading data successfully!
Out[5]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|
967172 | 2:12320_AAT_A | 2 | 12320 | AAT | A | 0.0572 | -0.0158 | 0.0294 | 0.5907 | 191764 | 1999999 |
967173 | 2:12371_G_C | 2 | 12371 | G | C | 0.0350 | -0.0190 | 0.0294 | 0.5179 | 191764 | 1999999 |
967529 | 2:123233_G_A | 2 | 123233 | G | A | 0.8736 | 0.0191 | 0.0135 | 0.1561 | 191764 | 1999999 |
967530 | 2:123332_C_G | 2 | 123332 | G | C | 0.1264 | -0.0191 | 0.0135 | 0.1561 | 191764 | 1999999 |
967531 | 2:123554_C_A | 2 | 123554 | C | A | 0.4748 | 0.0115 | 0.0088 | 0.1929 | 191764 | 1999999 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1490418 | 2:123998245_G_A | 2 | 123998245 | G | A | 0.1047 | -0.0101 | 0.0144 | 0.4834 | 191764 | 1999999 |
1490419 | 2:123998250_T_C | 2 | 123998250 | C | T | 0.8355 | 0.0099 | 0.0119 | 0.4062 | 191764 | 1999999 |
1490420 | 2:123998340_A_G | 2 | 123998340 | G | A | 0.9566 | -0.0082 | 0.0222 | 0.7132 | 191764 | 1999999 |
1490421 | 2:123999824_A_G | 2 | 123999824 | G | A | 0.8953 | 0.0103 | 0.0144 | 0.4760 | 191764 | 1999999 |
1490422 | 2:123999965_G_A | 2 | 123999965 | G | A | 0.0021 | -0.1152 | 0.1654 | 0.4861 | 191764 | 1999999 |
5722 rows × 11 columns
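Because the pattern is matched against the SNPID text rather than against positions, ^2:123 keeps every ID that starts with 2:123, which is why both 2:12320 and 2:123998245 appear in the table above. The sketch below illustrates this with Python's re module on a few hypothetical IDs; it assumes the pattern behaves like a regular-expression search on the ID string (which the loaded variants suggest) and is not GWASLab's own filtering code.

```python
import re

# Hypothetical SNPIDs in CHR:POS_ALLELE_ALLELE style, as in the sample data
snpids = [
    "2:12320_AAT_A",
    "2:123233_G_A",
    "2:123998245_G_A",
    "2:9123000_C_T",
    "12:3000_G_A",
    "22:123456_A_G",
]

pattern = re.compile("^2:123")

# Keep IDs whose text starts with "2:123", whatever the length of the position
kept = [s for s in snpids if pattern.search(s)]
print(kept)  # ['2:12320_AAT_A', '2:123233_G_A', '2:123998245_G_A']
```

So a SNPID pattern is a text prefix filter, not a genomic window: it cannot on its own select, say, positions 123,000,000 to 124,000,000 without also catching shorter positions that share the same leading digits.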
Filtering by chromosome pattern
chrom_pat works the same way on the chromosome column (CHR in the raw file).
In [6]:
mysumstats_chr22 = gl.Sumstats("../0_sample_data/t2d_bbj.txt.gz", fmt="auto",
                               chrom_pat="^22",
                               build="19",
                               n="N", verbose=True)
mysumstats_chr22.data
2024/12/20 12:46:10 GWASLab v3.5.4 https://cloufield.github.io/gwaslab/
2024/12/20 12:46:10 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
2024/12/20 12:46:10 Start to load format from formatbook....
2024/12/20 12:46:10 -auto format meta info:
2024/12/20 12:46:10 - format_name : auto
2024/12/20 12:46:10 - format_separator : \t
2024/12/20 12:46:10 - format_na : #NA
2024/12/20 12:46:10 - format_version : 20230328
2024/12/20 12:46:10 - Auto-detection mode. Note: auto-detection assumes A1=EA; Alt=EA and Frq=EAF...
2024/12/20 12:46:10 - Header conversion source: https://github.com/Cloufield/formatbook/blob/main/formats/auto.json
2024/12/20 12:46:10 Start to initialize gl.Sumstats from file :../0_sample_data/t2d_bbj.txt.gz
2024/12/20 12:46:10 -Columns used to filter variants: CHR
2024/12/20 12:46:10 -Loading only variants on chromosome with pattern : ^22 ...
2024/12/20 12:46:22 -Loaded 157050 variants on chromosome with pattern :^22 ...
2024/12/20 12:46:22 -Reading columns : BETA,POS,ALT,SNP,N,REF,CHR,Frq,P,SE
2024/12/20 12:46:22 -Renaming columns to : BETA,POS,EA,SNPID,N,NEA,CHR,EAF,P,SE
2024/12/20 12:46:22 -Current Dataframe shape : 157050 x 10
2024/12/20 12:46:22 -Initiating a status column: STATUS ...
2024/12/20 12:46:22 -Genomic coordinates are based on GRCh37/hg19...
2024/12/20 12:46:23 Start to reorder the columns...v3.5.4
2024/12/20 12:46:23 -Current Dataframe shape : 157050 x 11 ; Memory usage: 33.55 MB
2024/12/20 12:46:23 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,STATUS
2024/12/20 12:46:23 Finished reordering the columns.
2024/12/20 12:46:23 -Column : SNPID CHR POS EA NEA EAF BETA SE P N STATUS
2024/12/20 12:46:23 -DType : object string int64 category category float64 float64 float64 float64 int64 category
2024/12/20 12:46:23 -Verified: T F T T T T T T T T T
2024/12/20 12:46:23 #WARNING! Columns with possibly incompatible dtypes: CHR
2024/12/20 12:46:23 -Current Dataframe memory usage: 33.55 MB
2024/12/20 12:46:23 Finished loading data successfully!
Out[6]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | N | STATUS |
|---|---|---|---|---|---|---|---|---|---|---|---|
12071920 | 22:16847963_A_G | 22 | 16847963 | G | A | 0.5903 | 0.0115 | 0.0163 | 0.4819 | 166718 | 1999999 |
12071921 | 22:16848015_C_G | 22 | 16848015 | G | C | 0.6942 | 0.0110 | 0.0155 | 0.4795 | 166718 | 1999999 |
12071922 | 22:16848470_A_G | 22 | 16848470 | G | A | 0.5918 | 0.0118 | 0.0156 | 0.4487 | 166718 | 1999999 |
12071923 | 22:16848520_A_T | 22 | 16848520 | T | A | 0.5918 | 0.0116 | 0.0155 | 0.4537 | 166718 | 1999999 |
12071924 | 22:16849105_A_G | 22 | 16849105 | G | A | 0.5919 | 0.0109 | 0.0154 | 0.4810 | 166718 | 1999999 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
12228965 | 22:51230048_T_C | 22 | 51230048 | C | T | 0.9988 | -0.0645 | 0.2269 | 0.7762 | 166718 | 1999999 |
12228966 | 22:51230086_A_T | 22 | 51230086 | T | A | 0.9906 | -0.0543 | 0.0766 | 0.4780 | 166718 | 1999999 |
12228967 | 22:51237069_T_C | 22 | 51237069 | C | T | 0.9883 | -0.0356 | 0.0720 | 0.6212 | 191764 | 1999999 |
12228968 | 22:51239678_G_T | 22 | 51239678 | G | T | 0.0592 | -0.0038 | 0.0331 | 0.9077 | 166718 | 1999999 |
12228969 | 22:51239752_G_A | 22 | 51239752 | G | A | 0.0013 | 0.2937 | 0.2318 | 0.2051 | 166718 | 1999999 |
157050 rows × 11 columns
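Since a chromosome pattern is also a text match, anchoring matters: with chromosomes labelled 1 to 23, ^22 selects only chromosome 22, but ^2 would also pick up 20, 21, 22 and 23. This minimal sketch uses Python's re module to show the difference; it assumes chrom_pat behaves like a regular-expression search on the CHR text, which is not confirmed here. Under that assumption, appending $ gives an exact match.

```python
import re

chromosomes = [str(c) for c in range(1, 24)]  # "1" .. "23", as in the sample data


def matching(pattern: str) -> list[str]:
    """Return the chromosome labels matched by a regular expression."""
    rx = re.compile(pattern)
    return [c for c in chromosomes if rx.search(c)]


print(matching("^22"))  # ['22']
print(matching("^2"))   # ['2', '20', '21', '22', '23']
print(matching("^2$"))  # ['2']
```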
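Output
General output
The first argument is the output path prefix; GWASLab appends the format name and extension (here ./mysumstats.gwaslab.tsv.gz) and saves a log file alongside. With xymt_number=True, chromosomes X, Y and MT are written as numbers (X appears as 23 in the output below).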
In [7]:
mysumstats.to_format("./mysumstats", fmt="gwaslab", xymt_number=True)
2024/12/20 12:46:23 Start to convert the output sumstats in: gwaslab format
2024/12/20 12:46:23 -Formatting statistics ...
2024/12/20 12:46:23 -Float statistics formats:
2024/12/20 12:46:23 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:23 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:23 -Start outputting sumstats in gwaslab format...
2024/12/20 12:46:23 -gwaslab format will be loaded...
2024/12/20 12:46:23 -gwaslab format meta info:
2024/12/20 12:46:23 - format_name : gwaslab
2024/12/20 12:46:23 - format_source : https://cloufield.github.io/gwaslab/
2024/12/20 12:46:23 - format_version : 20231220_v4
2024/12/20 12:46:23 -Output path: ./mysumstats.gwaslab.tsv.gz
2024/12/20 12:46:23 -Output columns: SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
2024/12/20 12:46:23 -Writing sumstats to: ./mysumstats.gwaslab.tsv.gz...
2024/12/20 12:46:23 -Fast to csv mode...
2024/12/20 12:46:23 -Saving log file to: ./mysumstats.gwaslab.log
2024/12/20 12:46:23 Finished outputting successfully!
In [8]:
!zcat mysumstats.gwaslab.tsv.gz | head
SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
1:2005486_C_T 1 2005486 C T 0.9863 -0.0969 0.0471 3.9820e-02 191764 +--- 1960099
1:2247939_AAGG_A 1 2247939 AAGG A 0.9966 0.0330 0.1249 7.9190e-01 191764 ++-- 1960399
1:3741853_G_A 1 3741853 G A 0.8849 -0.0375 0.0142 8.2820e-03 191764 ---- 1960099
1:5017526_G_A 1 5017526 G A 0.9822 0.0126 0.0373 7.3620e-01 191764 +-++ 1960099
1:5843475_C_T 1 5843475 C T 0.9857 -0.0011 0.0433 9.8010e-01 191764 --++ 1960099
1:9405103_T_C 1 9405103 C T 0.0021 -0.0729 0.1516 6.3050e-01 191764 +--- 1960099
1:9443411_G_A 1 9443411 G A 0.9916 0.0362 0.0532 4.9690e-01 191764 +-++ 1960099
1:12866348_G_C 1 12866348 G C 0.9728 -0.0352 0.0431 4.1450e-01 191764 ---+ 1960099
1:14466316_A_G 1 14466316 G A 0.6942 -0.0042 0.0096 6.6360e-01 191764 --+- 1960099
gzip: stdout: Broken pipe
In [9]:
!zcat mysumstats.gwaslab.tsv.gz | tail
X:121023171_A_ACTT 23 121023171 ACTT A 0.5117 0.0096 0.0071 1.7390e-01 191764 ++++ 1960399
X:134838698_C_CTA 23 134838698 C CTA 0.9355 0.0060 0.0145 6.8080e-01 191764 ++-+ 1960399
X:135939006_G_T 23 135939006 G T 0.2068 -0.0037 0.0085 6.6300e-01 191764 -+-+ 1960099
X:136020644_C_T 23 136020644 C T 0.8756 0.0103 0.0106 3.3370e-01 191764 +++- 1960099
X:138148816_C_T 23 138148816 C T 0.7842 0.0089 0.0088 3.1330e-01 191764 ++++ 1960099
X:139318378_A_G 23 139318378 G A 0.5252 -0.0132 0.0071 6.1670e-02 191764 +--- 1960099
X:144540038_G_T 23 144540038 G T 0.9866 0.0379 0.0363 2.9580e-01 191764 ++++ 1960099
X:145299627_C_T 23 145299627 C T 0.9984 -0.0663 0.1671 6.9140e-01 191764 ---+ 1960099
X:146441317_G_A 23 146441317 G A 0.7345 0.0037 0.0078 6.3510e-01 191764 ++-+ 1960099
X:152025052_A_G 23 152025052 G A 0.2417 0.0041 0.0082 6.2220e-01 191764 -++- 1960099
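The to_format log reports one format string per float column: {:.4g} for EAF, {:.4f} for BETA and SE, and {:.4e} for P. Applying those strings with Python's str.format to the first variant reproduces the values written to the file; this sketch only mirrors the format strings from the log and is not GWASLab's own formatting code.

```python
# Format strings reported in the to_format log, one per float column
float_formats = {"EAF": "{:.4g}", "BETA": "{:.4f}", "SE": "{:.4f}", "P": "{:.4e}"}

# First variant of the example data (1:2005486_C_T)
row = {"EAF": 0.9863, "BETA": -0.0969, "SE": 0.0471, "P": 0.03982}

formatted = {col: fmt.format(row[col]) for col, fmt in float_formats.items()}
print(formatted)  # {'EAF': '0.9863', 'BETA': '-0.0969', 'SE': '0.0471', 'P': '3.9820e-02'}

# Significant-digit (g) versus fixed-point (f) formatting for a very small value
small = 0.0001234567
print("{:.4g}".format(small), "{:.4f}".format(small))  # 0.0001235 0.0001
```

The g format keeps four significant digits, which preserves information for rare-variant frequencies, whereas a fixed four decimals would round 0.0001234567 down to 0.0001.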
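Output each chromosome to a separate file
If the output path contains @, GWASLab writes one file per chromosome, replacing @ with the chromosome number (for example, ./mysumstats.1.gwaslab.tsv.gz).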
In [10]:
mysumstats.to_format("./mysumstats.@", fmt="gwaslab", xymt_number=True)
2024/12/20 12:46:23 Start to convert the output sumstats in: gwaslab format
2024/12/20 12:46:23 -Formatting statistics ...
2024/12/20 12:46:23 -Float statistics formats:
2024/12/20 12:46:23 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:23 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:23 -Start outputting sumstats in gwaslab format...
2024/12/20 12:46:23 -gwaslab format will be loaded...
2024/12/20 12:46:23 -gwaslab format meta info:
2024/12/20 12:46:23 - format_name : gwaslab
2024/12/20 12:46:23 - format_source : https://cloufield.github.io/gwaslab/
2024/12/20 12:46:23 - format_version : 20231220_v4
2024/12/20 12:46:23 -Output path: ./mysumstats.@.gwaslab.tsv.gz
2024/12/20 12:46:23 -Output columns: SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
2024/12/20 12:46:23 -Writing sumstats to: ./mysumstats.@.gwaslab.tsv.gz...
2024/12/20 12:46:23 -Fast to csv mode...
2024/12/20 12:46:23 -@ detected: writing each chromosome to a single file...
2024/12/20 12:46:23 -Chromosomes:['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']...
2024/12/20 12:46:24 -Saving log file to: ./mysumstats.@.gwaslab.log
2024/12/20 12:46:24 Finished outputting successfully!
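Conceptually, the @ placeholder splits the table by chromosome and substitutes each chromosome label into the file name. The sketch below does this with the standard library on three toy rows; write_by_chromosome is a hypothetical helper that mimics the file names in the log, not GWASLab's implementation.

```python
import csv
import gzip
import tempfile
from collections import defaultdict
from pathlib import Path


def write_by_chromosome(rows, path_pattern, suffix=".gwaslab.tsv.gz"):
    """Hypothetical helper: write one gzipped TSV per chromosome.

    The "@" in the file name is replaced by the chromosome label, mirroring
    names such as ./mysumstats.1.gwaslab.tsv.gz in the log above.
    """
    by_chr = defaultdict(list)
    for row in rows:
        by_chr[row["CHR"]].append(row)

    pattern = Path(path_pattern)
    written = []
    for chrom in sorted(by_chr, key=int):  # numeric order: 1, 2, ..., 23
        out = pattern.with_name(pattern.name.replace("@", str(chrom)) + suffix)
        with gzip.open(out, "wt", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(rows[0]),
                                    delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(by_chr[chrom])
        written.append(out.name)
    return written


# Toy rows taken from the example data (CHR 23 = X)
rows = [
    {"SNPID": "1:2005486_C_T", "CHR": 1, "POS": 2005486},
    {"SNPID": "X:152025052_A_G", "CHR": 23, "POS": 152025052},
    {"SNPID": "1:3741853_G_A", "CHR": 1, "POS": 3741853},
]

with tempfile.TemporaryDirectory() as tmp:
    names = write_by_chromosome(rows, str(Path(tmp) / "mysumstats.@"))
    # Read chromosome 1 back to check that both of its rows were written
    with gzip.open(Path(tmp) / names[0], "rt", newline="") as fh:
        chr1_rows = list(csv.DictReader(fh, delimiter="\t"))

print(names)           # ['mysumstats.1.gwaslab.tsv.gz', 'mysumstats.23.gwaslab.tsv.gz']
print(len(chr1_rows))  # 2
```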
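Load sumstats for each chromosome
Using the same @ pattern in the input path loads every matching per-chromosome file and merges them into one Sumstats object. Because build is not specified here, GWASLab warns that the genome build is unknown; add build="19" as in the earlier cells to set it.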
In [11]:
mysumstats = gl.Sumstats("./mysumstats.@.gwaslab.tsv.gz",
                         fmt="gwaslab", verbose=True)
2024/12/20 12:46:24 GWASLab v3.5.4 https://cloufield.github.io/gwaslab/
2024/12/20 12:46:24 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
2024/12/20 12:46:24 Start to load format from formatbook....
2024/12/20 12:46:24 -gwaslab format meta info:
2024/12/20 12:46:24 - format_name : gwaslab
2024/12/20 12:46:24 - format_source : https://cloufield.github.io/gwaslab/
2024/12/20 12:46:24 - format_version : 20231220_v4
2024/12/20 12:46:24 -Detected @ in path: load sumstats by each chromosome...
2024/12/20 12:46:24 -Chromosomes detected: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
2024/12/20 12:46:24 Start to initialize gl.Sumstats from files with pattern :./mysumstats.@.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.1.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.2.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.3.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.4.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.5.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.6.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.7.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.8.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.9.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.10.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.11.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.12.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.13.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.14.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.15.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.16.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.17.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.18.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.19.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.20.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.21.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.22.gwaslab.tsv.gz
2024/12/20 12:46:24 -Loading:./mysumstats.23.gwaslab.tsv.gz
2024/12/20 12:46:24 -Merging sumstats for chromosomes: 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
2024/12/20 12:46:24 -Reading columns : BETA,SNPID,POS,EAF,N,CHR,NEA,P,EA,STATUS,SE,DIRECTION
2024/12/20 12:46:24 -Renaming columns to : BETA,SNPID,POS,EAF,N,CHR,NEA,P,EA,STATUS,SE,DIRECTION
2024/12/20 12:46:24 -Current Dataframe shape : 1000 x 12
2024/12/20 12:46:24 -Initiating a status column: STATUS ...
2024/12/20 12:46:24 #WARNING! Version of genomic coordinates is unknown...
2024/12/20 12:46:24 Start to reorder the columns...v3.5.4
2024/12/20 12:46:24 -Current Dataframe shape : 1000 x 12 ; Memory usage: 21.54 MB
2024/12/20 12:46:24 -Reordering columns to : SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
2024/12/20 12:46:24 Finished reordering the columns.
2024/12/20 12:46:24 -Column : SNPID CHR POS EA NEA EAF BETA SE P N DIRECTION STATUS
2024/12/20 12:46:24 -DType : object string int64 category category float64 float64 float64 float64 int64 object category
2024/12/20 12:46:24 -Verified: T F T T T T T T T T T T
2024/12/20 12:46:24 #WARNING! Columns with possibly incompatible dtypes: CHR
2024/12/20 12:46:24 -Current Dataframe memory usage: 21.54 MB
2024/12/20 12:46:24 Finished loading data successfully!
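Loading reverses the process: files matching the pattern are found, their chromosome labels are sorted numerically (so 10 follows 9 rather than 1), and the tables are concatenated. load_by_chromosome below is a hypothetical standard-library sketch of those steps as they appear in the log, run on toy files; it is not how GWASLab implements them.

```python
import csv
import gzip
import re
import tempfile
from pathlib import Path


def chrom_order(label):
    """Sort key: numeric labels in numeric order, then any others (e.g. "X") by name."""
    return (0, int(label), "") if label.isdigit() else (1, 0, label)


def load_by_chromosome(path_pattern):
    """Hypothetical helper: find the files matching a path containing "@" and merge them."""
    pattern = Path(path_pattern)
    prefix, _, suffix = pattern.name.partition("@")
    rx = re.compile(re.escape(prefix) + r"(\w+)" + re.escape(suffix))

    files = {}
    for f in pattern.parent.glob(prefix + "*" + suffix):
        m = rx.fullmatch(f.name)
        if m:
            files[m.group(1)] = f

    chromosomes = sorted(files, key=chrom_order)
    merged = []
    for chrom in chromosomes:
        with gzip.open(files[chrom], "rt", newline="") as fh:
            merged.extend(csv.DictReader(fh, delimiter="\t"))
    return chromosomes, merged


with tempfile.TemporaryDirectory() as tmp:
    # Toy per-chromosome files using the same naming scheme as above
    for chrom in ["1", "2", "10", "23"]:
        path = Path(tmp) / f"mysumstats.{chrom}.gwaslab.tsv.gz"
        with gzip.open(path, "wt", newline="") as fh:
            fh.write(f"CHR\tPOS\n{chrom}\t1000\n")
    chromosomes, merged = load_by_chromosome(str(Path(tmp) / "mysumstats.@.gwaslab.tsv.gz"))

print(chromosomes)  # ['1', '2', '10', '23'], not the string order ['1', '10', '2', '23']
print(len(merged))  # 4
```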
Check available formats
List the formats that GWASLab supports
In [12]:
gl.list_formats()
2024/12/20 12:46:24 Available formats: auto,bolt_lmm,cojo,fastgwa,gwascatalog,gwascatalog_hm,gwaslab,ldsc,metal,mrmega,mtag,pgscatalog,pgscatalog_hm,pheweb,plink,plink2,plink2_firth,plink2_linear,plink2_logistic,plink_assoc,plink_bim,plink_dosage,plink_fam,plink_fisher,plink_linear,plink_logistic,plink_psam,plink_pvar,popcorn,regenie,regenie_gene,saige,ssf,template,vcf
Check the contents of a specific format: its meta data and the dictionary that maps its headers to GWASLab column names
In [13]:
gl.check_format("ssf")
2024/12/20 12:46:24 Available formats:
2024/12/20 12:46:24 meta_data
2024/12/20 12:46:24 format_dict
2024/12/20 12:46:24
2024/12/20 12:46:24 {'format_name': 'ssf', 'format_source': 'https://www.biorxiv.org/content/10.1101/2022.07.15.500230v1.full', 'format_cite_name': 'GWAS-SSF v0.1', 'format_separator': '\t', 'format_na': '#NA', 'format_comment': None, 'format_col_order': ['chromosome', 'base_pair_location', 'effect_allele', 'other_allele', 'beta', 'odds_ratio', 'hazard_ratio', 'standard_error', 'effect_allele_frequency', 'p_value', 'neg_log_10_p_value', 'ci_upper', 'ci_lower', 'rsid', 'variant_id', 'info', 'ref_allele', 'n'], 'format_version': 20230328}
2024/12/20 12:46:24 {'variant_id': 'SNPID', 'rsid': 'rsID', 'chromosome': 'CHR', 'base_pair_location': 'POS', 'other_allele': 'NEA', 'effect_allele': 'EA', 'effect_allele_frequency': 'EAF', 'n': 'N', 'beta': 'BETA', 'standard_error': 'SE', 'p_value': 'P', 'neg_log_10_p_value': 'MLOG10P', 'info': 'INFO', 'odds_ratio': 'OR', 'hazard_ratio': 'HR', 'ci_lower': 'OR_95L', 'ci_upper': 'OR_95U'}
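These two dictionaries explain which columns end up in an SSF file. Inverting the header dictionary renames the GWASLab columns that have an SSF equivalent (DIRECTION and STATUS have none, so they are dropped), and format_col_order fixes the order. The sketch below illustrates that mapping using the values printed above; it is not GWASLab's own code, but for the columns of mysumstats it yields the same list as the "Output columns" line of the ssf export log.

```python
# Header dictionary and column order copied from the gl.check_format("ssf") output
ssf_to_gwaslab = {
    "variant_id": "SNPID", "rsid": "rsID", "chromosome": "CHR",
    "base_pair_location": "POS", "other_allele": "NEA", "effect_allele": "EA",
    "effect_allele_frequency": "EAF", "n": "N", "beta": "BETA",
    "standard_error": "SE", "p_value": "P", "neg_log_10_p_value": "MLOG10P",
    "info": "INFO", "odds_ratio": "OR", "hazard_ratio": "HR",
    "ci_lower": "OR_95L", "ci_upper": "OR_95U",
}
format_col_order = [
    "chromosome", "base_pair_location", "effect_allele", "other_allele", "beta",
    "odds_ratio", "hazard_ratio", "standard_error", "effect_allele_frequency",
    "p_value", "neg_log_10_p_value", "ci_upper", "ci_lower", "rsid",
    "variant_id", "info", "ref_allele", "n",
]

# Columns of mysumstats in this notebook
gwaslab_columns = ["SNPID", "CHR", "POS", "EA", "NEA", "EAF", "BETA",
                   "SE", "P", "N", "DIRECTION", "STATUS"]

# Invert the dictionary: GWASLab column -> SSF header
gwaslab_to_ssf = {v: k for k, v in ssf_to_gwaslab.items()}

# Rename columns that have an SSF equivalent; the rest are dropped
available = {gwaslab_to_ssf[c] for c in gwaslab_columns if c in gwaslab_to_ssf}

# Order the output by the format's preferred column order
output_columns = [c for c in format_col_order if c in available]
print(",".join(output_columns))
```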
Formatting and saving
Preparing for submission to the GWAS Catalog (GWAS-SSF format)
- fmt: the output format
- ssfmeta: if True, output the meta file
- md5sum: if True, create a file with the md5sum of the output sumstats
In [14]:
mysumstats.to_format("./mysumstats", fmt="ssf", ssfmeta=True, md5sum=True)
2024/12/20 12:46:24 Start to convert the output sumstats in: ssf format
2024/12/20 12:46:24 -Formatting statistics ...
2024/12/20 12:46:24 -Float statistics formats:
2024/12/20 12:46:24 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:24 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:24 -Replacing SNPID separator from ":" to "_"...
2024/12/20 12:46:24 -Start outputting sumstats in ssf format...
2024/12/20 12:46:24 -ssf format will be loaded...
2024/12/20 12:46:24 -ssf format meta info:
2024/12/20 12:46:24 - format_name : ssf
2024/12/20 12:46:24 - format_source : https://www.biorxiv.org/content/10.1101/2022.07.15.500230v1.full
2024/12/20 12:46:24 - format_cite_name : GWAS-SSF v0.1
2024/12/20 12:46:24 - format_separator : \t
2024/12/20 12:46:24 - format_na : #NA
2024/12/20 12:46:24 - format_col_order : chromosome,base_pair_location,effect_allele,other_allele,beta,odds_ratio,hazard_ratio,standard_error,effect_allele_frequency,p_value,neg_log_10_p_value,ci_upper,ci_lower,rsid,variant_id,info,ref_allele,n
2024/12/20 12:46:24 - format_version : 20230328
2024/12/20 12:46:24 -gwaslab to ssf format dictionary:
2024/12/20 12:46:24 - gwaslab keys: SNPID,rsID,CHR,POS,NEA,EA,EAF,N,BETA,SE,P,MLOG10P,INFO,OR,HR,OR_95L,OR_95U
2024/12/20 12:46:24 - ssf values: variant_id,rsid,chromosome,base_pair_location,other_allele,effect_allele,effect_allele_frequency,n,beta,standard_error,p_value,neg_log_10_p_value,info,odds_ratio,hazard_ratio,ci_lower,ci_upper
2024/12/20 12:46:24 -Output path: ./mysumstats.ssf.tsv.gz
2024/12/20 12:46:24 -Output columns: chromosome,base_pair_location,effect_allele,other_allele,beta,standard_error,effect_allele_frequency,p_value,variant_id,n
2024/12/20 12:46:24 -Writing sumstats to: ./mysumstats.ssf.tsv.gz...
2024/12/20 12:46:24 -Fast to csv mode...
2024/12/20 12:46:24 -md5sum hashing for the file: ./mysumstats.ssf.tsv.gz
2024/12/20 12:46:24 -md5sum path: ./mysumstats.ssf.tsv.gz.md5sum
2024/12/20 12:46:24 -md5sum: 26a0577baadfee588c6bcf695295e483
2024/12/20 12:46:24 -Exporting SSF-style meta data to ./mysumstats.ssf.tsv.gz.ssf.tsv-meta.yaml
2024/12/20 12:46:24 -Saving log file to: ./mysumstats.ssf.log
2024/12/20 12:46:24 Finished outputting successfully!
In [15]:
!zcat mysumstats.ssf.tsv.gz | head
chromosome base_pair_location effect_allele other_allele beta standard_error effect_allele_frequency p_value variant_id n
1 2005486 C T -0.0969 0.0471 0.9863 3.9820e-02 1_2005486_C_T 191764
1 2247939 AAGG A 0.0330 0.1249 0.9966 7.9190e-01 1_2247939_AAGG_A 191764
1 3741853 G A -0.0375 0.0142 0.8849 8.2820e-03 1_3741853_G_A 191764
1 5017526 G A 0.0126 0.0373 0.9822 7.3620e-01 1_5017526_G_A 191764
1 5843475 C T -0.0011 0.0433 0.9857 9.8010e-01 1_5843475_C_T 191764
1 9405103 C T -0.0729 0.1516 0.0021 6.3050e-01 1_9405103_T_C 191764
1 9443411 G A 0.0362 0.0532 0.9916 4.9690e-01 1_9443411_G_A 191764
1 12866348 G C -0.0352 0.0431 0.9728 4.1450e-01 1_12866348_G_C 191764
1 14466316 G A -0.0042 0.0096 0.6942 6.6360e-01 1_14466316_A_G 191764
gzip: stdout: Broken pipe
In [16]:
!head mysumstats.ssf.tsv.gz.md5sum
26a0577baadfee588c6bcf695295e483
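The .md5sum file holds the MD5 digest of the compressed sumstats, so a recipient can confirm that the file arrived intact (the same check as running md5sum on it). A minimal sketch with Python's hashlib, run on a toy file whose content is b"abc" rather than the real sumstats; md5_of_file and verify_md5 are hypothetical helper names.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 hex digest of a file, read in 1 MiB chunks rather than all at once."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(data_path, md5_path):
    """Compare a file's digest with the first token of its .md5sum file."""
    expected = Path(md5_path).read_text().split()[0]
    return md5_of_file(data_path) == expected


with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "example.tsv.gz"
    data.write_bytes(b"abc")  # toy content, not real sumstats
    md5_file = Path(tmp) / "example.tsv.gz.md5sum"
    md5_file.write_text("900150983cd24fb0d6963f7d28e17f72\n")
    digest = md5_of_file(data)
    ok = verify_md5(data, md5_file)

print(digest)  # 900150983cd24fb0d6963f7d28e17f72
print(ok)      # True
```

Applied to the files written above, verify_md5("./mysumstats.ssf.tsv.gz", "./mysumstats.ssf.tsv.gz.md5sum") should return True as long as the sumstats file has not changed since export.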
In [17]:
!head ./mysumstats.ssf.tsv.gz.ssf.tsv-meta.yaml
LDSC default format
- hapmap3: if True, only output HapMap3 SNPs
- exclude_hla: if True, exclude variants in the HLA region from output
- build: genome build of the sumstats, used here to match HapMap3 variants by CHR:POS
In [18]:
mysumstats.to_format("./mysumstats", fmt="ldsc", hapmap3=True, exclude_hla=True, build="19")
2024/12/20 12:46:25 Start to convert the output sumstats in: ldsc format
2024/12/20 12:46:25 -Excluded 3 variants in HLA region (chr6: 25000000-34000000 )...
2024/12/20 12:46:25 Start to extract HapMap3 SNPs...v3.5.4
2024/12/20 12:46:25 -Current Dataframe shape : 997 x 12 ; Memory usage: 21.55 MB
2024/12/20 12:46:25 -Loading Hapmap3 variants from built-in datasets...
2024/12/20 12:46:25 -Since rsID not in sumstats, CHR:POS( build 19) will be used for matching...
2024/12/20 12:46:26 -Checking if alleles are same...
2024/12/20 12:46:26 -Variants with macthed alleles: 81
2024/12/20 12:46:26 -Raw input contains 81 Hapmap3 variants based on CHR:POS...
2024/12/20 12:46:26 Finished extracting HapMap3 SNPs.
2024/12/20 12:46:26 -Extract 81 variants in Hapmap3 datasets for build 19.
2024/12/20 12:46:26 -Formatting statistics ...
2024/12/20 12:46:26 -Float statistics formats:
2024/12/20 12:46:26 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:26 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:26 -Start outputting sumstats in ldsc format...
2024/12/20 12:46:26 -ldsc format will be loaded...
2024/12/20 12:46:26 -ldsc format meta info:
2024/12/20 12:46:26 - format_name : ldsc
2024/12/20 12:46:26 - format_source : https://github.com/bulik/ldsc/wiki/Summary-Statistics-File-Format
2024/12/20 12:46:26 - format_source2 : https://github.com/bulik/ldsc/blob/master/munge_sumstats.py
2024/12/20 12:46:26 - format_version : 20150306
2024/12/20 12:46:26 -gwaslab to ldsc format dictionary:
2024/12/20 12:46:26 - gwaslab keys: rsID,NEA,EA,EAF,N,BETA,P,Z,INFO,OR,CHR,POS
2024/12/20 12:46:26 - ldsc values: SNP,A2,A1,Frq,N,Beta,P,Z,INFO,OR,CHR,POS
2024/12/20 12:46:26 -Output path: ./mysumstats.hapmap3.noMHC.ldsc.tsv.gz
2024/12/20 12:46:26 -Output columns: CHR,POS,A1,A2,Frq,Beta,P,N,SNP
2024/12/20 12:46:26 -Writing sumstats to: ./mysumstats.hapmap3.noMHC.ldsc.tsv.gz...
2024/12/20 12:46:26 -Fast to csv mode...
2024/12/20 12:46:26 -Saving log file to: ./mysumstats.hapmap3.noMHC.ldsc.log
2024/12/20 12:46:26 Finished outputting successfully!
In [19]:
!zcat ./mysumstats.hapmap3.noMHC.ldsc.tsv.gz | head
CHR POS A1 A2 Frq Beta P N SNP
1 14900419 G A 0.3952 0.0144 1.3750e-01 191764 rs6703840
1 19593199 C T 0.1323 -0.0127 3.2570e-01 191764 rs7527253
1 35282297 G A 0.5434 0.0041 6.4190e-01 191764 rs1407135
1 66001402 C T 0.2103 -0.0148 1.7720e-01 191764 rs1171261
1 83510491 G A 0.0025 0.0378 6.9800e-01 191764 rs2022427
1 166110693 C T 0.8627 0.0286 2.5250e-02 191764 rs4656480
1 175886511 G A 0.1828 -0.0141 2.2480e-01 191764 rs6656281
1 181612041 C T 0.9603 0.0135 5.5050e-01 191764 rs199955
1 196329362 C T 0.0301 0.0300 2.5060e-01 191764 rs11801881
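The log shows the two filters applied in order: variants in the HLA region (chr6:25,000,000-34,000,000) are dropped first, then the remainder are matched to HapMap3 by CHR:POS (the sumstats have no rsID) and kept only if their alleles agree. The sketch below replays that logic on toy data. The reference dictionary is a hypothetical stand-in for GWASLab's built-in HapMap3 list, and treating the HLA bounds as inclusive and comparing alleles regardless of order are assumptions.

```python
# Toy variants as (CHR, POS, EA, NEA); the first and third come from the ldsc output above
variants = [
    (1, 14900419, "G", "A"),
    (6, 30000000, "C", "T"),  # falls inside the HLA window
    (1, 19593199, "C", "T"),
    (2, 5000, "G", "A"),      # absent from the toy reference
]

# Hypothetical stand-in for the HapMap3 list: (CHR, POS) -> (rsID, allele pair)
hapmap3 = {
    (1, 14900419): ("rs6703840", {"A", "G"}),
    (1, 19593199): ("rs7527253", {"C", "T"}),
    (6, 30000000): ("rs_hypothetical", {"C", "T"}),
}

HLA_CHR, HLA_START, HLA_END = 6, 25_000_000, 34_000_000  # window reported in the log


def in_hla(chrom, pos):
    # Treating both bounds as inclusive is an assumption of this sketch
    return chrom == HLA_CHR and HLA_START <= pos <= HLA_END


kept = []
for chrom, pos, ea, nea in variants:
    if in_hla(chrom, pos):           # exclude_hla=True
        continue
    hit = hapmap3.get((chrom, pos))  # hapmap3=True: match on CHR:POS
    if hit is None:
        continue
    rsid, alleles = hit
    if {ea, nea} != alleles:         # alleles must agree (order ignored here)
        continue
    kept.append((chrom, pos, ea, nea, rsid))

print(kept)  # [(1, 14900419, 'G', 'A', 'rs6703840'), (1, 19593199, 'C', 'T', 'rs7527253')]
```

Matching by CHR:POS is why build="19" matters here: the same variant sits at different positions in other genome builds.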
VCF
- bgzip: if True, compress the output VCF/BED file with bgzip
- tabix: if True, index the bgzipped file with tabix
- build: genome build written to the contig lines of the VCF header
In [20]:
mysumstats.to_format("./mysumstats", fmt="vcf", bgzip=True, tabix=True, build="19")
2024/12/20 12:46:26 Start to convert the output sumstats in: vcf format
2024/12/20 12:46:26 -Formatting statistics ...
2024/12/20 12:46:26 -Float statistics formats:
2024/12/20 12:46:26 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:26 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:26 -Start outputting sumstats in vcf format...
2024/12/20 12:46:26 -vcf format will be loaded...
2024/12/20 12:46:26 -vcf format meta info:
2024/12/20 12:46:26 - format_name : vcf
2024/12/20 12:46:26 - format_source : https://github.com/MRCIEU/gwas-vcf-specification/tree/1.0.0
2024/12/20 12:46:26 - format_version : 20220923
2024/12/20 12:46:26 - format_citation : Lyon, M.S., Andrews, S.J., Elsworth, B. et al. The variant call format provides efficient and robust storage of GWAS summary statistics. Genome Biol 22, 32 (2021). https://doi.org/10.1186/s13059-020-02248-0
2024/12/20 12:46:26 - format_fixed : #CHROM,POS,ID,REF,ALT,QUAL,FILTER,INFO,FORMAT
2024/12/20 12:46:26 - format_format : ID,SS,ES,SE,LP,SI,EZ
2024/12/20 12:46:26 -gwaslab to vcf format dictionary:
2024/12/20 12:46:26 - gwaslab keys: rsID,CHR,POS,NEA,EA,N,EAF,BETA,SE,MLOG10P,INFO,Z
2024/12/20 12:46:26 - vcf values: ID,#CHROM,POS,REF,ALT,SS,AF,ES,SE,LP,SI,EZ
2024/12/20 12:46:26 -Creating VCF file header...
2024/12/20 12:46:26 -VCF header contig build:19
2024/12/20 12:46:27 -ID:Study_1
2024/12/20 12:46:27 -StudyType:Unknown
2024/12/20 12:46:27 -TotalVariants:1000
2024/12/20 12:46:27 -HarmonisedVariants:0
2024/12/20 12:46:27 -VariantsNotHarmonised:1000
2024/12/20 12:46:27 -SwitchedAlleles:0
2024/12/20 12:46:27 -Writing sumstats to: ./mysumstats.vcf...
2024/12/20 12:46:27 -bgzip compressing : ./mysumstats.vcf.gz...
2024/12/20 12:46:27 -tabix indexing : : ./mysumstats.vcf.gz.tbi...
2024/12/20 12:46:27 -Saving log file to: ./mysumstats.vcf.log
2024/12/20 12:46:27 Finished outputting successfully!
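Following the dictionary in the log, REF and ALT carry NEA and EA, and the per-study statistics go into the sample column under GWAS-VCF keys: ES for the effect size (BETA), SE, LP for -log10 P, and SS for the sample size (N). The helper below is a hypothetical sketch that assembles one such data line for the first variant; the FORMAT key order, the "." placeholders and computing LP directly from P are assumptions, so the line is not guaranteed to match GWASLab's exact output.

```python
import math


def gwas_vcf_line(chrom, pos, rsid, nea, ea, beta, se, p, n):
    """Hypothetical helper: one tab-separated GWAS-VCF-style data line.

    REF/ALT take NEA/EA; the sample column holds ES (effect size), SE,
    LP (-log10 P) and SS (sample size). Key order and "." placeholders
    are assumptions of this sketch.
    """
    lp = -math.log10(p)
    fields = [
        str(chrom), str(pos), rsid or ".",  # #CHROM, POS, ID
        nea, ea,                            # REF, ALT
        ".", ".", ".",                      # QUAL, FILTER, INFO
        "ES:SE:LP:SS",                      # FORMAT
        f"{beta}:{se}:{lp:.4f}:{n}",        # study column
    ]
    return "\t".join(fields)


# First variant of the example data; it has no rsID
line = gwas_vcf_line(1, 2005486, None, "T", "C", -0.0969, 0.0471, 0.03982, 191764)
print(line)
```

Storing -log10 P (LP) rather than P keeps very small p-values representable without underflow, which is why the format uses it.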
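Parquet
With tab_fmt="parquet", the sumstats are written as a Parquet file instead of a gzipped TSV; extra writer options go in to_tabular_kwargs, and partition_cols=["CHR"] asks for the output to be partitioned by chromosome.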
In [21]:
mysumstats.to_format("./mysumstats", fmt="gwaslab", tab_fmt="parquet", to_tabular_kwargs={"partition_cols": ["CHR"]})
2024/12/20 12:46:27 Start to convert the output sumstats in: gwaslab format
2024/12/20 12:46:27 -Formatting statistics ...
2024/12/20 12:46:27 -Float statistics formats:
2024/12/20 12:46:27 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:27 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:27 -Start outputting sumstats in gwaslab format...
2024/12/20 12:46:27 -gwaslab format will be loaded...
2024/12/20 12:46:27 -gwaslab format meta info:
2024/12/20 12:46:27 - format_name : gwaslab
2024/12/20 12:46:27 - format_source : https://cloufield.github.io/gwaslab/
2024/12/20 12:46:27 - format_version : 20231220_v4
2024/12/20 12:46:27 -Output path: ./mysumstats.gwaslab.parquet
2024/12/20 12:46:27 -Output columns: SNPID,CHR,POS,EA,NEA,EAF,BETA,SE,P,N,DIRECTION,STATUS
2024/12/20 12:46:27 -Writing sumstats to: ./mysumstats.gwaslab.parquet...
2024/12/20 12:46:27 -Saving log file to: ./mysumstats.gwaslab.log
2024/12/20 12:46:27 Finished outputting successfully!
For annotation
Convert to BED format
In [22]:
mysumstats.to_format("./mysumstats", fmt="bed")
2024/12/20 12:46:27 Start to convert the output sumstats in: bed format
2024/12/20 12:46:27 -Formatting statistics ...
2024/12/20 12:46:27 -Float statistics formats:
2024/12/20 12:46:27 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:27 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:27 -Start outputting sumstats in bed format...
2024/12/20 12:46:27 -Number of SNPs : 920
2024/12/20 12:46:27 -Number of Insertions : 52
2024/12/20 12:46:27 -Number of Deletions : 28
2024/12/20 12:46:27 -formatting to 0-based bed-like file...
2024/12/20 12:46:27 -format description: https://genome.ucsc.edu/FAQ/FAQformat.html#format1
2024/12/20 12:46:27 -Adjusting positions in format-specific manner..
2024/12/20 12:46:27 -Output columns: CHR,START,END,NEA/EA,STRAND,SNPID
2024/12/20 12:46:27 -Writing sumstats to: ./mysumstats.bed...
2024/12/20 12:46:27 -Saving log file to: ./mysumstats.bed.log
2024/12/20 12:46:27 Finished outputting successfully!
In [23]:
!cat mysumstats.bed | head
1 2005485 2005486 T/C + 1:2005486_C_T
1 2247939 2247939 -/AGG + 1:2247939_AAGG_A
1 3741852 3741853 A/G + 1:3741853_G_A
1 5017525 5017526 A/G + 1:5017526_G_A
1 5843474 5843475 T/C + 1:5843475_C_T
1 9405102 9405103 T/C + 1:9405103_T_C
1 9443410 9443411 A/G + 1:9443411_G_A
1 12866347 12866348 C/G + 1:12866348_G_C
1 14466315 14466316 A/G + 1:14466316_A_G
1 14900418 14900419 A/G + 1:14900419_A_G
Convert to VEP default format
In [24]:
mysumstats.to_format("./mysumstats", fmt="vep")
2024/12/20 12:46:28 Start to convert the output sumstats in: vep format
2024/12/20 12:46:28 -Formatting statistics ...
2024/12/20 12:46:28 -Float statistics formats:
2024/12/20 12:46:28 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:28 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:28 -Start outputting sumstats in vep format...
2024/12/20 12:46:28 -Number of SNPs : 920
2024/12/20 12:46:28 -Number of Insertions : 52
2024/12/20 12:46:28 -Number of Deletions : 28
2024/12/20 12:46:28 -formatting to 1-based bed-like file (for vep)...
2024/12/20 12:46:28 -format description: http://asia.ensembl.org/info/docs/tools/vep/vep_formats.html
2024/12/20 12:46:28 -Adjusting positions in format-specific manner..
2024/12/20 12:46:28 -Output columns: CHR,START,END,NEA/EA,STRAND,SNPID
2024/12/20 12:46:28 -Writing sumstats to: ./mysumstats.vep...
2024/12/20 12:46:28 -Saving log file to: ./mysumstats.vep.log
2024/12/20 12:46:28 Finished outputting successfully!
In [25]:
!cat mysumstats.vep | head
1 2005486 2005486 T/C + 1:2005486_C_T
1 2247940 2247939 -/AGG + 1:2247939_AAGG_A
1 3741853 3741853 A/G + 1:3741853_G_A
1 5017526 5017526 A/G + 1:5017526_G_A
1 5843475 5843475 T/C + 1:5843475_C_T
1 9405103 9405103 T/C + 1:9405103_T_C
1 9443411 9443411 A/G + 1:9443411_G_A
1 12866348 12866348 C/G + 1:12866348_G_C
1 14466316 14466316 A/G + 1:14466316_A_G
1 14900419 14900419 A/G + 1:14900419_A_G
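VEP's default input format is 1-based with inclusive ends. An insertion is written with START = END + 1, which is why the insertion row above reads `2247940 2247939`. The following is a minimal plain-Python sketch of that conversion for SNVs and simple indels with a shared anchor base. `to_vep_row` is a hypothetical helper, not gwaslab's implementation. The deletion rule is assumed from VEP's documented convention, because no deletion appears in the rows printed here.

```python
def to_vep_row(chrom, pos, nea, ea, snpid):
    """Build a 1-based VEP-style row: CHR, START, END, NEA/EA, STRAND, SNPID.

    `pos` is the 1-based POS column. Handles SNVs and simple indels whose
    alleles share a leading anchor base (hypothetical helper).
    """
    first = pos  # 1-based position of the first affected reference base
    if len(nea) != len(ea) and nea[0] == ea[0]:
        # Drop the shared anchor base; the change starts one base later.
        nea, ea, first = nea[1:], ea[1:], pos + 1
    start = first
    end = first + len(nea) - 1  # inclusive end; an insertion gives END = START - 1
    return (chrom, start, end, f"{nea or '-'}/{ea or '-'}", "+", snpid)


# The SNV and the insertion from the first two VEP rows above,
# plus a made-up deletion (ATG -> A at POS 100) for comparison.
for args in [
    (1, 2005486, "T", "C", "1:2005486_C_T"),
    (1, 2247939, "A", "AAGG", "1:2247939_AAGG_A"),
    (1, 100, "ATG", "A", "example_deletion"),
]:
    print("\t".join(map(str, to_vep_row(*args))))
```

Compared with the BED output, SNV rows shift by one base at START only, while the insertion row shifts by one base at START and keeps END.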
Convert to ANNOVAR default input format
In [26]:
mysumstats.to_format("./mysumstats",fmt="annovar")
2024/12/20 12:46:28 Start to convert the output sumstats in: annovar format
2024/12/20 12:46:28 -Formatting statistics ...
2024/12/20 12:46:28 -Float statistics formats:
2024/12/20 12:46:28 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:28 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:28 -Start outputting sumstats in annovar format...
2024/12/20 12:46:28 -Number of SNPs : 920
2024/12/20 12:46:28 -Number of Insertions : 52
2024/12/20 12:46:28 -Number of Deletions : 28
2024/12/20 12:46:28 -formatting to 1-based bed-like file...
2024/12/20 12:46:28 -format description: https://annovar.openbioinformatics.org/en/latest/user-guide/input/
2024/12/20 12:46:28 -Adjusting positions in format-specific manner..
2024/12/20 12:46:28 -Output columns: CHR,START,END,NEA_out,EA_out,SNPID
2024/12/20 12:46:28 -Writing sumstats to: ./mysumstats.annovar...
2024/12/20 12:46:28 -Saving log file to: ./mysumstats.annovar.log
2024/12/20 12:46:28 Finished outputting successfully!
In [27]:
!cat mysumstats.annovar | head
1 2005486 2005486 T C 1:2005486_C_T
1 2247940 2247940 - AGG 1:2247939_AAGG_A
1 3741853 3741853 A G 1:3741853_G_A
1 5017526 5017526 A G 1:5017526_G_A
1 5843475 5843475 T C 1:5843475_C_T
1 9405103 9405103 T C 1:9405103_T_C
1 9443411 9443411 A G 1:9443411_G_A
1 12866348 12866348 C G 1:12866348_G_C
1 14466316 14466316 A G 1:14466316_A_G
1 14900419 14900419 A G 1:14900419_A_G
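ANNOVAR input keeps 1-based coordinates but puts the reference and alternative alleles in separate columns (`NEA_out`, `EA_out` in the log), writing `-` for an empty allele. In the rows above, the insertion gets START = END = POS + 1. The sketch below mirrors that for SNVs and simple indels with a shared anchor base. `to_annovar_row` is a hypothetical helper, not gwaslab's code. For deletions, it assumes ANNOVAR's documented convention, in which START..END spans the deleted reference bases, because no deletion is printed here.

```python
def to_annovar_row(chrom, pos, nea, ea, snpid):
    """Build a 1-based ANNOVAR-style row: CHR, START, END, REF, ALT, SNPID.

    `pos` is the 1-based POS column. Handles SNVs and simple indels whose
    alleles share a leading anchor base (hypothetical helper).
    """
    first = pos  # 1-based position of the first affected reference base
    if len(nea) != len(ea) and nea[0] == ea[0]:
        # Drop the shared anchor base; the change starts one base later.
        nea, ea, first = nea[1:], ea[1:], pos + 1
    start = first
    # Inclusive end over the reference bases; an insertion (no reference
    # bases) keeps END = START, as in the rows printed above.
    end = first + max(len(nea), 1) - 1
    return (chrom, start, end, nea or "-", ea or "-", snpid)


# The SNV and the insertion from the first two ANNOVAR rows above,
# plus a made-up deletion (ATG -> A at POS 100) for comparison.
for args in [
    (1, 2005486, "T", "C", "1:2005486_C_T"),
    (1, 2247939, "A", "AAGG", "1:2247939_AAGG_A"),
    (1, 100, "ATG", "A", "example_deletion"),
]:
    print("\t".join(map(str, to_annovar_row(*args))))
```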
Filter and then output
In [28]:
mysumstats.filter_value("EAF >0.05 and EAF < 0.95").to_format("./mysumstats_maf005", fmt="ssf", ssfmeta=True, md5sum=True)
2024/12/20 12:46:28 Start filtering values by condition: EAF >0.05 and EAF < 0.95
2024/12/20 12:46:28 -Removing 483 variants not meeting the conditions: EAF >0.05 and EAF < 0.95
2024/12/20 12:46:28 Finished filtering values.
2024/12/20 12:46:28 Start to convert the output sumstats in: ssf format
2024/12/20 12:46:28 -Formatting statistics ...
2024/12/20 12:46:28 -Float statistics formats:
2024/12/20 12:46:28 - Columns : ['EAF', 'BETA', 'SE', 'P']
2024/12/20 12:46:28 - Output formats: ['{:.4g}', '{:.4f}', '{:.4f}', '{:.4e}']
2024/12/20 12:46:28 -Replacing SNPID separator from ":" to "_"...
2024/12/20 12:46:28 -Start outputting sumstats in ssf format...
2024/12/20 12:46:28 -ssf format will be loaded...
2024/12/20 12:46:28 -ssf format meta info:
2024/12/20 12:46:28 - format_name : ssf
2024/12/20 12:46:28 - format_source : https://www.biorxiv.org/content/10.1101/2022.07.15.500230v1.full
2024/12/20 12:46:28 - format_cite_name : GWAS-SSF v0.1
2024/12/20 12:46:28 - format_separator : \t
2024/12/20 12:46:28 - format_na : #NA
2024/12/20 12:46:28 - format_col_order : chromosome,base_pair_location,effect_allele,other_allele,beta,odds_ratio,hazard_ratio,standard_error,effect_allele_frequency,p_value,neg_log_10_p_value,ci_upper,ci_lower,rsid,variant_id,info,ref_allele,n
2024/12/20 12:46:28 - format_version : 20230328
2024/12/20 12:46:28 -gwaslab to ssf format dictionary:
2024/12/20 12:46:28 - gwaslab keys: SNPID,rsID,CHR,POS,NEA,EA,EAF,N,BETA,SE,P,MLOG10P,INFO,OR,HR,OR_95L,OR_95U
2024/12/20 12:46:28 - ssf values: variant_id,rsid,chromosome,base_pair_location,other_allele,effect_allele,effect_allele_frequency,n,beta,standard_error,p_value,neg_log_10_p_value,info,odds_ratio,hazard_ratio,ci_lower,ci_upper
2024/12/20 12:46:28 -Output path: ./mysumstats_maf005.ssf.tsv.gz
2024/12/20 12:46:28 -Output columns: chromosome,base_pair_location,effect_allele,other_allele,beta,standard_error,effect_allele_frequency,p_value,variant_id,n
2024/12/20 12:46:28 -Writing sumstats to: ./mysumstats_maf005.ssf.tsv.gz...
2024/12/20 12:46:28 -Fast to csv mode...
2024/12/20 12:46:28 -md5sum hashing for the file: ./mysumstats_maf005.ssf.tsv.gz
2024/12/20 12:46:28 -md5sum path: ./mysumstats_maf005.ssf.tsv.gz.md5sum
2024/12/20 12:46:28 -md5sum: 76c4dd440eec447fbeae45a22b75f3e2
2024/12/20 12:46:28 -Exporting SSF-style meta data to ./mysumstats_maf005.ssf.tsv.gz.ssf.tsv-meta.yaml
2024/12/20 12:46:28 -Saving log file to: ./mysumstats_maf005.ssf.log
2024/12/20 12:46:28 Finished outputting successfully!
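The log above outlines the filtering and SSF export steps. The sketch below approximates them with plain pandas: filter on EAF, replace the SNPID separator, rename columns with the gwaslab-to-SSF dictionary, reorder them as in "Output columns", write the file, and hash it with MD5. This is not gwaslab's implementation. It assumes that the `filter_value` condition string follows pandas `DataFrame.query` syntax (the log only echoes the string), and it writes a plain TSV to a temporary directory instead of a gzip file to keep things simple. `to_ssf_like` and `md5_of_file` are hypothetical helpers.

```python
import hashlib
import os
import tempfile

import pandas as pd

# Subset of the "gwaslab to ssf format dictionary" printed in the log.
GWASLAB_TO_SSF = {
    "SNPID": "variant_id", "CHR": "chromosome", "POS": "base_pair_location",
    "NEA": "other_allele", "EA": "effect_allele", "EAF": "effect_allele_frequency",
    "N": "n", "BETA": "beta", "SE": "standard_error", "P": "p_value",
}

# Column order from the "Output columns" line of the log.
SSF_ORDER = [
    "chromosome", "base_pair_location", "effect_allele", "other_allele", "beta",
    "standard_error", "effect_allele_frequency", "p_value", "variant_id", "n",
]


def to_ssf_like(df: pd.DataFrame, condition: str) -> pd.DataFrame:
    """Filter with a query string, then rename and reorder columns SSF-style."""
    kept = df.query(condition).copy()
    # Mirrors the log step: Replacing SNPID separator from ":" to "_".
    kept["SNPID"] = kept["SNPID"].str.replace(":", "_", regex=False)
    return kept.rename(columns=GWASLAB_TO_SSF)[SSF_ORDER]


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files need little memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Rows 0, 2 and 995 of the sumstats table shown earlier in this notebook.
sumstats = pd.DataFrame({
    "SNPID": ["1:2005486_C_T", "1:3741853_G_A", "X:139318378_A_G"],
    "CHR": [1, 1, 23],
    "POS": [2005486, 3741853, 139318378],
    "EA": ["C", "G", "G"],
    "NEA": ["T", "A", "A"],
    "EAF": [0.9863, 0.8849, 0.5252],
    "BETA": [-0.0969, -0.0375, -0.0132],
    "SE": [0.0471, 0.0142, 0.0071],
    "P": [0.039820, 0.008282, 0.061670],
    "N": [191764, 191764, 191764],
})

ssf = to_ssf_like(sumstats, "EAF > 0.05 and EAF < 0.95")
out_path = os.path.join(tempfile.mkdtemp(), "example.ssf.tsv")
ssf.to_csv(out_path, sep="\t", index=False, na_rep="#NA")  # "#NA" per format_na
print(ssf["variant_id"].tolist())  # the EAF = 0.9863 row is filtered out
print(md5_of_file(out_path))
```

If gwaslab hashes the raw bytes of the compressed file, as the "md5sum hashing for the file" line suggests, calling `md5_of_file("./mysumstats_maf005.ssf.tsv.gz")` should reproduce the digest recorded in the log.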