Liftover
In [1]:
import gwaslab as gl
Load sample data
In [2]:
mysumstats = gl.Sumstats("t2d_bbj.txt.gz",
                         snpid="SNP",
                         chrom="CHR",
                         pos="POS",
                         ea="ALT",
                         nea="REF",
                         neaf="Frq",
                         beta="BETA",
                         se="SE",
                         p="P",
                         nrows=500000,
                         verbose=False)
mysumstats.basic_check(verbose=False)
In [3]:
mysumstats.data
Out[3]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 725932 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.597000 | 9960099 |
1 | 1:725933_A_G | 1 | 725933 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.597300 | 9960099 |
2 | 1:737801_T_C | 1 | 737801 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.690800 | 9960099 |
3 | 1:749963_T_TAA | 1 | 749963 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.284600 | 9960399 |
4 | 1:751343_T_A | 1 | 751343 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.270500 | 9960099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
499995 | 1:116768304_A_G | 1 | 116768304 | G | A | 0.0020 | 0.0881 | 0.1312 | 0.502100 | 9960099 |
499996 | 1:116768310_T_C | 1 | 116768310 | C | T | 0.0059 | 0.0575 | 0.0584 | 0.324800 | 9960099 |
499997 | 1:116768460_G_A | 1 | 116768460 | G | A | 0.9896 | 0.1191 | 0.0442 | 0.006983 | 9960099 |
499998 | 1:116768479_T_A | 1 | 116768479 | T | A | 0.9003 | 0.0067 | 0.0149 | 0.652700 | 9960099 |
499999 | 1:116768615_TA_T | 1 | 116768615 | TA | T | 0.6184 | -0.0121 | 0.0094 | 0.198500 | 9960399 |
500000 rows × 10 columns
Liftover
In [4]:
mysumstats.liftover(n_cores=3, from_build="19", to_build="38")
Sat Feb 3 00:39:42 2024 Start to perform liftover...v3.4.38
Sat Feb 3 00:39:42 2024 -Current Dataframe shape : 500000 x 10 ; Memory usage: 49.83 MB
Sat Feb 3 00:39:42 2024 -Number of threads/cores to use: 3
Sat Feb 3 00:39:42 2024 -Creating converter : hg19 to hg38
Sat Feb 3 00:39:42 2024 -Converting variants with status code xxx0xxx :500000...
Sat Feb 3 00:40:18 2024 -Removed unmapped variants: 114
Sat Feb 3 00:40:18 2024 Start to fix chromosome notation (CHR)...v3.4.38
Sat Feb 3 00:40:18 2024 -Current Dataframe shape : 499886 x 10 ; Memory usage: 53.64 MB
Sat Feb 3 00:40:18 2024 -Checking CHR data type...
Sat Feb 3 00:40:20 2024 -Variants with standardized chromosome notation: 499886
Sat Feb 3 00:40:20 2024 -All CHR are already fixed...
Sat Feb 3 00:40:22 2024 Finished fixing chromosome notation (CHR).
Sat Feb 3 00:40:22 2024 Start to fix basepair positions (POS)...v3.4.38
Sat Feb 3 00:40:22 2024 -Current Dataframe shape : 499886 x 10 ; Memory usage: 53.64 MB
Sat Feb 3 00:40:22 2024 -Converting to Int64 data type ...
Sat Feb 3 00:40:23 2024 -Position bound:(0 , 250,000,000)
Sat Feb 3 00:40:24 2024 -Removed outliers: 0
Sat Feb 3 00:40:24 2024 -Removed 0 variants with bad positions.
Sat Feb 3 00:40:24 2024 Finished fixing basepair positions (POS).
Sat Feb 3 00:40:24 2024 Finished liftover.
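The log reports that variants "with status code xxx0xxx" were converted. As a rough illustration of how such a mask could be read, the sketch below treats `x` as a wildcard digit and checks the STATUS values from the first table against it. The helper `matches_mask` is hypothetical and not part of gwaslab, and the wildcard interpretation is an assumption based on the log text alone.

```python
def matches_mask(status, mask="xxx0xxx"):
    """Return True if a status code fits the mask.

    Assumption: 'x' in the mask matches any digit, and any other
    character must match the digit at the same position exactly.
    """
    digits = str(status)
    if len(digits) != len(mask):
        return False
    return all(m == "x" or m == d for m, d in zip(mask, digits))


# STATUS values copied from the first table above
statuses = [9960099, 9960399]
print([matches_mask(s) for s in statuses])
```

Under this reading, both STATUS values shown before liftover have `0` as their fourth digit, which would be consistent with all 500000 variants being selected for conversion.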
In [5]:
mysumstats.data
Out[5]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 790552 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.597000 | 3860099 |
1 | 1:725933_A_G | 1 | 790553 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.597300 | 3860099 |
2 | 1:737801_T_C | 1 | 802421 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.690800 | 3860099 |
3 | 1:749963_T_TAA | 1 | 814583 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.284600 | 3860399 |
4 | 1:751343_T_A | 1 | 815963 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.270500 | 3860099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
499995 | 1:116768304_A_G | 1 | 116225682 | G | A | 0.0020 | 0.0881 | 0.1312 | 0.502100 | 3860099 |
499996 | 1:116768310_T_C | 1 | 116225688 | C | T | 0.0059 | 0.0575 | 0.0584 | 0.324800 | 3860099 |
499997 | 1:116768460_G_A | 1 | 116225838 | G | A | 0.9896 | 0.1191 | 0.0442 | 0.006983 | 3860099 |
499998 | 1:116768479_T_A | 1 | 116225857 | T | A | 0.9003 | 0.0067 | 0.0149 | 0.652700 | 3860099 |
499999 | 1:116768615_TA_T | 1 | 116225993 | TA | T | 0.6184 | -0.0121 | 0.0094 | 0.198500 | 3860399 |
499886 rows × 10 columns
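As a quick sanity check on the two tables, the sketch below compares the hg19 and hg38 positions of the rows displayed above and reconciles the row counts with the log. It uses plain Python with values copied from the tables; the variable names are illustrative only, and nothing here calls gwaslab.

```python
# (SNPID, hg19 POS, hg38 POS) copied from the tables before and after liftover
rows = [
    ("1:725932_G_A", 725932, 790552),
    ("1:725933_A_G", 725933, 790553),
    ("1:737801_T_C", 737801, 802421),
    ("1:749963_T_TAA", 749963, 814583),
    ("1:751343_T_A", 751343, 815963),
    ("1:116768304_A_G", 116768304, 116225682),
    ("1:116768310_T_C", 116768310, 116225688),
    ("1:116768460_G_A", 116768460, 116225838),
    ("1:116768479_T_A", 116768479, 116225857),
    ("1:116768615_TA_T", 116768615, 116225993),
]

# Shift applied to each variant by the hg19 -> hg38 conversion
offsets = {snpid: pos38 - pos19 for snpid, pos19, pos38 in rows}
distinct_offsets = sorted(set(offsets.values()))
print(distinct_offsets)

# Row counts from the two tables; the difference should equal the
# "Removed unmapped variants" figure in the log
rows_before, rows_after = 500000, 499886
unmapped = rows_before - rows_after
print(unmapped)
```

Among the displayed rows, the first five share one offset and the last five share another, which fits the expectation that nearby variants are shifted together. SNPID is left unchanged by the liftover, so it still carries the original hg19 position.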