Liftover
In [1]:
import gwaslab as gl
In [2]:
gl.show_version()
2024/12/21 18:01:36 GWASLab v3.5.4 https://cloufield.github.io/gwaslab/
2024/12/21 18:01:36 (C) 2022-2024, Yunye He, Kamatani Lab, MIT License, gwaslab@gmail.com
Load sample data
In [3]:
mysumstats = gl.Sumstats("../0_sample_data/t2d_bbj.txt.gz",
snpid="SNP",
chrom="CHR",
pos="POS",
ea="ALT",
nea="REF",
neaf="Frq",
beta="BETA",
se="SE",
p="P",
nrows=5000,
verbose=False)
mysumstats.basic_check(verbose=False)
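The keyword arguments map the raw file headers to GWASLab's standardized column names. For example, `snpid="SNP"` becomes SNPID and `ea="ALT"` becomes EA. The frequency column is passed as `neaf`, so the EAF column in the output below is presumably derived as 1 - NEAF. The plain-Python sketch that follows illustrates that mapping under this assumption. `standardize_row` and the example row are hypothetical and are not part of GWASLab.

```python
# Hypothetical sketch of the header mapping passed to gl.Sumstats above.
# Assumption: a frequency column given as `neaf` is turned into EAF = 1 - NEAF.
RAW_TO_STANDARD = {
    "SNP": "SNPID", "CHR": "CHR", "POS": "POS",
    "ALT": "EA", "REF": "NEA", "Frq": "NEAF",
    "BETA": "BETA", "SE": "SE", "P": "P",
}


def standardize_row(raw):
    """Rename raw columns to standard names and derive EAF from NEAF (illustrative only)."""
    row = {RAW_TO_STANDARD[k]: v for k, v in raw.items() if k in RAW_TO_STANDARD}
    if "NEAF" in row:
        # Frequency of the non-effect allele -> frequency of the effect allele
        row["EAF"] = round(1.0 - float(row.pop("NEAF")), 4)
    return row


# Hypothetical raw row, not taken from t2d_bbj.txt.gz
example = {"SNP": "1:1000_A_G", "CHR": "1", "POS": "1000", "ALT": "G", "REF": "A",
           "Frq": "0.25", "BETA": "0.1", "SE": "0.05", "P": "0.05"}
print(standardize_row(example))
```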
In [4]:
mysumstats.data
Out[4]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 725932 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.5970 | 9960099 |
1 | 1:725933_A_G | 1 | 725933 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.5973 | 9960099 |
2 | 1:737801_T_C | 1 | 737801 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.6908 | 9960099 |
3 | 1:749963_T_TAA | 1 | 749963 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.2846 | 9960399 |
4 | 1:751343_T_A | 1 | 751343 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.2705 | 9960099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | 1:2150310_G_C | 1 | 2150310 | G | C | 0.9913 | 0.0429 | 0.0528 | 0.4172 | 9960099 |
4996 | 1:2150317_G_A | 1 | 2150317 | G | A | 0.9930 | 0.1235 | 0.0754 | 0.1015 | 9960099 |
4997 | 1:2150581_C_T | 1 | 2150581 | C | T | 0.9981 | 0.1502 | 0.1271 | 0.2371 | 9960099 |
4998 | 1:2151290_AAAAAC_A | 1 | 2151290 | AAAAAC | A | 0.8980 | 0.0158 | 0.0152 | 0.2977 | 9960399 |
4999 | 1:2151579_C_A | 1 | 2151579 | C | A | 0.8980 | 0.0156 | 0.0152 | 0.3028 | 9960099 |
5000 rows × 10 columns
Liftover
In [5]:
mysumstats.liftover(n_cores=3, from_build="19", to_build="38")
2024/12/21 18:01:50 Start to perform liftover...v3.5.4
2024/12/21 18:01:50 -Current Dataframe shape : 5000 x 10 ; Memory usage: 21.77 MB
2024/12/21 18:01:50 -Number of threads/cores to use: 3
2024/12/21 18:01:50 -Creating converter using provided ChainFile: /home/yunye/.gwaslab/hg19ToHg38.over.chain.gz
2024/12/21 18:01:50 -Creating converter : 19 -> 38
2024/12/21 18:01:51 -Converting variants with status code xxx0xxx :5000...
2024/12/21 18:01:52 -Removed unmapped variants: 2
2024/12/21 18:01:52 Start to fix chromosome notation (CHR)...v3.5.4
2024/12/21 18:01:52 -Current Dataframe shape : 4998 x 10 ; Memory usage: 0.36 MB
2024/12/21 18:01:52 -Checking CHR data type...
2024/12/21 18:01:52 -Variants with standardized chromosome notation: 4998
2024/12/21 18:01:52 -All CHR are already fixed...
2024/12/21 18:01:53 Finished fixing chromosome notation (CHR).
2024/12/21 18:01:53 Start to fix basepair positions (POS)...v3.5.4
2024/12/21 18:01:53 -Current Dataframe shape : 4998 x 10 ; Memory usage: 21.81 MB
2024/12/21 18:01:53 -Removing thousands separator "," or underbar "_" ...
2024/12/21 18:01:53 -Converting to Int64 data type ...
2024/12/21 18:01:56 -Position bound:(0 , 250,000,000)
2024/12/21 18:01:56 -Removed outliers: 0
2024/12/21 18:01:56 -Removed 0 variants with bad positions.
2024/12/21 18:01:56 Finished fixing basepair positions (POS).
2024/12/21 18:01:56 Finished liftover.
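The log says that only variants matching the status pattern `xxx0xxx` are converted. Comparing the STATUS column before and after liftover (9960099 becomes 3860099) shows the two leading digits changing from 99 to 38. The sketch below illustrates wildcard matching on such 7-digit codes and rewriting the build digits. It assumes only what these tables show, namely that digits 1-2 hold the genome build, and it is not GWASLab's own code. The value 9961099 is hypothetical and is included only to show a code that does not match.

```python
# Illustrative sketch: wildcard matching on GWASLab-style 7-digit STATUS codes.
# Assumption (from the tables in this notebook): digits 1-2 hold the genome build.
# The meaning of the remaining digits is not modeled here.


def matches(status, pattern):
    """True if every non-'x' character in `pattern` equals the digit at that position."""
    digits = str(status).zfill(len(pattern))
    return all(p == "x" or p == d for p, d in zip(pattern, digits))


def set_build(status, build):
    """Replace the two leading build digits, e.g. 9960099 with build "38" -> 3860099."""
    return int(build.zfill(2) + str(status).zfill(7)[2:])


statuses = [9960099, 9960399, 9961099]  # last value is hypothetical
eligible = [s for s in statuses if matches(s, "xxx0xxx")]
updated = [set_build(s, "38") for s in eligible]
print(updated)  # [3860099, 3860399]
```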
In [6]:
mysumstats.data
Out[6]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 790552 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.5970 | 3860099 |
1 | 1:725933_A_G | 1 | 790553 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.5973 | 3860099 |
2 | 1:737801_T_C | 1 | 802421 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.6908 | 3860099 |
3 | 1:749963_T_TAA | 1 | 814583 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.2846 | 3860399 |
4 | 1:751343_T_A | 1 | 815963 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.2705 | 3860099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | 1:2150310_G_C | 1 | 2218871 | G | C | 0.9913 | 0.0429 | 0.0528 | 0.4172 | 3860099 |
4996 | 1:2150317_G_A | 1 | 2218878 | G | A | 0.9930 | 0.1235 | 0.0754 | 0.1015 | 3860099 |
4997 | 1:2150581_C_T | 1 | 2219142 | C | T | 0.9981 | 0.1502 | 0.1271 | 0.2371 | 3860099 |
4998 | 1:2151290_AAAAAC_A | 1 | 2219851 | AAAAAC | A | 0.8980 | 0.0158 | 0.0152 | 0.2977 | 3860399 |
4999 | 1:2151579_C_A | 1 | 2220140 | C | A | 0.8980 | 0.0156 | 0.0152 | 0.3028 | 3860099 |
4998 rows × 10 columns
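After liftover, POS holds hg38 coordinates, but SNPID still carries the hg19 position (row 0 is `1:725932_G_A` with POS 790552). The sketch below reads the old position out of the identifier and rebuilds an ID from the lifted columns. It assumes the `CHR:POS_A1_A2` layout seen in this table. `embedded_position` and `rebuild_snpid` are hypothetical helpers, not GWASLab functions.

```python
# Hypothetical helpers: compare the hg19 position embedded in SNPID with the lifted POS.
# Assumption: SNPID follows the "CHR:POS_A1_A2" layout shown in the table above.


def embedded_position(snpid):
    """Split a "CHR:POS_A1_A2" identifier and return (CHR, POS)."""
    chrom, rest = snpid.split(":", 1)
    return chrom, int(rest.split("_", 1)[0])


def rebuild_snpid(snpid, chrom, pos):
    """Keep the allele part of the ID but swap in the lifted CHR and POS."""
    alleles = snpid.split("_", 1)[1]
    return f"{chrom}:{pos}_{alleles}"


# (SNPID, lifted CHR, lifted POS) copied from rows 0 and 4995 of the table above
rows = [("1:725932_G_A", "1", 790552), ("1:2150310_G_C", "1", 2218871)]
for snpid, chrom, pos in rows:
    _, old_pos = embedded_position(snpid)
    print(snpid, "->", rebuild_snpid(snpid, chrom, pos), "shift:", pos - old_pos)
```

The shift differs between the two rows (64620 bp and 68561 bp). A constant offset cannot do the conversion, which is why liftover relies on a chain file.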
Liftover using user-provided chain
In [7]:
# T2T-CHM13 project page: https://github.com/marbl/CHM13
In [8]:
! wget https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chain/v1_nflo/grch38-chm13v2.chain
--2024-12-21 18:01:56-- https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/assemblies/chain/v1_nflo/grch38-chm13v2.chain
Resolving s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)... 52.92.228.232, 3.5.76.241, 52.218.216.160, ...
Connecting to s3-us-west-2.amazonaws.com (s3-us-west-2.amazonaws.com)|52.92.228.232|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6288201 (6.0M) [binary/octet-stream]
Saving to: ‘grch38-chm13v2.chain.1’

grch38-chm13v2.chai 100%[===================>] 6.00M 2.01MB/s in 3.0s

2024-12-21 18:02:00 (2.01 MB/s) - ‘grch38-chm13v2.chain.1’ saved [6288201/6288201]
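Because `grch38-chm13v2.chain` already existed, wget saved this download as `grch38-chm13v2.chain.1`. The liftover call then reads the earlier copy at `./grch38-chm13v2.chain`. The sketch below shows one way to fetch the chain file only when it is missing, using the standard library. `ensure_file` is a hypothetical helper, not part of GWASLab.

```python
# Sketch: download the chain file only if it is not already on disk, so repeated runs
# do not create numbered copies such as "grch38-chm13v2.chain.1".
import os
import urllib.request

CHAIN_URL = ("https://s3-us-west-2.amazonaws.com/human-pangenomics/T2T/CHM13/"
             "assemblies/chain/v1_nflo/grch38-chm13v2.chain")


def ensure_file(url, path):
    """Download `url` to `path` when `path` is missing; return True if a download happened."""
    if os.path.exists(path):
        return False  # keep the existing copy untouched
    urllib.request.urlretrieve(url, path)
    return True

# Usage in the notebook: ensure_file(CHAIN_URL, "./grch38-chm13v2.chain")
```

With wget itself, the `-nc` (`--no-clobber`) option likewise skips the download when the file is already present.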
In [9]:
mysumstats.data
Out[9]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 790552 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.5970 | 3860099 |
1 | 1:725933_A_G | 1 | 790553 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.5973 | 3860099 |
2 | 1:737801_T_C | 1 | 802421 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.6908 | 3860099 |
3 | 1:749963_T_TAA | 1 | 814583 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.2846 | 3860399 |
4 | 1:751343_T_A | 1 | 815963 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.2705 | 3860099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | 1:2150310_G_C | 1 | 2218871 | G | C | 0.9913 | 0.0429 | 0.0528 | 0.4172 | 3860099 |
4996 | 1:2150317_G_A | 1 | 2218878 | G | A | 0.9930 | 0.1235 | 0.0754 | 0.1015 | 3860099 |
4997 | 1:2150581_C_T | 1 | 2219142 | C | T | 0.9981 | 0.1502 | 0.1271 | 0.2371 | 3860099 |
4998 | 1:2151290_AAAAAC_A | 1 | 2219851 | AAAAAC | A | 0.8980 | 0.0158 | 0.0152 | 0.2977 | 3860399 |
4999 | 1:2151579_C_A | 1 | 2220140 | C | A | 0.8980 | 0.0156 | 0.0152 | 0.3028 | 3860099 |
4998 rows × 10 columns
In [10]:
mysumstats.liftover(n_cores=1, from_build="38", to_build="13", chain="./grch38-chm13v2.chain")
2024/12/21 18:02:00 Start to perform liftover...v3.5.4
2024/12/21 18:02:00 -Current Dataframe shape : 4998 x 10 ; Memory usage: 21.81 MB
2024/12/21 18:02:00 -Number of threads/cores to use: 1
2024/12/21 18:02:00 -Creating converter using ChainFile: ./grch38-chm13v2.chain
2024/12/21 18:02:00 -Creating converter : 38 -> 13
2024/12/21 18:02:01 -Converting variants with status code xxx0xxx :4998...
2024/12/21 18:02:03 -Removed unmapped variants: 44
2024/12/21 18:02:03 Start to fix chromosome notation (CHR)...v3.5.4
2024/12/21 18:02:03 -Current Dataframe shape : 4954 x 10 ; Memory usage: 0.36 MB
2024/12/21 18:02:03 -Checking CHR data type...
2024/12/21 18:02:03 -Variants with standardized chromosome notation: 4954
2024/12/21 18:02:03 -All CHR are already fixed...
2024/12/21 18:02:04 Finished fixing chromosome notation (CHR).
2024/12/21 18:02:04 Start to fix basepair positions (POS)...v3.5.4
2024/12/21 18:02:04 -Current Dataframe shape : 4954 x 10 ; Memory usage: 21.80 MB
2024/12/21 18:02:04 -Removing thousands separator "," or underbar "_" ...
2024/12/21 18:02:04 -Converting to Int64 data type ...
2024/12/21 18:02:07 -Position bound:(0 , 250,000,000)
2024/12/21 18:02:07 -Removed outliers: 0
2024/12/21 18:02:07 -Removed 0 variants with bad positions.
2024/12/21 18:02:07 Finished fixing basepair positions (POS).
2024/12/21 18:02:07 Finished liftover.
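In this run, 44 variants were dropped as unmapped. A chain file only describes ungapped aligned blocks between the two assemblies. A position that falls outside every block, for example inside an alignment gap, has no counterpart in the target build. The minimal sketch below parses the UCSC chain text format and maps 1-based positions. It assumes "+" strands on both sides, non-overlapping blocks and `chr`-prefixed names, and the toy chain is hypothetical. It is not the converter GWASLab builds from the ChainFile.

```python
# Minimal sketch of a chain-file position lookup (UCSC chain format).
# Assumptions: both strands are "+", blocks do not overlap, names like "chr1".
from bisect import bisect_right


def parse_chain(text):
    """Return {tName: sorted list of (tStart, qStart, size, qName)} ungapped blocks, 0-based."""
    blocks = {}
    lines = iter(text.strip().splitlines())
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "chain":
            continue
        t_name, t_start = fields[2], int(fields[5])
        q_name, q_start = fields[7], int(fields[10])
        for data in lines:  # consume this chain's alignment lines
            nums = [int(x) for x in data.split()]
            blocks.setdefault(t_name, []).append((t_start, q_start, nums[0], q_name))
            if len(nums) == 1:  # final line holds only the last block size
                break
            t_start += nums[0] + nums[1]  # block size + gap in source (dt)
            q_start += nums[0] + nums[2]  # block size + gap in target (dq)
    for chain_blocks in blocks.values():
        chain_blocks.sort()
    return blocks


def lift(blocks, chrom, pos):
    """Map a 1-based position; return (new_chrom, new_pos) or None if unmapped."""
    chain_blocks = blocks.get(chrom, [])
    starts = [b[0] for b in chain_blocks]
    zero_based = pos - 1
    i = bisect_right(starts, zero_based) - 1
    if i < 0:
        return None  # before the first aligned block
    t_start, q_start, size, q_name = chain_blocks[i]
    if zero_based >= t_start + size:
        return None  # inside an alignment gap
    return q_name, q_start + (zero_based - t_start) + 1


# Hypothetical toy chain: blocks of 100 bp and 150 bp separated by gaps.
TOY_CHAIN = """\
chain 1000 chr1 5000 + 100 400 chr1 5000 + 200 480 1
100 50 30
150
"""
toy = parse_chain(TOY_CHAIN)
print(lift(toy, "chr1", 151))  # ('chr1', 251)
print(lift(toy, "chr1", 226))  # None: falls in the 50 bp source gap
```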
In [11]:
mysumstats.data
Out[11]:
| | SNPID | CHR | POS | EA | NEA | EAF | BETA | SE | P | STATUS |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1:725932_G_A | 1 | 219468 | G | A | 0.9960 | -0.0737 | 0.1394 | 0.5970 | 1360099 |
1 | 1:725933_A_G | 1 | 219469 | G | A | 0.0040 | 0.0737 | 0.1394 | 0.5973 | 1360099 |
2 | 1:737801_T_C | 1 | 231327 | C | T | 0.0051 | 0.0490 | 0.1231 | 0.6908 | 1360099 |
3 | 1:749963_T_TAA | 1 | 243588 | TAA | T | 0.8374 | 0.0213 | 0.0199 | 0.2846 | 1360399 |
4 | 1:751343_T_A | 1 | 244969 | T | A | 0.8593 | 0.0172 | 0.0156 | 0.2705 | 1360099 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
4995 | 1:2150310_G_C | 1 | 1654280 | G | C | 0.9913 | 0.0429 | 0.0528 | 0.4172 | 1360099 |
4996 | 1:2150317_G_A | 1 | 1654287 | G | A | 0.9930 | 0.1235 | 0.0754 | 0.1015 | 1360099 |
4997 | 1:2150581_C_T | 1 | 1654551 | C | T | 0.9981 | 0.1502 | 0.1271 | 0.2371 | 1360099 |
4998 | 1:2151290_AAAAAC_A | 1 | 1655260 | AAAAAC | A | 0.8980 | 0.0158 | 0.0152 | 0.2977 | 1360399 |
4999 | 1:2151579_C_A | 1 | 1655549 | C | A | 0.8980 | 0.0156 | 0.0152 | 0.3028 | 1360099 |
4954 rows × 10 columns