Correlation heatmap¶

In [1]:

import gwaslab as gl
import pandas as pd

Load sample data¶

Sample data source:

Kanai, M., Akiyama, M., Takahashi, A., Matoba, N., Momozawa, Y., Ikeda, M., ... & Kamatani, Y. (2018). Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases. Nature Genetics, 50(3), 390-400.

In [2]:

ldsc = pd.read_csv("toy_data/input_rg.txt",sep="\t")
ldsc

Out[2]:

	p1_category	p1	p2_category	p2	rg	se	z	p	q
0	Anthropometric	Height	Anthropometric	BMI	-0.0587	0.0240	-2.4421	0.014602	0.136798
1	Anthropometric	Height	Metabolic	TC	-0.0778	0.0344	-2.2634	0.023611	0.188695
2	Anthropometric	Height	Metabolic	HDL-C	-0.0045	0.0364	-0.1230	0.902080	0.971590
3	Anthropometric	Height	Metabolic	LDL-C	-0.1245	0.0426	-2.9228	0.003469	0.050126
4	Anthropometric	Height	Metabolic	TG	-0.0426	0.0309	-1.3792	0.167820	0.523652
...	...	...	...	...	...	...	...	...	...
3911	Other	Pollinosis	Tumor	PrCa	0.3250	0.1433	2.2683	0.023300	0.187944
3912	Other	Pollinosis	Tumor	UF	0.2163	0.1756	1.2315	0.218100	0.596616
3913	Other	Pollinosis	Other	Urolithiasis	0.0664	0.1794	0.3701	0.711300	0.903018
3914	Other	Urolithiasis	Allergic disease	AD	-0.0821	0.1592	-0.5159	0.605890	0.863730
3915	Other	Urolithiasis	Tumor	UF	0.1980	0.1292	1.5324	0.125410	0.456418

3916 rows × 9 columns
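
As a sanity check on the columns above, `z` appears to be `rg / se`, and `p` appears to be the two-sided normal P value of `z`. The following is a minimal sketch under that assumption, using only the Python standard library and values copied from rows 0 and 3 of the table. The helper name `two_sided_p` is made up for illustration and is not part of gwaslab.

```python
import math

def two_sided_p(z):
    """Two-sided P value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# (rg, se, z, p) copied from rows 0 and 3 of the table above
rows = [
    (-0.0587, 0.0240, -2.4421, 0.014602),
    (-0.1245, 0.0426, -2.9228, 0.003469),
]

for rg, se, z, p in rows:
    # rg and se are rounded in the table, so rg / se only approximates z
    print(f"rg/se={rg / se:.3f}  z={z}  p from z={two_sided_p(z):.6f}  reported p={p}")
```

Because `rg` and `se` are shown to four decimals, the ratio matches `z` only to about two decimals, while the P value recomputed from `z` should agree closely with the reported `p`.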

Full heatmap¶

In [3]:

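df = gl.plot_rg(ldsc,
            p="q",                  # column used as the P value (here the q column)
            p1="p2",                # trait columns, swapped relative to the input table
            p2="p1",
            sig_levels=[0.05],      # significance threshold
            corrections=["non"],    # no additional multiple-testing correction
            full_cell=("non",0.05),
            panno_texts=["*"],      # annotation text for cells passing the threshold
            panno_args={"size":12,"c":"black"},
            fig_args={"figsize":(15,15),"dpi":300},
            colorbar_args={"shrink":0.4},
            fontsize=8
            )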

Sun Feb  4 17:51:33 2024 Start to create ldsc genetic correlation heatmap...
Sun Feb  4 17:51:33 2024 Raw dataset records: 3916
Sun Feb  4 17:51:33 2024  -Raw dataset non-NA records: 3916
Sun Feb  4 17:51:33 2024 Filling diagnal line and duplicated pair for plotting...
Sun Feb  4 17:51:33 2024  -Diagnal records: 89

/home/yunye/anaconda3/lib/python3.9/site-packages/gwaslab/viz_plot_rg_heatmap.py:107: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df,df_fill_reverse,df_fill_dia,df_fill_na],ignore_index=True).sort_values(by=p).drop_duplicates(subset=[p1,p2])

Sun Feb  4 17:51:33 2024 Valid unique trait pairs: 3916
Sun Feb  4 17:51:33 2024  -Valid unique trait1: 88
Sun Feb  4 17:51:33 2024  -Valid unique trait2: 88
Sun Feb  4 17:51:33 2024  -Significant correlations with P < 0.05: 270
Sun Feb  4 17:51:33 2024  -Significant correlations after Bonferroni correction: 81
Sun Feb  4 17:51:33 2024  -Significant correlations with FDR <0.05: 127
Sun Feb  4 17:51:33 2024 Plotting heatmap...
Sun Feb  4 17:51:34 2024 Full cell : non-corrected P == 0.05
Sun Feb  4 17:51:41 2024 P value annotation text : 
Sun Feb  4 17:51:41 2024  -* : non-corrected P < 0.05
Sun Feb  4 17:51:41 2024 Start to save figure...
Sun Feb  4 17:51:41 2024  -Skip saving figure!
Sun Feb  4 17:51:41 2024 Finished saving figure...
Sun Feb  4 17:51:41 2024 Finished creating ldsc genetic correlation heatmap!
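
The log above reports three counts: nominal P < 0.05, Bonferroni-significant, and FDR < 0.05. The sketch below shows one conventional way such counts can be computed with NumPy, using a Bonferroni threshold of 0.05 / n and Benjamini-Hochberg q-values. It is an illustration on made-up P values, not necessarily how gwaslab computes them internally; the function name `bh_qvalues` is hypothetical.

```python
import numpy as np

def bh_qvalues(p):
    """Benjamini-Hochberg adjusted P values (q-values)."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    # p_(i) * n / i for the sorted P values
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, working down from the largest P value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(n)
    q[order] = np.clip(ranked, 0.0, 1.0)
    return q

# toy P values, made up for illustration
p = np.array([0.001, 0.01, 0.03, 0.5])
alpha = 0.05

n_nominal = int((p < alpha).sum())
n_bonferroni = int((p < alpha / p.size).sum())
n_fdr = int((bh_qvalues(p) < alpha).sum())
print(n_nominal, n_bonferroni, n_fdr)
```

Bonferroni is the strictest of the three criteria, which is consistent with it giving the smallest count in the log.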

Filter and order the data¶

In [4]:

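# Load the trait list and record each trait's position in the file
trait = pd.read_csv("toy_data/trait_list.txt", sep="\t")
trait["order"] = range(len(trait))

# Trait names in file order; used below to build the sort key
order = trait["TRAIT"].values

# Split the traits into two sets by position in the list
trait_set1 = trait.loc[trait["order"] >= 59, "TRAIT"].values
trait_set2 = trait.loc[trait["order"] < 59, "TRAIT"].values

# Keep only the pairs that have one trait in each set
set1_vs_set2 = ldsc["p1"].isin(trait_set1) & ldsc["p2"].isin(trait_set2)
set2_vs_set1 = ldsc["p1"].isin(trait_set2) & ldsc["p2"].isin(trait_set1)
ldsc = ldsc.loc[set1_vs_set2 | set2_vs_set1, :]

# Sort key: map each trait name to its 1-based position in the trait list
map_dic = {name: i + 1 for i, name in enumerate(order)}
key = lambda x: x.map(map_dic)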
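
The filter in the cell above keeps only pairs that have one trait in each set, whichever column each trait sits in. Here is a minimal sketch of the same boolean-mask pattern on a toy table. The trait names are borrowed from the data, but the rows and the two short lists `set_a` and `set_b` are made up for illustration.

```python
import pandas as pd

# toy pair table with the same p1/p2 layout as the LDSC results
pairs = pd.DataFrame({
    "p1": ["Height", "Height", "BMI", "T2D"],
    "p2": ["BMI", "T2D", "MI", "MI"],
})
set_a = ["T2D", "MI"]      # stands in for trait_set1
set_b = ["Height", "BMI"]  # stands in for trait_set2

# a pair is kept if its two traits come from different sets, in either order
a_vs_b = pairs["p1"].isin(set_a) & pairs["p2"].isin(set_b)
b_vs_a = pairs["p1"].isin(set_b) & pairs["p2"].isin(set_a)
cross = pairs.loc[a_vs_b | b_vs_a]
print(cross)
```

Pairs within a single set (Height with BMI, T2D with MI) are dropped. In the notebook, the split at position 59 gives sets of 59 and 30 traits, so at most 59 × 30 = 1770 cross-set pairs can remain.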

In [5]:

print(map_dic)

{'Height': 1, 'BMI': 2, 'TC': 3, 'HDL-C': 4, 'LDL-C': 5, 'TG': 6, 'BS': 7, 'HbA1c': 8, 'TP': 9, 'Alb': 10, 'NAP': 11, 'A/G': 12, 'BUN': 13, 'sCr': 14, 'eGFR': 15, 'UA': 16, 'Na': 17, 'K': 18, 'Cl': 19, 'Ca': 20, 'P': 21, 'TBil': 22, 'ZTT': 23, 'AST': 24, 'ALT': 25, 'ALP': 26, 'GGT': 27, 'APTT': 28, 'PT': 29, 'Fbg': 30, 'CK': 31, 'LDH': 32, 'CRP': 33, 'WBC': 34, 'Neutro': 35, 'Eosino': 36, 'Baso': 37, 'Mono': 38, 'Lym': 39, 'RBC': 40, 'Hb': 41, 'Ht': 42, 'MCV': 43, 'MCH': 44, 'MCHC': 45, 'Plt': 46, 'SBP': 47, 'DBP': 48, 'MAP': 49, 'PP': 50, 'IVS': 51, 'PW': 52, 'LVDd': 53, 'LVDs': 54, 'LVM': 55, 'LVMI': 56, 'RWT': 57, 'FS': 58, 'EF': 59, 'T2D': 60, 'IS': 61, 'CeAn': 62, 'MI': 63, 'PAD': 64, 'AF': 65, 'Asthma': 66, 'AD': 67, 'GD': 68, 'RA': 69, 'CHB': 70, 'CHC': 71, 'Anemia': 72, 'BD': 73, 'SCZ': 74, 'AIS': 75, 'Osteoporosis': 76, 'LuCa': 77, 'GaCa': 78, 'EsCa': 79, 'CoCa': 80, 'PrCa': 81, 'BrCa': 82, 'EnCa': 83, 'UF': 84, 'Glaucoma': 85, 'COPD': 86, 'Epilepsy': 87, 'Pollinosis': 88, 'Urolithiasis': 89}
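
To see what the `key` function does with this dictionary, the sketch below applies the same idea to a short pandas Series through the `key` argument of `sort_values`. It assumes that `sort_key` in `gl.plot_rg` is applied in a similar way, which the notebook does not state explicitly; the four-trait list is a shortened stand-in for the full trait list.

```python
import pandas as pd

# shortened stand-in for the full trait order
order = ["Height", "BMI", "TC", "T2D"]
map_dic = {name: i + 1 for i, name in enumerate(order)}

# the key receives a Series and returns the values to sort by
key = lambda x: x.map(map_dic)

traits = pd.Series(["T2D", "BMI", "Height", "TC"])
sorted_traits = traits.sort_values(key=key).tolist()
print(sorted_traits)
```

Sorting by mapped position rather than by name keeps the traits in the order of the trait list instead of alphabetical order.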

Replicate the heatmap in the paper¶

In [6]:

df = gl.plot_rg( ldsc,
            sig_levels=[0.05],
            corrections =["non"],
            p="q",
            p1="p2",
            p2="p1",
            full_cell=("non",0.05),
            panno_texts=["*"],
            fig_args={"figsize":(15,15),"dpi":300},
            colorbar_args={"shrink":0.4},
            panno_args={"size":12,"c":"black"},
            fontsize=8,
            sort_key=key
            )

Sun Feb  4 17:51:43 2024 Start to create ldsc genetic correlation heatmap...
Sun Feb  4 17:51:43 2024 Raw dataset records: 1770
Sun Feb  4 17:51:43 2024  -Raw dataset non-NA records: 1770
Sun Feb  4 17:51:43 2024 Filling diagnal line and duplicated pair for plotting...
Sun Feb  4 17:51:43 2024 Valid unique trait pairs: 1770
Sun Feb  4 17:51:43 2024  -Valid unique trait1: 59
Sun Feb  4 17:51:43 2024  -Valid unique trait2: 30
Sun Feb  4 17:51:43 2024  -Significant correlations with P < 0.05: 68
Sun Feb  4 17:51:43 2024  -Significant correlations after Bonferroni correction: 13
Sun Feb  4 17:51:43 2024  -Significant correlations with FDR <0.05: 20
Sun Feb  4 17:51:43 2024 Plotting heatmap...

/home/yunye/anaconda3/lib/python3.9/site-packages/gwaslab/viz_plot_rg_heatmap.py:107: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  df = pd.concat([df,df_fill_reverse,df_fill_dia,df_fill_na],ignore_index=True).sort_values(by=p).drop_duplicates(subset=[p1,p2])

Sun Feb  4 17:51:44 2024 Full cell : non-corrected P == 0.05
Sun Feb  4 17:51:45 2024 P value annotation text : 
Sun Feb  4 17:51:45 2024  -* : non-corrected P < 0.05
Sun Feb  4 17:51:45 2024 Start to save figure...
Sun Feb  4 17:51:45 2024  -Skip saving figure!
Sun Feb  4 17:51:45 2024 Finished saving figure...
Sun Feb  4 17:51:45 2024 Finished creating ldsc genetic correlation heatmap!
