GWASTutorial
Note: this tutorial is being updated to Version 2024
This GitHub page provides a hands-on tutorial on common analyses in Complex Trait Genomics. The tutorial is designed for the course Fundamental Exercise II, provided by The Laboratory of Complex Trait Genomics at the University of Tokyo. For more information, please see About.
This tutorial covers the minimum skills and knowledge required to perform a typical genome-wide association study (GWAS). The contents are organized into the groups listed below. For absolute beginners, we have also prepared a section on the Linux command line.
If you have any questions or suggestions, please feel free to let us know in the Issues section of this repository.
Contents
Command lines
- Linux command line basics (optional) : For those who haven't used the command line, we will first introduce the basics of the Linux system and commonly used commands.
Pre-GWAS
- Data formats : Before any analysis, the first thing is always to get familiar with your data. In this section, we will introduce some basic formats used to store sequence, genotype and dosage data.
- Data QC : Raw genotype data is usually "dirty": it often contains errors and invalid or missing values. In this section, we will learn how to perform quality control on raw genotype data using PLINK.
- Principal component analysis (PCA) : In this section, we will cover how to perform Principal Component Analysis (PCA) to analyze the population structure.
GWAS
- Association tests: After QC, we will perform the very first association tests for a simulated binary trait (case-control trait) with a logistic regression model using PLINK.
- Visualization: To visualize the summary statistics generated from association tests, we will use a Python package called gwaslab to create Manhattan plots, Quantile-Quantile plots and Regional plots.
Post-GWAS
In these sections, we will briefly introduce post-GWAS analyses, which dig deeper into the GWAS summary statistics.
- Variant Annotation by ANNOVAR/VEP
- Heritability Concepts
- SNP-Heritability estimation by GCTA-GREML
- LD score regression (univariate, cross-trait and partitioned) by LDSC
- Gene / Gene-set analysis by MAGMA
- Fine-mapping by SuSiE
- Polygenic risk scores
- Colocalization
- TWAS
Topics
Introductions to GWAS-related issues