GWASTutorial

Note: this tutorial is being updated to Version 2024

This GitHub page provides a hands-on tutorial on common analyses in Complex Trait Genomics. It is designed for the course Fundamental Exercise II offered by the Laboratory of Complex Trait Genomics at the University of Tokyo. For more information, please see About.

This tutorial covers the minimum skills and knowledge required to perform a typical genome-wide association study (GWAS). The contents are organized into the groups listed below. For absolute beginners, we have also prepared a section on Linux command lines.

If you have any questions or suggestions, please let us know in the Issues section of this repository.

Contents

Command lines

Pre-GWAS

  • Data formats: Before any analysis, the first step is always to get familiar with your data. In this section, we introduce some basic formats used to store sequence, genotype, and dosage data.
  • Data QC: Raw genotype data is usually "dirty", meaning that it often contains errors and invalid or missing values. In this section, we will learn how to perform quality control on raw genotype data using PLINK.
  • Principal component analysis (PCA): In this section, we will cover how to perform PCA to analyze population structure.
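
To give a rough sense of what variant-level QC involves, the following is a minimal Python sketch, not the tutorial's actual workflow (which uses PLINK). The toy genotype matrix, the variant IDs, the helper names, and the thresholds (`max_missing`, `min_maf`) are all hypothetical and chosen only for illustration.

```python
# Toy genotype data (hypothetical): each variant maps to one call per sample.
# A call is the alternate-allele count (0, 1, 2) or None if the genotype is missing.
genotypes = {
    "rs_toy_1": [0, 1, 2, 1, None, 0],
    "rs_toy_2": [0, 0, 0, 1, 0, 0],
    "rs_toy_3": [None, None, 1, None, 2, 0],
}


def variant_stats(calls):
    """Return (missing_rate, minor_allele_frequency) for one variant."""
    observed = [g for g in calls if g is not None]
    missing_rate = 1 - len(observed) / len(calls)
    if not observed:
        return missing_rate, 0.0
    # Each diploid sample carries two alleles.
    alt_freq = sum(observed) / (2 * len(observed))
    maf = min(alt_freq, 1 - alt_freq)
    return missing_rate, maf


def passes_qc(calls, max_missing=0.2, min_maf=0.05):
    """Keep a variant if it is not too often missing and not too rare.

    The default thresholds are illustrative assumptions, not recommendations.
    """
    missing_rate, maf = variant_stats(calls)
    return missing_rate <= max_missing and maf >= min_maf


kept = [variant_id for variant_id, calls in genotypes.items() if passes_qc(calls)]
print(kept)
```

In practice these filters are applied by PLINK itself (for example through its `--geno` and `--maf` options), which works directly on the genotype file formats introduced in the Data formats section.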

GWAS

  • Association tests: After QC, we will perform the very first association tests for a simulated binary trait (case-control trait) with a logistic regression model using PLINK.
  • Visualization: To visualize the summary statistics generated from association tests, we will use a Python package called gwaslab to create Manhattan plots, Quantile-Quantile plots and Regional plots.
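
As a rough illustration of what a Quantile-Quantile plot shows, the sketch below computes the points such a plot would display from a list of p-values. It uses only the Python standard library and does not use the gwaslab API; the function name `qq_points`, the toy p-values, and the choice of `(rank - 0.5) / n` as the expected p-value are assumptions made for this example.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs, most significant first.

    Under the null hypothesis p-values are uniform on (0, 1), so the
    p-value of rank `rank` out of `n` is expected to be about (rank - 0.5) / n.
    """
    n = len(p_values)
    points = []
    for rank, p in enumerate(sorted(p_values), start=1):
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values standing in for GWAS summary statistics.
toy_p_values = [0.5, 0.01, 0.2, 0.9, 0.03]
for expected, observed in qq_points(toy_p_values):
    print(f"{expected:.3f}\t{observed:.3f}")
```

Plotting the observed values against the expected ones gives the QQ plot: points that rise well above the diagonal indicate more small p-values than the null hypothesis predicts.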

Post-GWAS

In these sections, we briefly introduce post-GWAS analyses, which dig deeper into the GWAS summary statistics.

Topics

Introductions on GWAS-related issues

Others
