Variant Call Format (VCF)

Definition
AI-generated

VCF is a text-based standard for representing genetic variants, their alleles, site-level metadata, and per-sample genotype fields.
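To make the record layout concrete, here is a minimal Python sketch that splits one tab-delimited VCF data line into its eight fixed columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) and the per-sample fields keyed by the FORMAT column. The names `VcfRecord` and `parse_vcf_line` and the example line are hypothetical, chosen for illustration. The sketch ignores header lines, type declarations, and escaping rules, so a maintained parser is the safer choice for real data.

```python
from typing import Dict, List, NamedTuple


class VcfRecord(NamedTuple):
    """Hypothetical container for one VCF data line (illustration only)."""
    chrom: str
    pos: int                      # 1-based position of the first REF base
    variant_id: str
    ref: str
    alts: List[str]               # empty list when ALT is "."
    qual: str
    filter_status: str
    info: Dict[str, str]          # flag-style keys map to ""
    samples: List[Dict[str, str]]


def parse_vcf_line(line: str) -> VcfRecord:
    """Parse a single tab-delimited VCF data line (not a '#' header line)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, variant_id, ref, alt, qual, filter_status, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags.
    info_dict: Dict[str, str] = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_dict[key] = value

    # Column 9 (FORMAT) names the colon-separated subfields of each sample.
    samples: List[Dict[str, str]] = []
    if len(fields) > 9:
        keys = fields[8].split(":")
        for column in fields[9:]:
            samples.append(dict(zip(keys, column.split(":"))))

    alts = [] if alt == "." else alt.split(",")
    return VcfRecord(chrom, int(pos), variant_id, ref, alts,
                     qual, filter_status, info_dict, samples)


# Made-up example line with three samples.
example = ("20\t14370\trs6054257\tG\tA\t29\tPASS\t"
           "NS=3;DP=14;AF=0.5;DB\tGT:GQ\t0|0:48\t1|0:48\t1/1:43")
record = parse_vcf_line(example)
```

In this sketch `record.info` holds the site-level metadata and `record.samples` holds one dictionary per sample, mirroring the split between site-level and per-sample fields described above.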

Why it matters in GWAS

VCF is a common interchange format for sequencing, array, and imputation pipelines, so many GWAS workflows start from VCF and then convert to PLINK, BGEN, or PGEN. Correct handling of the reference allele, the 1-based coordinate convention, multiallelic sites, and genotype representation is essential for harmonization and reproducibility.
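Two of these pitfalls can be sketched in a few lines of Python. The hypothetical helper `to_zero_based_interval` converts a 1-based VCF POS into the 0-based, half-open interval that BED-style tools expect. `split_multiallelic` shows one possible way to break a multiallelic site into biallelic records. Setting alleles that belong to a different ALT to missing is an assumption made for this illustration. Real tools apply their own policies and also left-align and trim alleles, which this sketch does not attempt.

```python
import re
from typing import List, Tuple


def to_zero_based_interval(pos: int, ref: str) -> Tuple[int, int]:
    """Convert 1-based POS plus REF length to a 0-based half-open (start, end)."""
    start = pos - 1
    return start, start + len(ref)


def recode_genotype(gt: str, alt_index: int) -> str:
    """Recode one GT string for the biallelic record of ALT number alt_index."""
    tokens = re.split(r"([/|])", gt)  # capture group keeps the phase separators
    recoded = []
    for token in tokens:
        if token in ("/", "|", "."):
            recoded.append(token)       # separators and missing calls pass through
        elif int(token) == 0:
            recoded.append("0")         # reference allele stays reference
        elif int(token) == alt_index:
            recoded.append("1")         # the ALT kept in this record
        else:
            recoded.append(".")         # assumption: other ALTs become missing
    return "".join(recoded)


def split_multiallelic(ref: str, alts: List[str],
                       genotypes: List[str]) -> List[Tuple[str, str, List[str]]]:
    """Return one (REF, ALT, genotypes) tuple per ALT allele."""
    return [
        (ref, alt, [recode_genotype(gt, index) for gt in genotypes])
        for index, alt in enumerate(alts, start=1)
    ]


interval = to_zero_based_interval(14370, "G")
rows = split_multiallelic("G", ["A", "T"], ["0/1", "1|2", "./.", "2/2"])
```

Because downstream formats such as PLINK generally expect biallelic variants, the recoding policy chosen at this step directly affects genotype counts, so it is worth recording which tool and settings were used.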

Example usage

"We normalized and indexed the cohort VCF before conversion to PLINK and imputation against the reference panel."

References

  • Danecek P, et al. (2011). The variant call format and VCFtools. Bioinformatics. https://doi.org/10.1093/bioinformatics/btr330
  • HTS specifications: https://samtools.github.io/hts-specs/
  • GWASTutorial: Data formats.
