
Pretraining

Definition
AI-generated

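Pretraining is the first, large-scale training phase in which a model learns general-purpose representations from broad data. It usually uses a self-supervised objective, such as next-token prediction for text (Brown et al., 2020), masked-token modeling (Devlin et al., 2019), contrastive objectives for images, or denoising autoencoding. The pretrained model is then fine-tuned or prompted for a downstream task.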
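To make two of these objectives concrete, here is a minimal Python sketch of how self-supervised training examples can be built from a single unlabeled sequence. The function names and the `[MASK]` token string are hypothetical. The masking step is simplified: every selected position is replaced with `[MASK]`, whereas BERT replaces most selected tokens with the mask token and leaves the rest as random or unchanged tokens. No model is trained here. The sketch only shows where the labels come from, which is the data itself.

```python
import random

MASK = "[MASK]"  # hypothetical mask-token string

def next_token_pairs(tokens):
    """Next-token prediction: pair every prefix with the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def masked_example(tokens, mask_prob=0.15, seed=0):
    """Simplified masked modeling: hide a random subset of positions and keep
    the original tokens as the prediction targets.

    Every selected position becomes MASK here; BERT also leaves some selected
    tokens unchanged or swaps in random tokens.
    """
    rng = random.Random(seed)          # seeded for reproducibility
    inputs = list(tokens)              # copy so the original sequence is untouched
    targets = {}                       # position -> original token
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            inputs[i] = MASK
    return inputs, targets

seq = list("ACGTTGCA")                 # toy unlabeled sequence, one token per base
pairs = next_token_pairs(seq)
print(pairs[0])    # (['A'], 'C')
print(len(pairs))  # 7
masked_inputs, masked_targets = masked_example(seq, mask_prob=0.3, seed=1)
print(masked_inputs, masked_targets)
```

Neither objective needs human annotation. The targets are tokens of the input itself, so pretraining can scale to broad, unannotated corpora.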

Why it matters in GWAS

DNA, protein, and single-cell foundation models depend on their pretraining corpora and tokenization choices, and both shape the model's inductive bias. Using such a model to interpret GWAS results does not replace a sound association-study design. The study design must hold up on its own, independent of the model's pretraining domain.
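The effect of tokenization on inductive bias can be illustrated with a toy DNA example. The Python sketch below uses hypothetical helper names and a made-up 10-bp sequence. It compares how many tokens a single-nucleotide variant changes under three schemes: single-nucleotide tokens, overlapping k-mers, and non-overlapping k-mers. It only counts token differences and makes no claim about how any particular published model tokenizes its input.

```python
def single_nucleotide_tokens(seq):
    """One token per base."""
    return list(seq)

def kmer_tokens(seq, k, stride=1):
    """k-mer tokens: overlapping when stride=1, non-overlapping when stride=k.
    Trailing bases that do not fill a complete k-mer are dropped."""
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def tokens_changed(ref_tokens, alt_tokens):
    """Count aligned token positions that differ. Assumes equal-length token
    lists, which holds for a substitution but not for an indel."""
    return sum(r != a for r, a in zip(ref_tokens, alt_tokens))

ref = "ACGTACGTAC"                 # made-up reference sequence
alt = ref[:5] + "T" + ref[6:]      # hypothetical SNP at 0-based position 5: C -> T

print(tokens_changed(single_nucleotide_tokens(ref), single_nucleotide_tokens(alt)))  # 1
print(tokens_changed(kmer_tokens(ref, 3), kmer_tokens(alt, 3)))                      # 3
print(tokens_changed(kmer_tokens(ref, 3, stride=3), kmer_tokens(alt, 3, stride=3)))  # 1
```

Under overlapping 3-mers, one SNP perturbs three tokens. Under non-overlapping 3-mers, the same SNP changes a single token whose content depends on the reading frame. A model therefore represents the same variant differently depending on how its pretraining data were tokenized.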

Example usage

"The methods explicitly include Pretraining to support interpretation of the main findings."

References

  • Devlin J, et al. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. NAACL.
  • Brown T, et al. (2020). Language models are few-shot learners. NeurIPS.
