Pretraining
Definition
AI-generated
Pretraining is the initial, large-scale training phase in which a model learns general-purpose representations from broad data. Common objectives include next-token prediction for text, masked modeling, contrastive image objectives, and denoising autoencoding. The pretrained model is then fine-tuned or prompted for a downstream task.
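As a rough illustration of two of these objectives, the sketch below builds training examples for next-token prediction and for masked modeling from a toy token sequence. The helper names (`next_token_pairs`, `mask_tokens`), the `[MASK]` symbol, and the masking rate are assumptions made for this example only; production pipelines use more elaborate corruption schemes.

```python
import random

MASK = "[MASK]"  # hypothetical mask symbol used only in this sketch


def next_token_pairs(tokens):
    """Return (context, target) pairs: each prefix predicts the token that follows it."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model should recover.
    """
    rng = random.Random(seed)
    masked, labels = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = MASK
            labels[i] = tok
    return masked, labels


tokens = ["A", "C", "G", "T", "A", "C"]

# Next-token prediction: five (context, target) pairs for six tokens.
pairs = next_token_pairs(tokens)

# Masked modeling: a high rate is used here so the toy example masks something.
masked, labels = mask_tokens(tokens, mask_rate=0.5, seed=0)
```

In both cases the supervision signal comes from the data itself, which is why pretraining can scale to large corpora without manual labels.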
Why it matters in GWAS
DNA, protein, and single-cell foundation models depend on their pretraining corpora and tokenization choices, both of which shape the model's inductive bias. Interpreting GWAS results downstream still requires a sound association study design, regardless of the domain the model was pretrained on.
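To make the tokenization point concrete, here is a minimal sketch of k-mer tokenization for a DNA sequence. The function name `kmer_tokenize`, the toy sequences, and the choice of `k=3` are illustrative assumptions, not the scheme of any particular model. The sketch compares how many tokens a single-nucleotide change touches under overlapping versus non-overlapping k-mers.

```python
def kmer_tokenize(seq, k=3, stride=1):
    """Split a sequence into k-mers; stride=1 overlaps, stride=k does not.

    Trailing bases that do not fill a complete k-mer are dropped.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def count_changed(ref_tokens, alt_tokens):
    """Count positions where two equal-length token lists differ."""
    return sum(r != a for r, a in zip(ref_tokens, alt_tokens))


ref = "ACGTACGT"
alt = "ACGAACGT"  # hypothetical variant: T>A at zero-based position 3

overlapping_ref = kmer_tokenize(ref, k=3, stride=1)
overlapping_alt = kmer_tokenize(alt, k=3, stride=1)
non_overlapping_ref = kmer_tokenize(ref, k=3, stride=3)
non_overlapping_alt = kmer_tokenize(alt, k=3, stride=3)

# One substitution alters several overlapping tokens but only one non-overlapping token.
changed_overlapping = count_changed(overlapping_ref, overlapping_alt)
changed_non_overlapping = count_changed(non_overlapping_ref, non_overlapping_alt)
```

With these toy inputs the same variant changes three overlapping tokens but only one non-overlapping token. This is one way a tokenization choice can change how a model sees a single-nucleotide variant.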
Example usage
"The methods explicitly include pretraining to support interpretation of the main findings."
Related terms
References
- Devlin J, et al. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. NAACL.
- Brown T, et al. (2020). Language models are few-shot learners. NeurIPS.