Embedding

Definition

AI-generated

An embedding is a learned vector (typically dense and fixed-length, sometimes a low-rank factor) that represents a discrete token, span, image patch, cell, or other object, such that semantically similar items map near each other in the embedding space. Modern transformers compute contextual embeddings, in which the vector for each token depends on its surrounding context.
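As a rough illustration of "similar items map near each other", the sketch below compares toy vectors by cosine similarity. The vocabulary, the 3-dimensional vectors, and the helper name `cosine_similarity` are all hypothetical and hand-picked for the example; real embeddings are learned from data and usually have hundreds of dimensions.

```python
import math

# Hypothetical, hand-picked vectors (illustrative only, not learned).
EMBEDDINGS = {
    "gene":   [0.9, 0.1, 0.0],
    "locus":  [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Related items should score higher than unrelated ones.
sim_related = cosine_similarity(EMBEDDINGS["gene"], EMBEDDINGS["locus"])
sim_unrelated = cosine_similarity(EMBEDDINGS["gene"], EMBEDDINGS["banana"])
print(sim_related > sim_unrelated)
```

Cosine similarity is only one common choice of nearness; Euclidean distance or a dot product may be more appropriate depending on how the embedding was trained.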

Topics

LLM and Agents, Machine learning concepts

Why it matters in GWAS

Sequence and single-cell models feed embeddings into task heads for variant scoring or cell typing. Interpreting such models in a GWAS setting usually requires independent association evidence, because embedding geometry can absorb confounding or batch signal rather than biology.

Example usage

"The variant window was tokenized into k-mers and passed through a pretrained DNA encoder to obtain a pooled embedding for the MLP risk head."
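The pipeline in the sentence above can be sketched in a few lines, under strong assumptions. Here a fixed random lookup table (`KMER_TABLE`) stands in for the pretrained DNA encoder, so the per-token vectors are static rather than contextual, and the MLP weights are untrained. The names `tokenize_kmers`, `mean_pool`, and `mlp_risk_head`, the k-mer size, and the dimensions are all hypothetical choices for illustration.

```python
import math
import random
from itertools import product

K = 3        # k-mer size (assumed)
DIM = 8      # embedding dimension (assumed)
HIDDEN = 4   # hidden units in the MLP head (assumed)

rng = random.Random(0)

# Stand-in for a pretrained encoder: one fixed random vector per k-mer.
# A real encoder would produce context-dependent vectors instead.
KMER_TABLE = {
    "".join(p): [rng.gauss(0, 1) for _ in range(DIM)]
    for p in product("ACGT", repeat=K)
}

def tokenize_kmers(seq, k=K):
    """Split a DNA string into overlapping k-mers with stride 1."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mean_pool(vectors):
    """Average token vectors into one fixed-length pooled embedding."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

# Untrained, randomly initialised MLP weights (illustrative only).
W1 = [[rng.gauss(0, 0.5) for _ in range(DIM)] for _ in range(HIDDEN)]
W2 = [rng.gauss(0, 0.5) for _ in range(HIDDEN)]

def mlp_risk_head(x):
    """One ReLU hidden layer followed by a sigmoid output."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    logit = sum(w * h for w, h in zip(W2, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

window = "ACGTTGCAAC"  # hypothetical variant window
tokens = tokenize_kmers(window)
pooled = mean_pool([KMER_TABLE[t] for t in tokens])
score = mlp_risk_head(pooled)
```

Mean pooling is one simple way to obtain a single vector per window; other models use a dedicated classification token or attention-weighted pooling. Because the weights here are random, `score` carries no biological meaning.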

References

Mikolov T, et al. (2013). Distributed representations of words and phrases and their compositionality. NeurIPS.
Devlin J, et al. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. NAACL.


Last updated 2026-04-05 (UTC)
