Embedding

Definition
AI-generated

An embedding is a learned vector (fixed- or variable-length, or a low-rank factor) that represents a discrete token, span, image patch, cell, or other object, such that semantically similar items map near each other in the embedding space. Modern transformers compute contextual embeddings, in which the vector for an item depends on its surrounding context.
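
To make "similar items map near each other" concrete, the following minimal sketch measures closeness with cosine similarity, one common choice among several (Euclidean distance is another). The three-dimensional vectors, the item names, and the helper `nearest` are hypothetical and hand-picked for illustration; they are not output from any trained model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy embeddings (assumption: values chosen by hand, not learned).
toy_embeddings = {
    "T_cell": [0.9, 0.1, 0.0],
    "B_cell": [0.8, 0.3, 0.1],
    "hepatocyte": [0.0, 0.2, 0.9],
}

def nearest(name, table):
    """Return the other item whose embedding is most cosine-similar to `name`."""
    query = table[name]
    scored = [
        (cosine_similarity(query, vec), other)
        for other, vec in table.items()
        if other != name
    ]
    return max(scored)[1]

closest_to_t_cell = nearest("T_cell", toy_embeddings)
```

With these toy values the two lymphocyte vectors point in nearly the same direction, so `nearest("T_cell", toy_embeddings)` selects `"B_cell"` rather than `"hepatocyte"`.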

Why it matters in GWAS

Sequence and single-cell models feed embeddings into task-specific heads for variant scoring or cell typing. Because embedding geometry can absorb confounding or batch signal, interpreting such models in a GWAS setting usually requires independent association evidence.

Example usage

"The variant window was tokenized into k-mers and passed through a pretrained DNA encoder to obtain a pooled embedding for the MLP risk head."
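
The pipeline in that sentence (k-mer tokenization, encoding, pooling, MLP head) can be sketched roughly as follows. This is an illustration under loud assumptions: a random lookup table stands in for the pretrained DNA encoder, so the per-token vectors here are static rather than contextual, and the MLP weights are random rather than trained. All function names, the window string, and the dimensions are hypothetical.

```python
import math
import random

def kmer_tokens(seq, k=3):
    """Split a DNA string into overlapping k-mers (stride 1)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def make_embedding_table(k=3, dim=4, seed=0):
    """Stand-in for a pretrained encoder (assumption): one random vector per k-mer."""
    rng = random.Random(seed)
    kmers = [""]
    for _ in range(k):
        kmers = [prefix + base for prefix in kmers for base in "ACGT"]
    return {kmer: [rng.gauss(0, 1) for _ in range(dim)] for kmer in kmers}

def mean_pool(vectors):
    """Average token vectors element-wise into one pooled embedding."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def mlp_head(x, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output in (0, 1)."""
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w1, b1)
    ]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical variant window and untrained weights, for shape-checking only.
window = "ACGTTGCA"
tokens = kmer_tokens(window, k=3)
table = make_embedding_table(k=3, dim=4)
pooled = mean_pool([table[t] for t in tokens])

rng = random.Random(1)
w1 = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
b1 = [0.0] * 5
w2 = [rng.gauss(0, 1) for _ in range(5)]
score = mlp_head(pooled, w1, b1, w2, 0.0)
```

Mean pooling is only one option; other pooling schemes (for example, a dedicated summary token) are also used. Because the weights above are random, `score` carries no biological meaning and only demonstrates how the data flows from sequence to a single scalar.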

References

  • Mikolov T, et al. (2013). Distributed representations of words and phrases and their compositionality. NeurIPS.
  • Devlin J, et al. (2019). BERT: pre-training of deep bidirectional transformers for language understanding. NAACL.
