KV Cache

Definition
AI-generated

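A KV cache (key–value cache) stores, for each decoder layer and attention head, the key and value vectors already computed for past tokens during autoregressive generation. At each decoding step the model computes a key and value only for the newest token, appends them to the cache, and attends over the stored entries instead of recomputing them for the whole prefix.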
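The definition can be illustrated with a minimal sketch of one attention head in pure Python. Everything here is a toy assumption made for illustration: the `KVCache` class, the helper names, the 2-dimensional weight matrices, and the three input vectors are hypothetical and do not come from any particular library or model. The sketch only shows that attending over cached keys and values gives the same output as recomputing them for the full prefix at every step.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]


def attend(q, keys, values):
    # Scaled dot-product attention of one query over all given keys/values.
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(dim)]


class KVCache:
    """Hypothetical per-head cache: one key and one value per past token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step_cached(x, Wq, Wk, Wv, cache):
    # Project only the newest token, then attend over everything cached.
    cache.append(matvec(Wk, x), matvec(Wv, x))
    return attend(matvec(Wq, x), cache.keys, cache.values)


def decode_step_uncached(xs, Wq, Wk, Wv):
    # Recompute keys and values for the whole prefix on every step.
    keys = [matvec(Wk, x) for x in xs]
    values = [matvec(Wv, x) for x in xs]
    return attend(matvec(Wq, xs[-1]), keys, values)


# Toy weights and token embeddings (assumed values, for illustration only).
Wq = [[1.0, 0.0], [0.0, 1.0]]
Wk = [[0.5, -0.5], [0.25, 1.0]]
Wv = [[1.0, 2.0], [-1.0, 0.5]]
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cache = KVCache()
cached_outputs = [decode_step_cached(x, Wq, Wk, Wv, cache) for x in tokens]
uncached_outputs = [decode_step_uncached(tokens[: t + 1], Wq, Wk, Wv)
                    for t in range(len(tokens))]
```

In the uncached path, step t performs t key/value projections, so total projection work grows quadratically with output length. The cached path performs one projection per step and keeps one stored key and value per token.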

Synonyms

Why it matters in GWAS

KV caching is what makes interactive LLM use (drafting methods text, parsing trait strings, agent tool calling) practical at scale: it cuts per-token latency and redundant computation during long outputs or multi-turn chats, at the cost of accelerator memory that grows with context length and batch size. Teams benchmarking on-prem or regulated deployments should treat cache behavior (maximum length, batching, quantization of cached tensors) as part of the reproducibility story alongside model weights and prompts.
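To see why length, batching, and quantization of cached tensors belong in a benchmark report, a rough size estimate helps. The sketch below uses the standard accounting of one key and one value vector per layer, per key/value head, per token. The function name `kv_cache_bytes` and the model shape (32 layers, 32 key/value heads, head dimension 128) are hypothetical values chosen for illustration, and the estimate ignores any extra metadata a quantization scheme may store, such as scales or zero points.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_element):
    # Factor 2: one tensor for keys and one for values.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_element)


# Hypothetical model shape, 4096-token context, a single sequence.
fp16_bytes = kv_cache_bytes(32, 32, 128, 4096, 1, 2)  # 16-bit cache entries
int8_bytes = kv_cache_bytes(32, 32, 128, 4096, 1, 1)  # 8-bit cache entries
batch8_bytes = kv_cache_bytes(32, 32, 128, 4096, 8, 2)

print(fp16_bytes / 1024**3, "GiB at 16 bits")
print(int8_bytes / 1024**3, "GiB at 8 bits")
print(batch8_bytes / 1024**3, "GiB at 16 bits, batch of 8")
```

Under these assumptions the cache takes 2 GiB at 16 bits, 1 GiB at 8 bits, and 16 GiB for a batch of eight sequences, which is why two runs with identical weights and prompts can still differ in memory footprint and throughput.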

Example usage

"We profiled wall-clock time per locus summary with KV cache enabled versus disabled on the same GPU."

References

  • Vaswani A, et al. (2017). Attention is all you need. NeurIPS.