Reinforcement Learning from Human Feedback (RLHF)

Definition

AI-generated

Reinforcement learning from human feedback (RLHF) aligns a pretrained language model in two stages. First, a reward model is trained on human preference comparisons between model outputs. Second, the policy is optimized against that reward model, often with proximal policy optimization (PPO) on sampled outputs, so that the assistant's responses better match labeler rankings of helpfulness, honesty, and harmlessness.
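The two stages can be illustrated with their per-sample objectives. This is a minimal sketch, not a training loop: it assumes the reward model is fit with a pairwise (Bradley-Terry style) loss and that the policy step uses a PPO-style clipped surrogate. The function names and the default `eps=0.2` are illustrative choices and do not come from any particular library.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise reward-model loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the labeler-preferred
    response above the rejected one, and large when the order is reversed.
    """
    margin = reward_chosen - reward_rejected
    # softplus(-margin) equals -log(sigmoid(margin)); this form avoids overflow.
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def clipped_policy_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO-style clipped surrogate for one sample (a quantity to maximize).

    ratio     -- new-policy probability / old-policy probability of the sample
    advantage -- how much better the sample scored than a baseline
    eps       -- clip range (assumed default; tuned in practice)
    """
    clipped_ratio = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # Taking the minimum removes any incentive to move the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Reward model agrees with the labeler: low loss.
    print(preference_loss(2.0, 0.0))
    # Reward model disagrees with the labeler: high loss.
    print(preference_loss(0.0, 2.0))
    # A large policy shift on a good sample is capped by the clip.
    print(clipped_policy_objective(1.5, 1.0))
```

In practice both quantities are averaged over batches and differentiated by an autograd framework; the scalar versions here only show the shape of each objective.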

Topics

LLM and Agents

Synonyms

Why it matters in GWAS

Instruction-tuned models used for curation or for drafting methods sections inherit RLHF tradeoffs (verbosity, refusals, preference bias). Critical genomic claims should still be checked against databases and statistics rather than accepted because the model's answer sounds polite or confident.

Example usage

"The RLHF-tuned assistant refused to estimate individual disease risk from a pasted VCF, which we preferred over a speculative guess."

References

Ouyang L, et al. (2022). Training language models to follow instructions with human feedback. NeurIPS.
Christiano PF, et al. (2017). Deep reinforcement learning from human preferences. NeurIPS.


Last updated 2026-03-31 (UTC)
