Genetic ancestry, ethnicity & race
Terms
Definition
In GWAS and cohort genomics, these labels answer different questions. Genetic ancestry is inferred from DNA, either as proportions attributed to reference populations or as positions along axes of genetic variation computed against a reference panel. Ethnicity and race are social identities, usually recorded by self-report or administrative assignment. The three correlate in some populations because of history and sampling, but they are not interchangeable: using one as a proxy for another misstates what was measured.
How they differ
| | Ancestry | Ethnicity | Race |
|---|---|---|---|
| What it is | Genetic: how much of the genome is attributed to specified source populations or positions along axes of allele-frequency variation (e.g. PCA, supervised assignment, local ancestry). | Social / cultural identity—language, nationality, migration history, community affiliation—often self-described. | Socially defined category, historically tied to racism and power; often collected as self-report or observer assignment in biomedical settings. |
| Typical source | Genotypes + a reference panel or model. | Questionnaires, registries, EHR fields. | Questionnaires, registries, EHR fields; reporting standards (e.g. NIH) may require race/ethnicity reporting separately from genetics. |
| GWAS use | Control population stratification, choose analysis models, interpret portability of scores across groups; not a moral or social label by itself. | Recruitment equity, harmonization across sites, reporting—not a substitute for genotype-based ancestry adjustment. | Same cautions as ethnicity; must not be treated as a biological proxy for allele frequencies or causation. |
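The "GWAS use" row for ancestry mentions controlling population stratification. A common way to do this is to add genotype-derived principal components as covariates. The sketch below is a minimal illustration, assuming a small in-memory matrix of 0/1/2 allele counts with no missing values. The function name `genotype_pcs` and the simulated two-group data are hypothetical; real analyses typically use dedicated tools, LD pruning, and a reference panel.

```python
import numpy as np


def genotype_pcs(genotypes: np.ndarray, n_pcs: int = 2) -> np.ndarray:
    """Return the top principal-component scores for each sample.

    genotypes: (n_samples, n_variants) array of 0/1/2 allele counts.
    """
    g = np.asarray(genotypes, dtype=float)
    # Centre each variant on its mean allele count.
    g = g - g.mean(axis=0)
    # Drop monomorphic variants: they carry no information and cannot be scaled.
    g = g[:, g.std(axis=0) > 0]
    # Scale to unit variance so no single variant dominates the axes.
    g = g / g.std(axis=0)
    # SVD of the standardized matrix; U * S gives the per-sample scores.
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


# Toy data (hypothetical): two groups whose allele frequencies differ slightly.
rng = np.random.default_rng(0)
n_per_group, n_variants = 50, 500
freq_a = rng.uniform(0.1, 0.9, n_variants)
freq_b = np.clip(freq_a + rng.normal(0, 0.15, n_variants), 0.05, 0.95)
geno = np.vstack([
    rng.binomial(2, freq_a, size=(n_per_group, n_variants)),
    rng.binomial(2, freq_b, size=(n_per_group, n_variants)),
])

# One row of scores per sample; these columns would be added as covariates.
pcs = genotype_pcs(geno, n_pcs=2)
```

The scores describe allele-frequency structure only. They are computed without reference to any self-reported field, which is why they are not a social label.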
Rule of thumb: If the Methods section says PCA, admixture, or projection onto 1000 Genomes, the authors mean genetic ancestry. If a table lists “White,” “Black,” “Asian,” or “Hispanic,” check whether that field is self-report (race/ethnicity) or genetic inference, and never assume that a self-reported category matches an individual's genetic ancestry.
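One way to apply the rule of thumb to a dataset is to keep the self-reported field and the genotype-derived field in separate columns and cross-tabulate them rather than merging them. The sketch below uses invented records and hypothetical field names (`self_reported`, `inferred_group`). Assigning each person to a single discrete group is itself a simplification of continuous genetic ancestry.

```python
from collections import Counter

# Hypothetical records. "self_reported" stands for a questionnaire answer;
# "inferred_group" stands for a genotype-based assignment against a reference panel.
participants = [
    {"id": "P01", "self_reported": "Hispanic", "inferred_group": "AMR"},
    {"id": "P02", "self_reported": "Hispanic", "inferred_group": "EUR"},
    {"id": "P03", "self_reported": "Hispanic", "inferred_group": "AMR"},
    {"id": "P04", "self_reported": "White", "inferred_group": "EUR"},
    {"id": "P05", "self_reported": "Black", "inferred_group": "AFR"},
    {"id": "P06", "self_reported": "Black", "inferred_group": "AMR"},
    {"id": "P07", "self_reported": "Asian", "inferred_group": "EAS"},
    {"id": "P08", "self_reported": "Asian", "inferred_group": "SAS"},
]


def cross_tabulate(records):
    """Count participants for each (self-report, inferred group) pair."""
    return Counter((r["self_reported"], r["inferred_group"]) for r in records)


def groups_per_label(records):
    """Map each self-reported label to the set of inferred groups seen with it."""
    result = {}
    for r in records:
        result.setdefault(r["self_reported"], set()).add(r["inferred_group"])
    return result


table = cross_tabulate(participants)
spread = groups_per_label(participants)

# Print each self-reported label with the inferred groups observed under it.
for label in sorted(spread):
    print(label, sorted(spread[label]))
```

A label that maps to more than one inferred group in such a table is a reminder that the two fields measure different things.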
Other meanings (optional)
- Cohort labels: Studies sometimes use coarse population or super-population codes (e.g. reference-panel abbreviations) that are not the same as race or ethnicity; read the data dictionary. See 1000 Genomes Project and cohort-specific documentation.
- Ancestry in plain English: Outside genetics, “ancestry” can mean genealogy or family history without genotyping; in this dictionary the ancestry entry is genetic ancestry unless context says otherwise.
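As an illustration of the cohort-label point above, the sketch below expands a few 1000 Genomes codes. The five super-population codes are the panel's own; the population-level entries are a small subset chosen for illustration, and the helper `describe_panel_code` is hypothetical. A cohort may define its own codes, so its data dictionary remains the authority.

```python
# 1000 Genomes super-population codes: reference-panel groupings,
# not race or ethnicity fields.
SUPER_POPULATIONS = {
    "AFR": "African",
    "AMR": "Admixed American",
    "EAS": "East Asian",
    "EUR": "European",
    "SAS": "South Asian",
}

# A small, non-exhaustive subset of population codes and their super-population.
POPULATION_TO_SUPER = {
    "YRI": "AFR",
    "CEU": "EUR",
    "CHB": "EAS",
    "GIH": "SAS",
    "MXL": "AMR",
}


def describe_panel_code(code: str) -> str:
    """Expand a reference-panel code, or report that this lookup does not cover it."""
    code = code.strip().upper()
    if code in SUPER_POPULATIONS:
        return f"{code}: 1000 Genomes super-population ({SUPER_POPULATIONS[code]})"
    if code in POPULATION_TO_SUPER:
        parent = POPULATION_TO_SUPER[code]
        return f"{code}: 1000 Genomes population in super-population {parent}"
    return f"{code}: not in this lookup; check the cohort data dictionary"


for example in ["EUR", "yri", "White"]:
    print(describe_panel_code(example))
```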
Related terms
- Population stratification
- Admixture
- Trans-ancestry GWAS
- Polygenic risk score portability
- Ancestry informative marker
References
- NIH: NOT-OD-15-089 (collection of sex/gender, race, ethnicity, and age in clinical research).
- GWASTutorial: Population stratification.