
Duplicate marking

Definition
AI-generated

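Duplicate marking is the step in a sequencing pipeline that flags reads or read pairs derived from the same original DNA fragment, typically PCR or optical duplicates, so that downstream steps such as variant calling count each fragment as a single observation. In SAM/BAM files the mark is the 0x400 flag bit; marked reads stay in the file unless they are explicitly removed.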
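The idea can be illustrated with a minimal Python sketch. It is a simplification, not the algorithm of any particular tool: the `Read` class, its fields, and the `mark_duplicates` function are hypothetical names chosen for illustration, reads are treated as single-end, and "same fragment" is assumed to mean same chromosome, same 5' position, and same strand. Production tools also consider mate position, clipping, and optical-duplicate geometry.

```python
from collections import defaultdict
from dataclasses import dataclass

DUP_FLAG = 0x400  # SAM flag bit meaning "PCR or optical duplicate"


@dataclass
class Read:
    """Hypothetical, simplified stand-in for an aligned read."""
    name: str
    chrom: str
    pos: int            # 5' alignment position (assumed already unclipped)
    reverse: bool       # True if aligned to the reverse strand
    base_qual_sum: int  # sum of base qualities, used to pick a representative
    flag: int = 0


def mark_duplicates(reads):
    """Set DUP_FLAG on every read except the best one in each group."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r.chrom, r.pos, r.reverse)].append(r)

    for group in groups.values():
        # Keep the read with the highest base-quality sum unmarked.
        best = max(group, key=lambda r: r.base_qual_sum)
        for r in group:
            if r is not best:
                r.flag |= DUP_FLAG
    return reads


reads = [
    Read("a", "chr1", 100, False, 900),
    Read("b", "chr1", 100, False, 950),  # same position and strand as "a"
    Read("c", "chr1", 100, True, 800),   # same position, opposite strand
    Read("d", "chr1", 250, False, 700),
]
mark_duplicates(reads)
duplicates = [r.name for r in reads if r.flag & DUP_FLAG]
print(duplicates)  # ['a']
```

Only read "a" is marked: it shares position and strand with "b" but has the lower quality sum, while "c" and "d" are each alone in their groups.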

Why it matters in GWAS

Duplicate marking keeps copies of the same fragment from inflating per-site read support and stabilizes depth-based QC. Omitting it can bias the allele-balance and variant-quality statistics used to filter genotypes before association testing.
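A toy calculation shows how duplicates can distort allele balance at a heterozygous site. The observations below are made-up illustrative values, and `allele_balance` is a hypothetical helper, not part of any library.

```python
DUP_FLAG = 0x400  # SAM flag bit meaning "PCR or optical duplicate"


def allele_balance(observations, skip_duplicates):
    """Fraction of ALT observations; each item is (allele, sam_flag)."""
    kept = [
        allele
        for allele, flag in observations
        if not (skip_duplicates and flag & DUP_FLAG)
    ]
    alt = sum(1 for allele in kept if allele == "ALT")
    return alt / len(kept)


# Illustrative pileup: 5 REF and 5 ALT fragments, plus 5 extra copies
# of ALT-carrying fragments that have been marked as duplicates.
obs = [("REF", 0)] * 5 + [("ALT", 0)] * 5 + [("ALT", DUP_FLAG)] * 5

raw = allele_balance(obs, skip_duplicates=False)   # 10 ALT of 15 reads
dedup = allele_balance(obs, skip_duplicates=True)  # 5 ALT of 10 reads
print(round(raw, 3), dedup)  # 0.667 0.5
```

With duplicates counted, the site looks skewed toward the alternate allele; ignoring flagged reads restores the expected 0.5 balance in this toy case.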

Example usage

"Duplicate marking was performed with Picard MarkDuplicates; marked duplicates were excluded from variant discovery."

References

  • Picard toolkit: MarkDuplicates. Broad Institute.
