0-based vs 1-based Genome Coordinate Systems
Genomic coordinates specify positions along a chromosome.
Different tools use different conventions, which is a common source of off-by-one errors.
This guide explains:
- what 0-based and 1-based coordinates mean
- where each system is used
- how to convert safely between them
What are Genomic Coordinates?
Genomic coordinates are typically specified as a set of:
- Chromosome name (e.g., `chr1`, `chr2`, `chrX`)
- Start position (the beginning of the feature)
- End position (the end of the feature)
- Strand (`+` for forward strand, `-` for reverse strand, `.` for unstranded features)
Start and End Coordinates
In the vast majority of annotation formats, the start coordinate refers to the lowest-numbered (leftmost) coordinate relative to the genome, not relative to the feature itself. This means:
- For forward-strand features: start is the 5' end, end is the 3' end
- For reverse-strand features: start is actually the 3' end (leftmost on the chromosome), end is the 5' end (rightmost on the chromosome)
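The strand rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-based, fully-closed coordinates; the function name `five_and_three_prime` is hypothetical and not part of any library.

```python
from typing import Tuple


def five_and_three_prime(start: int, end: int, strand: str) -> Tuple[int, int]:
    """Return (five_prime, three_prime) for a 1-based, fully-closed feature.

    `start` is always the leftmost genomic coordinate and `end` the
    rightmost one, regardless of strand.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if strand == "+":
        return start, end
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost coordinate.
        return end, start
    raise ValueError("strand must be '+' or '-' to define 5'/3' ends")


print(five_and_three_prime(100, 200, "+"))  # (100, 200)
print(five_and_three_prime(100, 200, "-"))  # (200, 100)
```

Unstranded features (`.`) are rejected here because they have no defined 5' or 3' end.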
1-based Coordinates (Human-friendly)
Definition
- First nucleotide is 1
- Both start and end are included → `[start, end]`
Example
chr1:1-10
- Includes bases 1 through 10
- Length = 10 bases
Common formats:
- VCF
- GFF / GTF
- Ensembl
- UCSC web interface
- GWAS Sumstats
GWAS Summary Statistics
GWAS summary statistics are typically 1-based because they are derived from genotype data stored in VCF files (which use 1-based coordinates) or in related genotype formats such as PGEN, BGEN, and PLINK BED (not to be confused with the 0-based UCSC BED annotation format).
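Because summary statistics carry 1-based positions, writing a variant out as a 0-based BED record needs a one-position shift of the start. The sketch below assumes a single-nucleotide variant described by a chromosome name and a 1-based position; the function name `snp_to_bed_line` and the variant name `rs_example` are made up for illustration.

```python
def snp_to_bed_line(chrom: str, pos: int, name: str = ".") -> str:
    """Format a 1-based single-base position as a tab-separated BED line."""
    if pos < 1:
        raise ValueError("1-based positions start at 1")
    # BED start is 0-based (pos - 1); BED end is exclusive, so it equals pos.
    return f"{chrom}\t{pos - 1}\t{pos}\t{name}"


print(snp_to_bed_line("chr1", 101, "rs_example"))
```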
0-based Coordinates (Computer-friendly)
Definition
- First nucleotide is 0
- Start included, end excluded → `[start, end)`
Example
chr1 0 10
- Includes bases 0 through 9 (which correspond to bases 1-10 in 1-based)
- Length = `10 − 0 = 10`
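The two length formulas can be checked side by side. The helper names below are illustrative only; the second half of the sketch shows that Python slices follow the same 0-based, half-open convention.

```python
def length_1_based_closed(start: int, end: int) -> int:
    """Length of a 1-based, fully-closed interval [start, end]."""
    return end - start + 1


def length_0_based_half_open(start: int, end: int) -> int:
    """Length of a 0-based, half-open interval [start, end)."""
    return end - start


# chr1:1-10 and "chr1 0 10" describe the same ten bases.
print(length_1_based_closed(1, 10))     # 10
print(length_0_based_half_open(0, 10))  # 10

# Python slicing is also 0-based and half-open.
sequence = "ACGTACGTAC"
print(len(sequence[0:10]))  # 10
```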
Format notation varies by coordinate system:
- 1-based formats typically use colon-dash notation: `chr1:1-10` (represents bases 1-10)
- 0-based formats often use space-separated notation (e.g., UCSC BED format): `chr1 0 10` (represents bases 1-10, positions 0-9 in 0-based)
While the notation differs, both examples above describe the same 10-base genomic region. The key is understanding which coordinate system (0-based or 1-based) is being used by the format, as this determines how to interpret the coordinates.
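Converting between the two notations could look roughly like the sketch below. It assumes that the colon-dash string is 1-based and fully closed and that the three-field form is 0-based and half-open, as described above; the function names and the regular expression are illustrative, and stripping thousands separators is an extra assumption about the input.

```python
import re
from typing import Tuple

# Matches strings such as "chr1:1-10".
_REGION = re.compile(r"^(?P<chrom>[^:\s]+):(?P<start>\d+)-(?P<end>\d+)$")


def region_to_bed(region: str) -> Tuple[str, int, int]:
    """Convert a 1-based 'chrom:start-end' string to 0-based BED fields."""
    match = _REGION.match(region.replace(",", ""))
    if match is None:
        raise ValueError(f"not a chrom:start-end region: {region!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return match["chrom"], start - 1, end  # only the start shifts


def bed_to_region(chrom: str, start: int, end: int) -> str:
    """Convert 0-based, half-open BED fields to a 1-based region string."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return f"{chrom}:{start + 1}-{end}"


print(region_to_bed("chr1:1-10"))     # ('chr1', 0, 10)
print(bed_to_region("chr1", 0, 10))   # chr1:1-10
```

Zero-length BED intervals are rejected by `bed_to_region` because a fully-closed interval cannot express an empty span.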
Common formats:
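- BED (the UCSC annotation format, not the PLINK .bed genotype format)
- BAM (the binary format; SAM text files use 1-based positions)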
- UCSC tables
- UCSC chain files
Four Possible Coordinate Representations
Because coordinate systems can be 0-indexed or 1-indexed, and half-open or fully-closed, genomic features can be represented in four possible ways:
| | Half-open | Fully-closed |
|---|---|---|
| 0-indexed | start: 11, end: 17 | start: 11, end: 16 |
| 1-indexed | start: 12, end: 18 | start: 12, end: 17 |
For example, a 6-base feature starting at the 12th nucleotide:
- 0-indexed, half-open: [11, 17) → length = 17 - 11 = 6
- 0-indexed, fully-closed: [11, 16] → length = 16 - 11 + 1 = 6
- 1-indexed, half-open: [12, 18) → length = 18 - 12 = 6
- 1-indexed, fully-closed: [12, 17] → length = 17 - 12 + 1 = 6
All four representations describe the same biological feature, just using different counting conventions.
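One way to make the equivalence concrete is to normalize all four representations to a single canonical form. The sketch below picks 0-indexed, half-open as the canonical form; the function name and its keyword arguments are hypothetical.

```python
from typing import Tuple


def to_zero_half_open(start: int, end: int, *, one_indexed: bool, closed: bool) -> Tuple[int, int]:
    """Normalize any of the four representations to 0-indexed, half-open."""
    offset = 1 if one_indexed else 0
    start0 = start - offset
    # A closed end is included, so the exclusive end lies one position further.
    end0 = end - offset + (1 if closed else 0)
    return start0, end0


# The 6-base feature starting at the 12th nucleotide, in all four forms:
print(to_zero_half_open(11, 17, one_indexed=False, closed=False))  # (11, 17)
print(to_zero_half_open(11, 16, one_indexed=False, closed=True))   # (11, 17)
print(to_zero_half_open(12, 18, one_indexed=True, closed=False))   # (11, 17)
print(to_zero_half_open(12, 17, one_indexed=True, closed=True))    # (11, 17)
```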
Comparison between 1-based and 0-based
The following table summarizes the key differences between 1-based and 0-based coordinate systems:
| Aspect | 1-based | 0-based |
|---|---|---|
| First base | 1 | 0 |
| Interval type | Fully closed `[start, end]` | Half-open `[start, end)` |
| Start included | Yes | Yes |
| End included | Yes | No |
| Length formula | `end − start + 1` | `end − start` |
| Mental model | Counting bases | Array indices / gaps |
1-based coordinates label the nucleotides themselves, while 0-based coordinates label the positions between nucleotides (gaps).
Key insights:
- Every base is flanked by two gaps
- There is a gap before the first base and after the last base
- This is why 0-based intervals exclude the end position and are written as half-open `[start, end)`
Representations with a Concrete Sequence
| Sequence | | A | | T | | G | | C | | A | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Type | Gap | Base | Gap | Base | Gap | Base | Gap | Base | Gap | Base | Gap |
| 1-based | | 1 | | 2 | | 3 | | 4 | | 5 | |
| 0-based | 0 | | 1 | | 2 | | 3 | | 4 | | 5 |
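The same sequence, `ATGCA`, can be used to check both conventions in Python. The helper names below are illustrative; the 0-based helper is just a plain slice, because slice boundaries behave like the gaps between bases.

```python
def fetch_1_based(seq: str, start: int, end: int) -> str:
    """Return bases start..end (1-based, both ends included)."""
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    return seq[start - 1:end]


def fetch_0_based(seq: str, start: int, end: int) -> str:
    """Return bases in [start, end) (0-based, end excluded)."""
    return seq[start:end]


seq = "ATGCA"
print(fetch_1_based(seq, 2, 4))  # TGC
print(fetch_0_based(seq, 1, 4))  # TGC
print(fetch_1_based(seq, 3, 3))  # G
print(fetch_0_based(seq, 2, 3))  # G
# A zero-length 0-based interval selects no base: it names the gap
# between G and C.
print(repr(fetch_0_based(seq, 3, 3)))  # ''
```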
Comparison Table (Using the Same Sequence)
| Feature Type | Example | 1-based Representation | 0-based Representation | Selected Bases |
|---|---|---|---|---|
| Single nucleotide | Base at position 3 | `chr1:3-3` | `chr1 2 3` | G |
| Range | Bases 2–4 | `chr1:2-4` | `chr1 1 4` | T G C |
| SNP | G→A at position 3 | `chr1:3-3` | `chr1 2 3` | Position 3 |
| Deletion | Delete positions 2–3 | `chr1:2-3` | `chr1 1 3` | Remove TG |
| Insertion | Insert T after position 3 | `chr1:3-3` | `chr1 2 3` | Insert after G (the interval marks the anchor base G) |
Key points
- In 1-based, the end coordinate is included.
- In 0-based, the end coordinate is excluded.
- A single-base interval `N-N` (1-based) corresponds to `[N-1, N)` in 0-based.
- Both systems describe the same biological locations using different counting conventions.
Conversion Rules
Always Check Format Documentation
Always check the format description first, as some formats may not completely follow these conversion rules. Different tools and file formats may have specific conventions or exceptions that deviate from the general rules shown below.
| Scenario | 1-based Representation | 0-based Representation | Conversion Rule | Example (1-based → 0-based) | Example (0-based → 1-based) |
|---|---|---|---|---|---|
| Single base | `N` | `N − 1` | 0-based = 1-based − 1; 1-based = 0-based + 1 | `N=5 → 4` | `4 → 5` |
| Single-base interval | `[N, N]` | `[N−1, N)` | shift start only | `[5,5] → [4,5)` | `[4,5) → [5,5]` |
| Interval | `[start, end]` | `[start−1, end)` | start −1, end same | `[2,4] → [1,4)` | `[1,4) → [2,4]` |
| BED one base | `chr1:N-N` | `chr1 (N−1) N` | shift start only | `chr1:101-101 → chr1 100 101` | `chr1 100 101 → chr1:101-101` |
| Deletion (length L) | deletes `POS..POS+L−1` | `[POS−1, POS+L−1)` | `start=POS−1, end=POS+L−1` | `POS=2,L=2 deletes 2–3 → [1,3)` | `[1,3) → deletes 2–3 (POS=2,L=2)` |
| Insertion (between bases) | between `POS` and `POS+1` | `[POS, POS)` | start same; end ± 1 | `POS=3 → [3,3)` | `[3,3) → POS=3-4` |
Notes:
- For most intervals, only the start changes; insertions are the exception (start stays the same, end shifts).
- 1-based intervals are inclusive; 0-based intervals are half-open.
- For insertions, BED cannot represent inserted sequence; it can only mark the insertion point.
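The deletion and insertion rules from the table can be sketched as small helpers. This is only an illustration of the general rules, assuming that `POS` is the first deleted base for deletions and the base immediately before the insertion point for insertions; the function names are hypothetical, and specific file formats may anchor variants differently.

```python
from typing import Tuple


def deletion_to_bed(pos: int, length: int) -> Tuple[int, int]:
    """1-based deletion of bases pos..pos+length-1 -> 0-based [start, end)."""
    if pos < 1 or length < 1:
        raise ValueError("expected pos >= 1 and length >= 1")
    return pos - 1, pos + length - 1


def insertion_to_bed(pos: int) -> Tuple[int, int]:
    """Insertion between 1-based bases pos and pos+1 -> zero-length [pos, pos)."""
    if pos < 0:
        raise ValueError("expected pos >= 0")
    return pos, pos


def bed_to_one_based(start: int, end: int) -> Tuple[int, int]:
    """0-based [start, end) -> 1-based closed interval or flanking bases."""
    if start == end:
        # Zero-length interval: an insertion point between two bases.
        return start, start + 1
    return start + 1, end


print(deletion_to_bed(2, 2))   # (1, 3)
print(insertion_to_bed(3))     # (3, 3)
print(bed_to_one_based(1, 3))  # (2, 3)
print(bed_to_one_based(3, 3))  # (3, 4)
```

For a zero-length interval, `bed_to_one_based` returns the two flanking bases rather than a closed interval, which matches the `[3,3) → POS=3-4` example in the table.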