Table 1

Hierarchy of assembly data types

| Data type (typical size) | Description |
|---|---|
| Scaffold (100 kb to 10 Mb) | Layout of potentially nonoverlapping contigs based on mate-pair information; ideally spans an entire chromosome or replicon |
| Contig (5 kb to 500 kb) | Layout of overlapping reads with a consensus sequence |
| Mate pair (2 kb to 100 kb) | Pair of end-sequenced reads with a known orientation and separation |
| Read (0.5 kb to 1.0 kb) | Base calls and quality scores derived from a chromatogram |
| Chromatogram (4 × 10,000 time points) | Signal data from a sequencing reaction of a physical piece of DNA |

Each data type is composed of elements of the type listed below it. Typical sizes are given in parentheses. kb, kilobases; Mb, megabases.
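The containment hierarchy in Table 1 maps naturally onto nested record types. The sketch below is illustrative only: the class and field names (Chromatogram, Read, MatePair, PlacedRead, Contig, Scaffold, offset, insert_size) are hypothetical, and the consensus shown is a plain per-column majority vote, assuming reads are already in contig orientation with no alignment gaps. Real assemblers use quality-aware consensus calling.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chromatogram:
    # Signal intensities per time point, one trace per base channel (A, C, G, T).
    traces: dict[str, list[float]]


@dataclass
class Read:
    name: str
    bases: str            # base calls
    quals: list[int]      # one quality score per base call
    chromatogram: Chromatogram | None = None


@dataclass
class MatePair:
    left: Read
    right: Read
    insert_size: int      # expected separation of the two reads, in bp


@dataclass
class PlacedRead:
    read: Read
    offset: int           # 0-based start of the read in contig coordinates


@dataclass
class Contig:
    name: str
    layout: list[PlacedRead] = field(default_factory=list)

    def consensus(self) -> str:
        """Majority-vote consensus over the layout (a simplification that assumes
        reads are already in contig orientation and contain no alignment gaps)."""
        columns: dict[int, Counter] = {}
        for placed in self.layout:
            for i, base in enumerate(placed.read.bases):
                columns.setdefault(placed.offset + i, Counter())[base] += 1
        if not columns:
            return ""
        start, end = min(columns), max(columns)
        # Columns covered by no read (only possible in a broken layout) become "N".
        return "".join(
            columns[pos].most_common(1)[0][0] if pos in columns else "N"
            for pos in range(start, end + 1)
        )


@dataclass
class Scaffold:
    name: str
    # (contig, 0-based start in scaffold coordinates); gaps are implied by the starts
    contigs: list[tuple[Contig, int]] = field(default_factory=list)


reads = [("ACGT", 0), ("CGTA", 1), ("CGTT", 1), ("GTA", 2)]
contig = Contig(
    "ctg1",
    [PlacedRead(Read(f"r{i}", s, [30] * len(s)), off) for i, (s, off) in enumerate(reads)],
)
print(contig.consensus())  # ACGTA
```

In the usage example, the last column is covered by three reads (A, T, A), so the majority vote resolves the disagreement to A.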

Schatz et al. Genome Biology 2007 8:R34   doi:10.1186/gb-2007-8-3-r34
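Because a mate pair has a known orientation and separation, reads whose mates land in a different contig can be used to place contigs within a scaffold and to estimate the gap between them. The following is a minimal sketch under stated assumptions: contig A precedes contig B in the scaffold, the forward read lies in A and its reverse-strand mate in B, and the insert size is measured from the outer end of one read to the outer end of the other. The function name estimate_gap and its parameters are hypothetical.

```python
from statistics import mean


def estimate_gap(len_a: int, links: list[tuple[int, int]], insert_size: float) -> float:
    """Estimate the gap between contig A and the following contig B.

    Each link is (a_start, b_end): the 0-based start of the forward read in A
    and the exclusive end of its reverse-strand mate in B, each in its own
    contig's coordinates. The insert spans (len_a - a_start) bases of A, then
    the gap, then b_end bases of B, so each link yields one gap estimate.
    """
    if not links:
        raise ValueError("at least one mate-pair link is required")
    estimates = [insert_size - (len_a - a_start) - b_end for a_start, b_end in links]
    # A negative result suggests the two contigs overlap rather than leave a gap.
    return mean(estimates)


print(estimate_gap(1000, [(800, 1500), (700, 1700)], 3000))  # 1150
```

Averaging over several links dampens the variance in individual insert sizes; a fuller treatment would weight each link by the library's insert-size standard deviation.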
