A draft annotation and overview of the human genome
1 Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA
2 LabBook.com, Busch Boulevard, Columbus, OH 43229, USA
3 Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA
4 Department of Computer and Information Science, The Ohio State University, Neil Avenue, Columbus, OH 43210, USA
Genome Biology 2001, 2:research0025-research0025.18 doi:10.1186/gb-2001-2-7-research0025
A previous version of this manuscript was made available before peer review at http://genomebiology.com/2001/2/3/preprint/0001/
Published: 4 July 2001
The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena.
We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome.
We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4% of the genome. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.
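As a rough illustration of how the two estimates above relate, the sketch below computes the average exonic sequence per transcriptional unit that they imply. The genome size of 3.2 Gb is an assumption made for this example and is not a figure stated here, and the function name `exonic_bp_per_unit` is hypothetical. It is a back-of-envelope check, not the method used to derive the estimates.

```python
# Back-of-envelope check: average exonic sequence per transcriptional unit.
# ASSUMPTION: a total genome size of 3.2e9 bp, which is not given in the text.
ASSUMED_GENOME_BP = 3.2e9

EXON_FRACTION = 0.04              # exon sequences comprise 4% (from the text)
UNIT_ESTIMATES = (65_000, 75_000)  # estimated transcriptional units (from the text)


def exonic_bp_per_unit(genome_bp: float, exon_fraction: float, n_units: int) -> float:
    """Return the mean number of exonic base pairs per transcriptional unit."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return genome_bp * exon_fraction / n_units


if __name__ == "__main__":
    for n_units in UNIT_ESTIMATES:
        mean_bp = exonic_bp_per_unit(ASSUMED_GENOME_BP, EXON_FRACTION, n_units)
        print(f"{n_units} units: about {mean_bp:.0f} exonic bp per unit")
```

Under the assumed genome size, the two ends of the estimated range correspond to roughly 1.7 to 2.0 kb of exonic sequence per transcriptional unit.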