A draft annotation and overview of the human genome

Fred A Wright1, William J Lemon1, Wei D Zhao1, Russell Sears1, Degen Zhuo1, Jian-Ping Wang1, Hee-Yung Yang2, Troy Baer3, Don Stredney3,4, Joe Spitzner2, Al Stutz3,4, Ralf Krahe1 and Bo Yuan1*

Author Affiliations

1 Division of Human Cancer Genetics, The Ohio State University, 420 W. 12th Avenue, Columbus, OH 43210, USA

2, Busch Boulevard, Columbus, OH 43229, USA

3 Ohio Supercomputer Center (OSC), Kinnear Road, Columbus, OH 43212, USA

4 Department of Computer and Information Science, The Ohio State University, Neil Avenue, Columbus, OH 43210, USA

Genome Biology 2001, 2:research0025-research0025.18  doi:10.1186/gb-2001-2-7-research0025

A previous version of this manuscript was made available before peer review at

Published: 4 July 2001



The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena.


We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome.


We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4% of the genome. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.
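
As a rough illustration of what these estimates imply, the sketch below converts them into an average density of transcriptional units per megabase and an average exonic length per unit. The genome length of 3.2 Gb is an assumed round figure that is not taken from this paper, the 4% exon fraction is read as a fraction of the whole genome, and the function names are hypothetical.

```python
# Back-of-the-envelope arithmetic on the estimates quoted in the abstract.
# ASSUMPTION: GENOME_LENGTH_BP is a round illustrative value, not a figure
# reported in this paper.
GENOME_LENGTH_BP = 3_200_000_000

# Values quoted in the abstract.
UNITS_LOW = 65_000     # lower estimate of transcriptional units
UNITS_HIGH = 75_000    # upper estimate of transcriptional units
EXON_FRACTION = 0.04   # exon sequence, read here as a fraction of the genome


def units_per_megabase(n_units, genome_length_bp=GENOME_LENGTH_BP):
    """Average number of transcriptional units per megabase (hypothetical helper)."""
    return n_units / (genome_length_bp / 1_000_000)


def exonic_bp_per_unit(n_units, exon_fraction=EXON_FRACTION,
                       genome_length_bp=GENOME_LENGTH_BP):
    """Average exonic base pairs per transcriptional unit (hypothetical helper)."""
    return exon_fraction * genome_length_bp / n_units


if __name__ == "__main__":
    for n_units in (UNITS_LOW, UNITS_HIGH):
        density = units_per_megabase(n_units)
        exon_bp = exonic_bp_per_unit(n_units)
        print(f"{n_units} units: {density:.1f} per Mb, "
              f"about {exon_bp:.0f} exonic bp per unit")
```

Under the assumed genome length, the two ends of the estimated range correspond to roughly 20 to 23 transcriptional units per megabase. These are genome-wide averages only and say nothing about the regional variation in gene density that the annotation describes.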