An integrated computational pipeline and database to support whole-genome sequence annotation
1 Howard Hughes Medical Institute, University of California, Berkeley, CA 94720, USA
2 Department of Molecular and Cellular Biology, Life Sciences Addition, University of California, Berkeley, CA 94720-3200, USA
3 FlyBase-Berkeley, University of California, Berkeley, CA 94720-3200, USA
4 Genome Sciences Department, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA 94720, USA
Genome Biology 2002, 3:research0081-0081.11 doi:10.1186/gb-2002-3-12-research0081
This article is part of a series of refereed research articles from Berkeley Drosophila Genome Project, FlyBase and colleagues, describing Release 3 of the Drosophila genome, which are freely available at http://genomebiology.com/drosophila/.
Published: 23 December 2002
We describe here our experience in annotating the Drosophila melanogaster genome sequence, in the course of which we developed several new open-source software tools and a database schema to support large-scale genome annotation. We have developed these into an integrated and reusable software system for whole-genome annotation. The two factors that contribute most to overall annotation quality are the marshalling of high-quality sequences for alignments and the design of a system with a flexible architecture that can be adapted and extended.