Review

Identifying protein-coding genes in genomic sequences

Jennifer Harrow¹, Alinda Nagy², Alexandre Reymond³, Tyler Alioto⁴, Laszlo Patthy², Stylianos E Antonarakis⁵ and Roderic Guigó⁴*

Author affiliations

1 Wellcome Trust Sanger Institute, Wellcome Trust Campus, Hinxton, Cambridge CB10 1SA, UK

2 Institute of Enzymology, Biological Research Center, Hungarian Academy of Sciences, H-1113 Budapest, Hungary

3 Center for Integrative Genomics, Genopode Building, University of Lausanne, CH-1015 Lausanne, Switzerland

4 Centre de Regulació Genòmica, Institut Municipal d'Investigació Mèdica, Universitat Pompeu Fabra, E-08003 Barcelona, Catalonia, Spain

5 Department of Genetic Medicine and Development, University of Geneva Medical School and University Hospitals of Geneva, Geneva 1211, Switzerland

Genome Biology 2009, 10:201  doi:10.1186/gb-2009-10-1-201

Published: 30 January 2009


The vast majority of the biology of a newly sequenced genome is inferred from the set of encoded proteins. Predicting this set is therefore invariably the first step after the completion of the genome DNA sequence. Here we review the main computational pipelines used to generate the human reference protein-coding gene sets.