Creating a honey bee consensus gene set

Christine G Elsik1*, Aaron J Mackey2,3, Justin T Reese1, Natalia V Milshina1, David S Roos2 and George M Weinstock4

  • * Corresponding author: Christine G Elsik

  • † Equal contributors

Author Affiliations

1 Department of Animal Science, Texas A&M University, TAMU, College Station, Texas 77843, USA

2 Penn Genomics Institute, University of Pennsylvania, S. University Avenue, Philadelphia, Pennsylvania 19104, USA

3 GlaxoSmithKline, S. Collegeville Road, Collegeville, Pennsylvania 19426, USA

4 Human Genome Sequencing Center, Baylor College of Medicine, Baylor Plaza, Houston, Texas 77030, USA

Genome Biology 2007, 8:R13  doi:10.1186/gb-2007-8-1-r13

Published: 22 January 2007



We wished to produce a single reference gene set for the honey bee (Apis mellifera). Our motivation was twofold. First, we wished to obtain an improved set of gene models with increased coverage of known genes, while maintaining gene model quality. Second, we wished to provide a single official gene list that the research community could use for consistent and comparable analyses and functional annotation.


We created a consensus gene set for honey bee (Apis mellifera) using GLEAN, a new algorithm that uses latent class analysis to automatically combine disparate gene prediction evidence in the absence of known genes. The consensus gene models had increased representation of honey bee genes without sacrificing quality compared with any one of the input gene predictions. When compared with manually annotated gold standards, the consensus set of gene models was similar or superior in quality to each of the input sets.
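To illustrate the general idea of latent class analysis in this setting, the following is a minimal sketch and not GLEAN itself. It assumes a much simpler model than any real gene-prediction combiner: each candidate feature receives a yes/no vote from each evidence source, the feature is either real or not (the hidden class), and votes are independent given that class. Expectation-maximization then estimates each source's sensitivity and false-positive rate without any labelled genes. The names `fit_latent_class` and `simulate_votes`, the starting values, and the simulated accuracies are all illustrative assumptions.

```python
import random

EPS = 1e-6  # keeps probabilities and totals away from exact 0 and 1


def _clamp(p):
    return min(max(p, EPS), 1.0 - EPS)


def fit_latent_class(votes, n_iter=200):
    """Fit a two-class latent class model to binary votes with EM.

    votes[i][j] is 1 if evidence source j supports candidate feature i.
    Returns (prior, sens, fpr, post); post[i] is the posterior probability
    that feature i is real, taken from the final E-step.
    """
    n_items = len(votes)
    n_sources = len(votes[0])
    prior = 0.5                # P(feature is real)
    sens = [0.7] * n_sources   # P(vote = 1 | real); starts above fpr
    fpr = [0.3] * n_sources    # P(vote = 1 | not real)
    post = [0.5] * n_items
    for _ in range(n_iter):
        # E-step: posterior of the hidden class under current parameters.
        post = []
        for v in votes:
            p_real, p_false = prior, 1.0 - prior
            for j, x in enumerate(v):
                p_real *= sens[j] if x else 1.0 - sens[j]
                p_false *= fpr[j] if x else 1.0 - fpr[j]
            post.append(p_real / (p_real + p_false))
        # M-step: re-estimate parameters from posterior-weighted counts.
        total_real = max(sum(post), EPS)
        total_false = max(n_items - sum(post), EPS)
        prior = _clamp(sum(post) / n_items)
        for j in range(n_sources):
            hit_real = sum(p for p, v in zip(post, votes) if v[j])
            hit_false = sum(1.0 - p for p, v in zip(post, votes) if v[j])
            sens[j] = _clamp(hit_real / total_real)
            fpr[j] = _clamp(hit_false / total_false)
    return prior, sens, fpr, post


def simulate_votes(n_items, p_real, sens, fpr, seed=0):
    """Simulate hidden truth and source votes (toy data, not bee genes)."""
    rng = random.Random(seed)
    truth, votes = [], []
    for _ in range(n_items):
        real = rng.random() < p_real
        rates = sens if real else fpr
        truth.append(real)
        votes.append(tuple(int(rng.random() < r) for r in rates))
    return truth, votes


if __name__ == "__main__":
    truth, votes = simulate_votes(
        2000, 0.4, [0.9, 0.8, 0.6, 0.7], [0.05, 0.1, 0.3, 0.2]
    )
    prior, sens, fpr, post = fit_latent_class(votes)
    agree = sum((p > 0.5) == t for p, t in zip(post, truth)) / len(truth)
    print("estimated sensitivities:", [round(s, 2) for s in sens])
    print("agreement with simulated truth:", round(agree, 2))
```

In this toy setting the simulated truth is used only to check the fit afterwards; the estimation step never sees it, which mirrors the "absence of known genes" condition described above.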


Most eukaryotic genome projects produce multiple gene sets because a variety of gene prediction programs is available. Each program has strengths and weaknesses, so the multiplicity of gene sets offers users a more comprehensive collection of genes than any single program provides. On the other hand, the availability of multiple gene sets leaves users uncertain about which set to use. GLEAN proved to be an effective method for combining gene lists into a single reference set.