This article is part of the supplement: The BioCreative II - Critical Assessment for Information Extraction in Biology Challenge

Open Access Research

Overview of BioCreative II gene mention recognition

Larry Smith1*, Lorraine K Tanabe1*, Rie Johnson nee Ando2, Cheng-Ju Kuo3, I-Fang Chung3, Chun-Nan Hsu4, Yu-Shi Lin4, Roman Klinger5, Christoph M Friedrich5, Kuzman Ganchev6, Manabu Torii7, Hongfang Liu7, Barry Haddow8, Craig A Struble9, Richard J Povinelli10, Andreas Vlachos11, William A Baumgartner12, Lawrence Hunter12, Bob Carpenter13, Richard Tzong-Han Tsai14,15, Hong-Jie Dai14,16, Feng Liu17, Yifei Chen17, Chengjie Sun18, Sophia Katrenko19, Pieter Adriaans19, Christian Blaschke20, Rafael Torres20, Mariana Neves21, Preslav Nakov22,23, Anna Divoli24, Manuel Maña-López25, Jacinto Mata25 and W John Wilbur1*

Author affiliations

1 National Center for Biotechnology Information, Bethesda, Maryland, USA

2 IBM TJ Watson Research Center, Yorktown Heights, NY, USA

3 Institute of Bioinformatics, National Yang-Ming University, Taipei, Taiwan

4 Institute of Information Science, Academia Sinica, Taipei, Taiwan

5 Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Department of Bioinformatics, Schloss Birlinghoven, Sankt Augustin, Germany

6 Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA

7 Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University Medical Center, Washington, District of Columbia, USA

8 School of Informatics, University of Edinburgh, UK

9 Department of Mathematics, Statistics and Computer Science, Marquette University, Milwaukee, Wisconsin, USA

10 Department of Electrical and Computer Engineering, Marquette University, Milwaukee, Wisconsin, USA

11 Computer Laboratory, University of Cambridge, Cambridge, UK

12 University of Colorado School of Medicine, Center for Computational Pharmacology, Denver, Colorado, USA

13 Alias-i, Inc., Brooklyn, New York, USA

14 Institute of Information Science, Academia Sinica, Taipei, Taiwan

15 Department of Computer Science & Engineering, Yuan Ze University, Taoyuan City, Taiwan

16 Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan

17 Computational Modeling Laboratory, Vrije Universiteit Brussels, Belgium

18 School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

19 Human Computer Studies Laboratory, Institute of Informatics, University of Amsterdam, Amsterdam, The Netherlands

20 Bioalma, Tres Cantos (Madrid), Spain

21 Facultad de Informática, Universidad Complutense de Madrid, Madrid, Spain

22 Department of Electrical Engineering and Computer Sciences, Computer Science Division, University of California, Berkeley, California, USA

23 Bulgarian Academy of Sciences, Institute for Parallel Processing, Linguistic Modeling Department, Sofia, Bulgaria

24 School of Information, University of California, Berkeley, California, USA

25 Departamento de Tecnologías de la Información, Universidad de Huelva, Huelva, Spain

Citation and License

Genome Biology 2008, 9(Suppl 2):S2 doi:10.1186/gb-2008-9-s2-s2

Published: 1 September 2008


Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. The teams used a variety of methods, and the results varied, with a highest achieved F1 score of 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F1 score of 0.9066 is feasible, and furthermore that the best combined result makes use of the lowest scoring submissions.
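
For readers unfamiliar with the metric, the F1 score quoted above is the harmonic mean of precision and recall over the gene mentions a system predicts. The following is a minimal sketch of that computation, not the official BioCreative II evaluation procedure. It assumes exact-match scoring only, and the `Mention` tuple layout and the function name `precision_recall_f1` are hypothetical choices made for illustration.

```python
from typing import Set, Tuple

# Hypothetical mention representation: (sentence_id, start_offset, end_offset).
# The real task data format may differ; this layout is an assumption.
Mention = Tuple[str, int, int]


def precision_recall_f1(
    gold: Set[Mention], predicted: Set[Mention]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact-match scoring.

    A predicted mention counts as a true positive only if its sentence
    and both offsets match a gold mention exactly (an assumed
    simplification).
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented purely for illustration.
gold = {("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8)}
predicted = {("s1", 0, 4), ("s2", 3, 8), ("s2", 20, 25), ("s3", 1, 2)}
p, r, f1 = precision_recall_f1(gold, predicted)
print(f"precision={p:.4f} recall={r:.4f} F1={f1:.4f}")
```

In this toy case two of the four predictions match two of the three gold mentions, so precision is 2/4 and recall is 2/3. Because F1 penalizes an imbalance between the two, a system cannot reach a high score by optimizing precision or recall alone.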