Table 7

Information extracted from different data sources

| Data source (version) | Information extracted (for each gene or locus) | Number of genes |
|---|---|---|
| Ensembl (Build 31) | Gene name, chromosome or contig, start and end positions, strand (transcription direction), exons, gene-product (including function name(s) or description(s), synonyms and EC number(s)), cross references (IDs) to other databases (SwissProt, HUGO, PDB, GO, RefSeq, OMIM, Entrez, SPTREMBL, EMBL, LocusLink). | |
| LocusLink (03/29/2003) | Gene name, chromosome, gene product (function name or description), function synonyms, EC number(s), gene and protein comments, cross references (IDs) to other databases (Entrez, UCSC Genome, RefSeq, GO, OMIM, UniGene, PubMed) | |
| GenBank NC_001807 (mitochondrion) | Gene name, start and end positions, transcription direction, gene product (function name or description) | |


Functional information in Ensembl had to be extensively parsed to extract multiple functions, EC numbers, and/or synonyms. The 'nonredundant' column shows the number of genes from LocusLink that had no corresponding gene in the other two data sources (Ensembl and GenBank).
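As a rough illustration of the kind of parsing the note above refers to, the following Python sketch pulls EC numbers out of a free-text functional description and computes a 'nonredundant' set by set difference. It is not the authors' procedure. The regular expression, the function names `extract_ec_numbers` and `nonredundant`, and the assumption that genes can be matched across sources by a shared identifier are all hypothetical.

```python
import re

# Assumed format: EC numbers appear as "EC 1.2.3.4", with "-" allowed
# for unspecified levels (e.g. "EC 3.4.-.-"). Real annotations may differ.
EC_PATTERN = re.compile(r"\bEC\s*(\d+(?:\.(?:\d+|-)){3})")


def extract_ec_numbers(description):
    """Return the EC numbers found in a description, in order, without duplicates."""
    found = []
    for ec in EC_PATTERN.findall(description):
        if ec not in found:
            found.append(ec)
    return found


def nonredundant(locuslink_ids, ensembl_ids, genbank_ids):
    """Return LocusLink identifiers with no counterpart in the other two sources.

    Assumes (hypothetically) that all three sources have already been
    mapped to a common identifier; the source does not say how genes
    were matched.
    """
    return set(locuslink_ids) - set(ensembl_ids) - set(genbank_ids)


if __name__ == "__main__":
    text = "Hexokinase (EC 2.7.1.1) (HK I); also EC 3.4.-.-, EC 2.7.1.1"
    print(extract_ec_numbers(text))
    print(nonredundant({"A", "B", "C"}, {"A"}, {"C"}))
```

Matching by a shared identifier keeps the sketch short; in practice the cross references listed in the table would have to be used to decide which entries correspond to the same gene.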

Romero et al. Genome Biology 2004 6:R2   doi:10.1186/gb-2004-6-1-r2
