XREFdb is part of an attempt to identify connections between the genetics of model organisms and mammalian phenotypes. TBLASTN is used to search for significant matches between protein sequences from model organisms and mammalian peptide sequences predicted by the translation of expressed sequence tags (ESTs).
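The idea of "peptide sequences predicted by the translation of ESTs" can be illustrated with a minimal Python sketch of six-frame translation, the step that lets a TBLASTN-style search compare a protein query against nucleotide sequences. This is an illustration only, not XREFdb's or BLAST's actual code: the function names are hypothetical, the standard genetic code is assumed, and any codon containing a non-ACGT character is reported as 'X'.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate a DNA sequence in its first frame; '*' marks a stop codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # 'X' for ambiguous codons
        for i in range(0, usable, 3)
    )


def six_frame_translations(seq: str) -> dict:
    """Map frame number (+1..+3, -1..-3) to the predicted peptide."""
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames
```

A real search would then score the protein query against each of the six predicted peptides for every EST; that alignment step is not shown here.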
The database was originally set up to allow searches with named peptides from the fully sequenced genome of the yeast Saccharomyces cerevisiae against the human EST database. If you select the option 'Cross-referencing Yeast and Human Genes' from the XREFdb home page, you can either search using the name of a yeast open reading frame (ORF) or download a table of all matches between yeast and human genes. If you select XREF2 from the XREFdb home page, you can supply a gene name from any of a large selection of model organisms (you specify the organism by choosing from a list), and the corresponding protein is used to identify matching ESTs. If you submit a gene name that does not exist in that organism, you are offered the option of pasting a protein sequence into a box, and this sequence is then used to search the database.
If you set up an XREFdb account, the searches of your query sequences against the evolving EST databases are updated on a quarterly basis. There is, however, no indication of how often the search between the complete yeast sequence and the human EST dataset is carried out. The current data for this option are from searches begun on 12 March 1999, which makes the results very out of date.
A third option on the XREFdb home page is to establish an XREFdb account. With this option, your query sequence(s) are checked against human, mouse and rat ESTs on a quarterly basis and a report is e-mailed to you. Extra features with this option include EST mapping information (where available) and suggested cross-references between your gene(s) of interest and mammalian phenotypes, which are generated from the map positions of the ESTs.
The option of carrying out a search with a protein sequence, rather than a gene name, only becomes apparent if you happen to submit a gene name that does not exist. This search option should be offered from the outset. It is also not obvious which databases are interrogated by the various search options. For example, the original yeast/human cross-reference set presumably does not include mouse and rat ESTs. If you use XREF2, the databases scanned are not stated, whereas if you set up an XREFdb account, mouse and rat EST databases are searched as well as human ones. The data in the yeast/human cross-reference set appear out of date.
A direct link to the UniGene cluster for each EST identified by the search would be useful. It would also be good to be able to have your XREFdb search carried out more frequently than quarterly, or at a particular point in time. The ability to alter the TBLASTN parameters used for your search might also be useful; currently the default parameters are used in all cases.
The Saccharomyces Genome Database (SGD) stores molecular biology and genetics information about S. cerevisiae. Other useful sites are dbEST (the database of expressed sequence tags), UniGene (a gene-clustering system) and A Gene Map of the Human Genome.