Most of us are acutely aware of the limitations of current in silico methods for predicting genes in the human genome. In the May issue of Nature Biotechnology, Saurabh Saha and colleagues at the Johns Hopkins Medical Institutions describe an experimental approach to gene discovery and genome annotation (Nature Biotechnology 2002, 20:508-512). The method is an adaptation of the SAGE (serial analysis of gene expression) technology developed in the Vogelstein/Kinzler lab at Johns Hopkins. Compared with SAGE, the new 'LongSAGE' method uses a different type IIS restriction endonuclease, MmeI, to create longer 'tags' of 21 base pairs; the extra length should allow each tag to be assigned to a unique genomic locus. Saha et al. analysed 28,000 transcript tags expressed by a colon cancer cell line and found that the majority could be uniquely assigned to the genome, and that many of the remainder corresponded to duplicated sequences. The tags provide experimental evidence for the expression of 'hypothetical' genes, that is, genes that had until now been predicted solely by in silico methods. Hundreds of the tags mapped far from known genes, and these may represent undiscovered transcripts. Mining of expressed sequence tag (EST) databases confirmed that several of the LongSAGE tags correspond to uncharacterized genes. The authors suggest that large-scale LongSAGE analysis will provide a rich source of information for future gene-discovery and genome-annotation efforts.
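The reasoning behind the longer tags can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy sequence and the uniform-random model of the genome are all assumptions made for illustration, as are the genome size of about 3 billion base pairs and the 14-base-pair length used for conventional SAGE tags. The sketch counts how often a tag occurs on either strand of a reference sequence, labels it as unique, multiple or unmapped, and estimates how many matches a tag of a given length would be expected to have purely by chance.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def count_hits(genome: str, tag: str) -> int:
    """Count occurrences of a tag on either strand (overlaps allowed)."""
    hits = 0
    # A set avoids double counting when a tag equals its own reverse complement.
    for query in {tag, reverse_complement(tag)}:
        start = genome.find(query)
        while start != -1:
            hits += 1
            start = genome.find(query, start + 1)
    return hits


def classify_tags(genome: str, tags: list[str]) -> dict[str, str]:
    """Label each tag 'unique', 'multiple' or 'unmapped' (hypothetical labels)."""
    labels = {}
    for tag in tags:
        n = count_hits(genome, tag)
        labels[tag] = "unique" if n == 1 else "multiple" if n > 1 else "unmapped"
    return labels


def expected_chance_matches(tag_length: int, genome_size: int = 3_000_000_000) -> float:
    """Expected chance matches on both strands, ASSUMING a uniform-random genome."""
    return 2 * genome_size / 4 ** tag_length


if __name__ == "__main__":
    toy_genome = "GATTACAGATTACACCC"  # toy sequence, not real data
    print(classify_tags(toy_genome, ["GATTACA", "ACACCC", "TTTTTT"]))
    for length in (14, 21):  # 14 bp for conventional SAGE is an assumption here
        print(length, expected_chance_matches(length))
```

Under this simplified model, a 14-base-pair tag would be expected to match a genome of that size roughly 20 times by chance, whereas a 21-base-pair tag would be expected to match far less than once. Real genomes are repetitive rather than random, which is consistent with the report that some tags still mapped to duplicated sequences.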