Exon creation and establishment in human genes
1 Computational Genomics, Universitat Pompeu Fabra, Dr. Aiguader 88, Barcelona, 08003, Spain
2 Graduate Program in Areas of Basic and Applied Biology, Universidade do Porto, Praça Gomes Teixeira, Porto, 4099-002, Portugal
3 Catalan Institution for Research and Advanced Studies, Passeig Lluís Companys 23, Barcelona, 08010, Spain
Genome Biology 2008, 9:R141 doi:10.1186/gb-2008-9-9-r141 Published: 23 September 2008
A large proportion of species-specific exons are alternatively spliced. In primates, Alu elements play a crucial role in the process of exon creation, but many new exons have appeared through other mechanisms. Despite many recent studies, it remains unclear what the splicing regulatory requirements for de novo exonization are and how splicing regulation changes throughout an exon's lifespan.
Using comparative genomics, we have defined sets of exons with different evolutionary ages. Younger exons have weaker splice-sites and lower absolute values for the relative abundance of putative splicing regulators between exonic and adjacent intronic regions, indicating a less consolidated splicing regulation. This relative abundance is shown to increase with exon age, leading to higher exon inclusion. We show that this local difference in the density of regulators might be of biological significance, as it outperforms other measures in real exon versus pseudo-exon classification. We apply this new measure to the specific case of the exonization of anti-sense Alu elements and show that they are characterized by a general lack of exonic splicing silencers.
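To make the idea of a relative local abundance concrete, the following is a minimal Python sketch. It is an illustration under stated assumptions, not the measure as defined in the paper: the abstract does not give the formula, so the sketch assumes a simple difference between the motif density in the exon and the motif density in the two flanking intronic regions. The function names, the motif set, and the sequences are all hypothetical.

```python
def count_motif_hits(seq, motifs):
    """Count occurrences (overlaps allowed) of any motif in seq.

    Motifs are assumed to be given in upper case.
    """
    seq = seq.upper()
    hits = 0
    for motif in motifs:
        k = len(motif)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] == motif:
                hits += 1
    return hits


def relative_abundance(exon, upstream_intron, downstream_intron, motifs):
    """Exonic motif density minus flanking intronic motif density.

    ASSUMPTION: a plain density difference stands in for the paper's
    measure, whose exact definition is not stated in the abstract.
    Each flank is scanned separately so that no motif is counted
    across the artificial junction between the two flanks.
    """
    exon_density = count_motif_hits(exon, motifs) / len(exon) if exon else 0.0
    flank_len = len(upstream_intron) + len(downstream_intron)
    flank_hits = (count_motif_hits(upstream_intron, motifs)
                  + count_motif_hits(downstream_intron, motifs))
    flank_density = flank_hits / flank_len if flank_len else 0.0
    return exon_density - flank_density


# Hypothetical example: one made-up enhancer-like hexamer and toy sequences.
example_motifs = {"GAAGAA"}
score = relative_abundance("GAAGAAGAAG", "TTTTTTTTTT", "CCCCCCCCCC", example_motifs)
```

Under this assumed definition, a motif set enriched in the exon relative to its flanks gives a positive score and a set depleted in the exon gives a negative one, so the absolute value reflects how strongly the exon is set apart from its local intronic context.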
Our results suggest that specific sequence environments are required for exonization and that these can change with time. We propose a model of exon creation and establishment in human genes, in which splicing decisions depend on the relative local abundance of regulatory motifs. Using this model, we provide further explanation as to why Alu elements serve as a major substrate for exon creation in primates. Finally, we discuss the benefits of integrating such information in gene prediction.