Motif composition, conservation and condition-specificity of single and alternative transcription start sites in the Drosophila genome
1 Program in Computational Biology and Bioinformatics, Duke University, Science Drive, Durham, NC 27708, USA
2 Institute for Genome Sciences and Policy, Duke University, Science Drive, Durham, NC 27708, USA
3 Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstrasse, Dresden 01307, Germany
4 Department of Biostatistics and Bioinformatics, Duke University, Duke University School of Medicine, Erwin Road, Durham NC 27710, USA
5 Department of Computer Science, Duke University, Durham, NC 27708, USA
Genome Biology 2009, 10:R73 doi:10.1186/gb-2009-10-7-r73
Published: 9 July 2009
Transcription initiation is a key component in the regulation of gene expression. mRNA 5' full-length sequencing techniques have enhanced our understanding of mammalian transcription start sites (TSSs), revealing different initiation patterns on a genomic scale.
To identify TSSs in Drosophila melanogaster, we applied a hierarchical clustering strategy to available 5' expressed sequence tags (ESTs) and identified a high-quality set of 5,665 TSSs for approximately 4,000 genes. We distinguished two initiation patterns: 'peaked' TSSs and 'broad' TSS cluster groups. Peaked promoters were found to contain location-specific sequence elements; conversely, broad promoters were associated with non-location-specific elements. In alignments across other Drosophila genomes, conservation levels of sequence elements exceeded 90% within the melanogaster subgroup, but dropped considerably for distal species. Elements in broad promoters had lower levels of conservation than those in peaked promoters. When the distributions of ESTs were characterized, 64% of TSSs showed a distinct association with one of eight different spatiotemporal conditions. Available whole-genome tiling array time series data revealed different temporal patterns of embryonic activity across the majority of genes with distinct alternative promoters. Many genes with maternally inherited transcripts were found to have alternative promoters utilized later in development. Core promoters of maternally inherited transcripts showed differences in motif composition compared to zygotically active promoters.
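To make the clustering idea concrete, the following is a minimal sketch of how 5' EST ends on one strand might be grouped into TSS clusters and labelled 'peaked' or 'broad'. It is not the procedure used in the study: it substitutes plain single-linkage clustering of one-dimensional coordinates cut at a fixed distance (equivalent to splitting the sorted positions at large gaps), and every name and threshold in it (`max_gap`, `min_tags`, `window`, `peak_fraction`) is a hypothetical value chosen for illustration.

```python
from collections import Counter


def cluster_positions(positions, max_gap=20):
    """Single-linkage clustering of 1D coordinates, cut at max_gap.

    Sorted positions are split wherever two neighbours lie more than
    max_gap bases apart. max_gap is an illustrative assumption.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters


def classify_cluster(cluster, window=2, peak_fraction=0.5):
    """Return (mode_position, 'peaked' or 'broad') for one cluster.

    A cluster is called 'peaked' when at least peak_fraction of its
    tags fall within +/- window bases of the most frequent position.
    Both thresholds are assumptions, not values from the paper.
    """
    counts = Counter(cluster)
    # Most frequent position; ties are broken towards the smaller coordinate.
    mode_pos, _ = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
    near = sum(n for p, n in counts.items() if abs(p - mode_pos) <= window)
    shape = "peaked" if near / len(cluster) >= peak_fraction else "broad"
    return mode_pos, shape


def call_tss(positions, max_gap=20, min_tags=3, window=2, peak_fraction=0.5):
    """Cluster 5' EST ends and classify each sufficiently supported cluster."""
    calls = []
    for cluster in cluster_positions(positions, max_gap):
        if len(cluster) < min_tags:  # discard weakly supported clusters
            continue
        mode_pos, shape = classify_cluster(cluster, window, peak_fraction)
        calls.append((mode_pos, shape, len(cluster)))
    return calls


if __name__ == "__main__":
    # Toy 5' EST end coordinates on one strand of one chromosome.
    est_5p_ends = [100, 100, 100, 101, 99, 100,
                   500, 510, 520, 530, 540, 550,
                   900]
    for mode_pos, shape, n_tags in call_tss(est_5p_ends):
        print(mode_pos, shape, n_tags)
```

On the toy input, the tags concentrated around position 100 form a peaked cluster, the evenly spread tags between 500 and 550 form a broad one, and the lone tag at 900 is discarded for lack of support. Real data would additionally need strand and chromosome handling and filtering of ESTs that are not truly full-length.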
Our study provides a comprehensive map of Drosophila TSSs and the conditions under which they are utilized. Distinct differences in motif associations with initiation pattern and spatiotemporal utilization illustrate the complex regulatory code of transcription initiation.