The genomic landscape shaped by selection on transposable elements across 18 mouse strains
- Equal contributors
1 MRC Functional Genomics Unit, Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, OX1 3PT, UK
2 The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1HH, UK
3 Wellcome Trust Centre for Human Genetics, University of Oxford, Roosevelt Drive, Oxford, OX3 7BN, UK
4 University of California, Los Angeles, California, 90095, USA
5 National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
6 The Jackson Laboratory, Bar Harbor, Maine 04609, USA
Genome Biology 2012, 13:R45 doi:10.1186/gb-2012-13-6-r45Published: 15 June 2012
Transposable element (TE)-derived sequence dominates the landscape of mammalian genomes and can modulate gene function by dysregulating transcription and translation. Our current knowledge of TEs in laboratory mouse strains is limited primarily to those present in the C57BL/6J reference genome, with most mouse TEs being drawn from three distinct classes, namely short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs) and the endogenous retrovirus (ERV) superfamily. Despite their high prevalence, the different genomic and gene properties controlling whether TEs are preferentially purged from, or are retained by, genetic drift or positive selection in mammalian genomes remain poorly defined.
Using whole genome sequencing data from 13 classical laboratory and 4 wild-derived mouse inbred strains, we developed a comprehensive catalogue of 103,798 polymorphic TE variants. We employ this extensive data set to characterize TE variants across the Mus lineage, and to infer neutral and selective processes that have acted over 2 million years. Our results indicate that the majority of TE variants are introduced though the male germline and that only a minority of TE variants exert detectable changes in gene expression. However, among genes with differential expression across the strains there are twice as many TE variants identified as being putative causal variants as expected.
Most TE variants that cause gene expression changes appear to be purged rapidly by purifying selection. Our findings demonstrate that past TE insertions have often been highly deleterious, and help to prioritize TE variants according to their likely contribution to gene expression or phenotype variation.