Identifying related L1 retrotransposons by analyzing 3' transduced sequences
1 National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
2 Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, 725 N. Wolfe St., Baltimore, MD 21205, USA
3 Current address: Biogen Inc, Cambridge, MA 02142, USA
4 Current address: Human Genome Sciences Inc., Rockville, MD 20850, USA
Genome Biology 2003, 4:R30 doi:10.1186/gb-2003-4-5-r30Published: 16 April 2003
A large fraction of the human genome is attributable to L1 retrotransposon sequences. Not only do L1s themselves make up a significant portion of the genome, but L1-encoded proteins are thought to be responsible for the transposition of other repetitive elements and processed pseudogenes. In addition, L1s can mobilize non-L1, 3'-flanking DNA in a process called 3' transduction. Using computational methods, we collected DNA sequences from the human genome for which we have high confidence of their mobilization through L1-mediated 3' transduction.
The precursors of L1s with transduced sequence can often be identified, allowing us to reconstruct L1 element families in which a single parent L1 element begot many progeny L1s. Of the L1s exhibiting a sequence structure consistent with 3' transduction (L1 with transduction-derived sequence, L1-TD), the vast majority were located in duplicated regions of the genome and thus did not necessarily represent unique insertion events. Of the remaining L1-TDs, some lack a clear polyadenylation signal, but the alignment between the parent-progeny sequences nevertheless ends in an A-rich tract of DNA.
Sequence data suggest that during the integration into the genome of RNA representing an L1-TD, reverse transcription may be primed internally at A-rich sequences that lie downstream of the L1 3' untranslated region. The occurrence of L1-mediated transduction in the human genome may be less frequent than previously thought, and an accurate estimate is confounded by the frequent occurrence of segmental genomic duplications.