Molecular archeology of L1 insertions in the human genome
1 National Center for Biotechnology Information (NCBI), National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
2 Department of Molecular Biology and Genetics, Johns Hopkins University School of Medicine, 725 N Wolfe St, Baltimore, MD 21205, USA
3 Current addresses: Biogen, Inc., Cambridge, MA 02142, USA
4 Human Genome Sciences, Inc., Rockville, MD 20850, USA
5 Department of Biology, The Pennsylvania State University, 0208 Mueller Lab, University Park, PA 16802, USA
6 Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue, North Seattle, WA 98109, USA
7 These authors contributed equally to this work
Genome Biology 2002, 3:research0052-research0052.18 doi:10.1186/gb-2002-3-10-research0052
Published: 19 September 2002
As the rough draft of the human genome sequence nears a finished product and other genome-sequencing projects accumulate sequence data exponentially, bioinformatics is emerging as an important tool for studies of transposon biology. In particular, L1 elements exhibit a variety of sequence structures after insertion into the human genome that are amenable to computational analysis. We carried out a detailed analysis of the anatomy and distribution of L1 elements in the human genome using a new computer program, TSDfinder, designed to identify transposon boundaries precisely.
Structural variants of L1 elements shared similar trends in the length and quality of their target site duplications (TSDs) and poly(A) tails. Furthermore, we found no correlation between the composition and genomic location of the pre-insertion locus and the resulting anatomy of the L1 insertion. We verified that L1 insertions with TSDs have the 5'-TTAAAA-3' cleavage site associated with L1 endonuclease activity. In addition, the second target DNA cut required for L1 insertion weakly matches the consensus pattern TTAAAA. On the other hand, the L1-internal breakpoints of deleted and inverted L1 elements do not resemble L1 endonuclease cleavage sites. Finally, the genome sequence data indicate that whereas singly inverted elements are common, doubly inverted elements are almost never found.
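As a rough illustration of the two sequence checks described above, the following Python sketch finds an exact duplication flanking an insertion and counts mismatches between a candidate cleavage site and the TTAAAA consensus. It is not the TSDfinder algorithm. The function names, the `min_len` parameter, and the exact-match requirement are all assumptions made for this example; a realistic tool would have to tolerate mismatches and uncertain element boundaries.

```python
CONSENSUS = "TTAAAA"  # L1 endonuclease consensus cleavage site, as given in the text


def consensus_mismatches(site: str, consensus: str = CONSENSUS) -> int:
    """Count positions where `site` differs from `consensus` (Hamming distance)."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return sum(a != b for a, b in zip(site.upper(), consensus.upper()))


def find_exact_tsd(upstream: str, downstream: str, min_len: int = 9) -> str:
    """Return the longest sequence that ends the upstream flank and also
    begins the downstream flank, or "" if none reaches `min_len`.

    Simplifying assumption: the flanks abut the insertion exactly, so the
    two TSD copies sit at the very end of `upstream` and the very start of
    `downstream`, and they match perfectly.
    """
    upstream, downstream = upstream.upper(), downstream.upper()
    longest = min(len(upstream), len(downstream))
    # Try the longest candidate first so the first hit is the maximal one.
    for k in range(longest, min_len - 1, -1):
        if k > 0 and upstream[-k:] == downstream[:k]:
            return upstream[-k:]
    return ""


if __name__ == "__main__":
    # Hypothetical flanking sequences, invented for illustration only.
    tsd = find_exact_tsd("GGCTAAGATTCAGTCC", "AAGATTCAGTCCTTGA")
    print("TSD:", tsd)
    print("Mismatches to TTAAAA:", consensus_mismatches("TTAAGA"))
```

A "weak" match to the consensus, in this simplified view, would correspond to a higher mismatch count than is typical of the first cleavage site.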
The sequence data give no indication that the creation of L1 structural variants depends on characteristics of the insertion locus. In addition, the formation of 5' truncated and 5' inverted L1s is probably not due to the action of the L1 endonuclease.