Method

TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions

Daehwan Kim (1,2,3)*, Geo Pertea (3), Cole Trapnell (5,6), Harold Pimentel (7), Ryan Kelley (8) and Steven L Salzberg (3,4)

Author Affiliations

1 Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA

2 Department of Computer Science, University of Maryland, College Park, MD 20742, USA

3 Center for Computational Biology, McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, 733 N. Broadway, Baltimore, MD, 21205, USA

4 Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, 615 N. Wolfe Street, Baltimore, MD, 21205, USA

5 Broad Institute of MIT and Harvard, 7 Cambridge Center, Cambridge, MA, 02142, USA

6 Department of Stem Cell and Regenerative Biology, Harvard University, 7 Divinity Ave., Cambridge, MA, 02142, USA

7 Department of Electrical Engineering and Computer Science, University of California, 101 Sproul Hall, Berkeley, CA, 94720, USA

8 Illumina Inc., 5200 Illumina Way, San Diego, CA, 92122, USA


Genome Biology 2013, 14:R36  doi:10.1186/gb-2013-14-4-r36

Published: 25 April 2013


TopHat is a popular spliced aligner for RNA sequencing (RNA-seq) experiments. In this paper, we describe TopHat2, which incorporates many significant enhancements to TopHat. TopHat2 can align reads of various lengths produced by the latest sequencing technologies, while allowing for variable-length indels with respect to the reference genome. In addition to de novo spliced alignment, TopHat2 can align reads across fusion breaks, which can occur after genomic translocations. TopHat2 combines the ability to identify novel splice sites with direct mapping to known transcripts, producing sensitive and accurate alignments, even for highly repetitive genomes or in the presence of pseudogenes. TopHat2 is available from the TopHat project website.
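To make "direct mapping to known transcripts" concrete, the sketch below shows one way a read aligned to an annotated transcript could be projected back onto genomic coordinates. Any read that crosses an exon boundary becomes a spliced alignment, and the gaps between its blocks are the implied introns. This is a hypothetical illustration, not TopHat2's code. The names `project_to_genome` and `implied_introns` are invented here. The sketch assumes a plus-strand transcript with sorted, non-overlapping exons given in 0-based, half-open genomic coordinates.

```python
from typing import List, Tuple

# Hypothetical type: a genomic interval [start, end), 0-based, half-open.
Block = Tuple[int, int]


def project_to_genome(exons: List[Block], tx_start: int, length: int) -> List[Block]:
    """Project a read aligned at transcript offset `tx_start` (0-based) with
    `length` bases onto the genome, splitting it at exon boundaries.
    Assumes a plus-strand transcript whose exons are sorted and non-overlapping."""
    blocks: List[Block] = []
    tx_end = tx_start + length
    ex_tx_start = 0  # transcript coordinate of the current exon's first base
    for g_start, g_end in exons:
        ex_tx_end = ex_tx_start + (g_end - g_start)
        # Overlap between the read and this exon, in transcript coordinates.
        lo = max(tx_start, ex_tx_start)
        hi = min(tx_end, ex_tx_end)
        if lo < hi:
            # Shift the overlap into genomic coordinates for this exon.
            blocks.append((g_start + (lo - ex_tx_start), g_start + (hi - ex_tx_start)))
        ex_tx_start = ex_tx_end
    if sum(end - start for start, end in blocks) != length:
        raise ValueError("read extends past the end of the transcript")
    return blocks


def implied_introns(blocks: List[Block]) -> List[Block]:
    """Gaps between consecutive aligned blocks, i.e. the spliced-out introns."""
    return [(a_end, b_start) for (_, a_end), (b_start, _) in zip(blocks, blocks[1:])]


if __name__ == "__main__":
    exons = [(100, 150), (300, 380), (500, 600)]
    blocks = project_to_genome(exons, tx_start=40, length=30)
    print(blocks, implied_introns(blocks))
```

With the example exons, a 30-base read starting at transcript offset 40 covers the last 10 bases of the first exon and the first 20 bases of the second. It therefore projects to two genomic blocks separated by one intron. A real aligner must also handle minus-strand transcripts, mismatches, and indels, all of which this sketch omits.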