De novo assembly of the planarian head regeneration transcriptome. (a) Schematic overview of the assembly strategies, using only 2 × 36-bp paired-end Illumina reads (blue), only 454 reads (red), or an assisted assembly of Illumina reads using transcripts previously assembled from 454 data as scaffolds (purple). Quality metrics shown include the longest sequences in each assembly and the N50 length, defined such that 50% of all assembled bases are contained in transcripts at least as long as the N50. (b) Kernel densities of the length distributions for sequences assembled only from Illumina data (blue), only from 454 data (red), from Illumina data and 454 isotig scaffolds (purple), or for transcripts computationally predicted by MAKER (green). For multi-isoform loci, only the longest isoform was considered. (c) Kernel densities of ortholog hit ratios obtained by comparing sequences from the different assemblies or the computational prediction to the Schistosoma mansoni proteome using blastx. For multi-isoform loci, only the longest isoform was considered. Colors as in (b). (d) Coverage of the 125 complete cDNA sequences from S. mediterranea available from GenBank by the best reciprocal blat hit from each dataset. For multi-isoform loci, only the longest isoform was considered. The boxplot indicates the 75th, 50th (median) and 25th percentiles of cDNA coverage. In addition, individual points show the full coverage distribution for all reciprocal best hits (454, n = 77; Illumina, n = 86; Illumina+, n = 75; MAKER, n = 60). (e) Fraction of sequences from the different assemblies that could be aligned over 90% or 60% of their total length to a single genomic supercontig using blat. Colors as in (b).
Sandmann et al. Genome Biology 2011 12:R76 doi:10.1186/gb-2011-12-8-r76
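The N50 definition given for panel (a) can be illustrated with a short computation. This is a minimal sketch, not the authors' code: the function name `n50` and the example transcript lengths are hypothetical and chosen only to show the calculation.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the largest length L such that sequences of length >= L
    together contain at least 50% of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if 2 * running >= total:
            return length


# Hypothetical transcript lengths (in bases), for illustration only.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
example_n50 = n50(example_lengths)
```

Sorting from longest to shortest and accumulating until half of the total is reached mirrors the caption's wording directly: every transcript counted up to that point is at least as long as the returned value.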