<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>gb-2005-6-3-r23</ui>
	<ji>GBJ</ji>
	<fm>
		<dochead>Research</dochead>
		<bibl>
			<title>
				<p>Serendipitous discovery of <it>Wolbachia </it>genomes in multiple <it>Drosophila </it>species</p>
			</title>
			<aug>
				<au id="A1" ca="yes">
					<snm>Salzberg</snm>
					<mi>L</mi>
					<fnm>Steven</fnm>
					<insr iid="I1"/>
					<email>salzberg@tigr.org</email>
				</au>
				<au id="A2">
					<snm>Hotopp</snm>
					<mnm>C Dunning</mnm>
					<fnm>Julie</fnm>
					<insr iid="I1"/>
					<email>jdunning@tigr.org</email>
				</au>
				<au id="A3">
					<snm>Delcher</snm>
					<mi>L</mi>
					<fnm>Arthur</fnm>
					<insr iid="I1"/>
					<email>adelcher@tigr.org</email>
				</au>
				<au id="A4">
					<snm>Pop</snm>
					<fnm>Mihai</fnm>
					<insr iid="I1"/>
					<email>mpop@tigr.org</email>
				</au>
				<au id="A5">
					<snm>Smith</snm>
					<mi>R</mi>
					<fnm>Douglas</fnm>
					<insr iid="I2"/>
					<email>dsmith@agencourt.com</email>
				</au>
				<au id="A6">
					<snm>Eisen</snm>
					<mi>B</mi>
					<fnm>Michael</fnm>
					<insr iid="I3"/>
					<email>mbeisen@lbl.gov</email>
				</au>
				<au id="A7">
					<snm>Nelson</snm>
					<mi>C</mi>
					<fnm>William</fnm>
					<insr iid="I1"/>
					<email>wnelson@tigr.org</email>
				</au>
			</aug>
			<insg>
				<ins id="I1">
					<p>The Institute for Genomic Research, 9712 Medical Center Drive, Rockville, MD 20850, USA</p>
				</ins>
				<ins id="I2">
					<p>Agencourt Bioscience Corporation, 100 Cummings Center, Beverly, MA 01915, USA</p>
				</ins>
				<ins id="I3">
					<p>Center for Integrative Genomics, University of California, Berkeley, CA 94720, USA</p>
				</ins>
			</insg>
			<source>Genome Biology</source>
			<issn>1465-6906</issn>
			<pubdate>2005</pubdate>
			<volume>6</volume>
			<issue>3</issue>
			<fpage>R23</fpage>
			<url>http://genomebiology.com/2005/6/3/R23</url>
			<xrefbib>
				<pubidlist><pubid idtype="pmpid">15774024</pubid><pubid idtype="doi">10.1186/gb-2005-6-3-r23</pubid>
				</pubidlist></xrefbib>
		</bibl>
		<history>
			<rec>
				<date>
					<day>22</day>
					<month>12</month>
					<year>2004</year>
				</date>
			</rec>
			<revrec>
				<date>
					<day>24</day>
					<month>1</month>
					<year>2005</year>
				</date>
			</revrec>
			<acc>
				<date>
					<day>24</day>
					<month>1</month>
					<year>2005</year>
				</date>
			</acc>
			<pub>
				<date>
					<day>22</day>
					<month>2</month>
					<year>2005</year>
				</date>
			</pub>
		</history>
		<cpyrt>
			<year>2005</year>
			<collab>Salzberg et al.; licensee BioMed Central Ltd.</collab>
			<note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
		</cpyrt>
		<shorttitle>
			<p><it>Wolbachia</it> genomes in <it>Drosophila</it> sequences</p>
		</shorttitle>
		<shortabs>
			<p>By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont <it>Wolbachia pipientis</it> in three different species of fruit fly: <it>Drosophila ananassae</it>, <it>D. simulans</it>, and <it>D. mojavensis</it>.</p>
		</shortabs>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st>
					<p>The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st>
					<p>By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont <it>Wolbachia pipientis </it>in three different species of fruit fly: <it>Drosophila ananassae</it>, <it>D. simulans</it>, and <it>D. mojavensis</it>. We extracted all sequences with partial matches to a previously sequenced <it>Wolbachia </it>strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st>
					<p>The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new <it>Wolbachia </it>genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<meta>
		<classifications>
			<classification type="BMC" subtype="man_spc_id" id="30010002">Bioinformatics</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010014">Microbiology and parasitology</classification>
			<classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
		</classifications>
	</meta>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st>
			<p>Large-scale sequencing projects continue to generate a growing number of new genomes from an ever-wider range of species. A rarely noted and unappreciated side effect of some projects occurs when the organism being sequenced contains an intracellular endosymbiont. In some cases, the existence of the endosymbiont is unknown to both the sequencing center and the laboratory providing the source DNA. Fortunately, many genome projects deposit all their raw sequence data into a publicly available, unrestricted repository known as the Trace Archive <abbrgrp><abbr bid="B1">1</abbr></abbrgrp>. By conducting large-scale searches of the Trace Archive, one can discover the presence of these endosymbionts and, with the aid of bioinformatics tools including genome assembly algorithms, reconstruct some or most of the endosymbiont genomes.</p>
			<p>The amount of endosymbiont DNA present in the data for a genome project deposited in the Trace Archive depends on several factors: the number of sequences generated by the project, the size of the host genome, the size of the endosymbiont genome, and the number of copies of the endosymbiont present in each cell of the host. Because the copy number varies among cell types, the amount of endosymbiont DNA also depends on the preparation method used to extract host DNA; for example, the use of eggs or early-stage embryos will yield much greater amounts of <it>Wolbachia </it>from its hosts, because the bacterium occurs in much higher copy numbers in egg cells than in other cell types <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>. If the host genome is 200 million base-pairs (Mbp) in length, the endosymbiont genome is 1 Mbp, and there is one endosymbiont per host cell, then approximately 0.5% (1 Mbp of every 201 Mbp) of the sequences from a random sequencing project of the host will derive from the endosymbiont. The critical factor is the copy number per cell: regardless of genome size, if there is one endosymbiont genome per cell, then the endosymbiont will be sequenced to the same depth of coverage as the host, and the genome assembly will, in theory, cover both genomes to the same extent.</p>
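			<p>The arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is a minimal illustration that assumes reads are sampled uniformly from all DNA in the preparation; the function and parameter names are hypothetical and are not taken from any software used in this study.</p>
			<p><![CDATA[
```python
def endosymbiont_read_fraction(host_bp, endo_bp, copies_per_host_genome=1.0):
    """Expected fraction of random shotgun reads that derive from the endosymbiont.

    Assumes uniform sampling, so the fraction equals the endosymbiont's
    share of the total base pairs in the DNA preparation.
    """
    endo_total = endo_bp * copies_per_host_genome
    return endo_total / (host_bp + endo_total)


def endosymbiont_coverage(host_coverage, copies_per_host_genome=1.0):
    """Expected endosymbiont depth of coverage; independent of either genome size."""
    return host_coverage * copies_per_host_genome


# Worked example from the text: 200 Mbp host, 1 Mbp endosymbiont, one copy each.
fraction = endosymbiont_read_fraction(200_000_000, 1_000_000)
print(f"{fraction:.2%}")  # prints 0.50%
```
]]></p>
			<p>Under the same assumption, raising the copy number (for example, by preparing DNA from eggs) raises the expected endosymbiont coverage in direct proportion.</p>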
			<p>The search for these hidden genomes is aided greatly by the availability of a complete genome of a related species. Fortunately, the complete genome of <it>Wolbachia pipientis w</it>Mel, an endosymbiont of <it>D. melanogaster </it><abbrgrp><abbr bid="B3">3</abbr></abbrgrp>, is available. <it>Wolbachia </it>species are common obligate intracellular parasites that infect a wide variety of invertebrates, including not only fruit flies but also mosquitoes, other arthropods and nematodes <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr></abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Results and discussion</p>
			</st>
			<p>Using the 1,267,782 bp <it>w</it>Mel genome as a probe, we searched the Trace Archive entries of seven recently sequenced <it>Drosophila </it>species, each of which was sequenced to approximately eightfold coverage. For three of these species, we found clear evidence of <it>Wolbachia </it>infections in the host.</p>
			<p>From the 2,772,509 traces of <it>Drosophila ananassae </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, we retrieved 32,720 sequences that either matched the <it>w</it>Mel strain or were paired with sequences that matched <it>w</it>Mel (see Materials and methods). Our assembly of these sequences yielded a new genome, <it>Wolbachia w</it>Ana, containing 1,440,650 bp in 329 separate scaffolds, at approximately eightfold coverage. At this coverage depth, we estimate that 98% of the <it>w</it>Ana genome is included in the assembly. The alignment of the <it>w</it>Ana scaffolds to <it>w</it>Mel covers approximately 878 kbp (70%) of the 1.27 Mbp <it>w</it>Mel genome. A mapping of all the individual <it>w</it>Ana reads to <it>w</it>Mel gives greater coverage - 1.11 Mbp (87%) of the <it>w</it>Mel genome.</p>
			<p>From the 2,214,248 traces of <it>D. simulans </it><abbrgrp><abbr bid="B7">7</abbr></abbrgrp>, we retrieved and assembled 3,727 sequences. The resulting genome fragments of <it>Wolbachia w</it>Sim cover 896,761 bp of <it>w</it>Sim at twofold coverage, which we estimate to cover 65-80% of <it>w</it>Sim. The comparative assembly (see Materials and methods) resulted in 388 contigs plus 241 singleton sequences, and a separate scaffolding program further grouped 273 of these contigs into 84 scaffolds. The alignment between <it>w</it>Sim and <it>w</it>Mel covers 861 kbp (68%) of the <it>w</it>Mel genome.</p>
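			<p>The text does not state how the completeness estimates above were derived. As a rough point of comparison only, the sketch below applies the idealized Lander-Waterman model, in which the expected fraction of a genome covered at depth c is 1 - exp(-c). The model is an assumption introduced here for illustration and is not presented as the method used in this study. It gives about 99.97% at eightfold and 86.5% at twofold coverage, both above the estimates quoted for <it>w</it>Ana and <it>w</it>Sim. This is consistent with real shotgun libraries departing from uniform sampling, and with divergent regions being missed by a similarity-based screen.</p>
			<p><![CDATA[
```python
import math


def expected_fraction_covered(coverage_depth):
    """Idealized Lander-Waterman estimate of the fraction of a genome hit by at least one read."""
    return 1.0 - math.exp(-coverage_depth)


# Coverage depths quoted in the text for the two assembled strains.
for label, depth in [("wAna", 8.0), ("wSim", 2.0)]:
    fraction = expected_fraction_covered(depth)
    print(f"{label}: depth {depth:g}x, idealized fraction covered {fraction:.4f}")
```
]]></p>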
			<p>From the 2,445,065 traces of <it>D. mojavensis </it><abbrgrp><abbr bid="B6">6</abbr></abbrgrp>, we retrieved 101 sequences matching <it>w</it>Mel, plus another 13 sequences that did not match <it>w</it>Mel but were paired with the matching sequences. The sample is too small for assembly, but even so it represents approximately 87 kbp (6-7%) of the <it>Wolbachia w</it>Moj genome.</p>
			<p>No <it>Wolbachia </it>sequences were found in the other <it>Drosophila </it>species currently available: <it>D. pseudoobscura</it>, <it>D. yakuba</it>, <it>D. virilis </it>and <it>D. melanogaster</it>.</p>
			<p><it>Wolbachia </it>has previously been described to infect multiple strains of <it>D. simulans</it>, and a fragment of the 16S ribosomal RNA gene has been sequenced (GenBank ID AF312372) <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. It has also been described in <it>D. ananassae </it><abbrgrp><abbr bid="B9">9</abbr></abbrgrp>, but has not been previously reported in <it>D. mojavensis </it>(and no sequences can be found in the <it>Wolbachia </it>database maintained at <abbrgrp><abbr bid="B10">10</abbr></abbrgrp>).</p>
			<sec>
				<st>
					<p>Genome organization</p>
				</st>
				<p>Comparison of the <it>w</it>Ana and <it>w</it>Mel species indicates extensive rearrangements between the genomes. This is best illustrated with the longest scaffold in <it>w</it>Ana, which contains 455,845 bp, approximately one-third of the genome. Figure <figr fid="F1">1</figr> shows a map of this scaffold compared to the <it>w</it>Mel genome. The scaffold spans more than a dozen rearrangements that have occurred since the divergence of these species. We also found evidence of rearrangements within our <it>w</it>Ana sequences (see Materials and methods), indicating that the <it>D. ananassae </it>strain may have been infected with two or more divergent <it>Wolbachia </it>strains. The rearrangements shown in Figure <figr fid="F1">1</figr> are typical of the interstrain alignments; breakpoints occur even among the very sparsely sampled <it>w</it>Moj sequences. Although only 101 sequences matched <it>w</it>Mel, seven of these spanned either insertions or large-scale rearrangements in the <it>w</it>Mel genome.</p>
				<fig id="F1">
					<title>
						<p>Figure 1</p>
					</title>
					<caption>
						<p>Alignment of complete <it>w</it>Mel genome (horizontal axis) to longest scaffold from the <it>w</it>Ana genome assembly</p>
					</caption>
					<text>
						<p>Alignment of complete <it>w</it>Mel genome (horizontal axis) to longest scaffold from the <it>w</it>Ana genome assembly. Red points indicate sequences aligned in the forward orientation, green points indicate reverse orientation. The diagonals represent colinear regions, and breaks in the diagonals correspond to inversions and translocations between the two genomes.</p>
					</text>
					<graphic file="gb-2005-6-3-r23-1"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Genome comparisons</p>
				</st>
				<p>In these assemblies, approximately 464, 92 and 6 genes were discovered in the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj genomes, respectively (see Additional data file 1), that were not found in the previously reported <it>W. pipientis w</it>Mel genome. Of these novel genes, 343 were conserved hypothetical proteins, 81 transposases, 13 phage-related proteins and seven ankyrin domain proteins. Of the remaining 118 genes, 34 are proteins from the <it>w</it>Ana assembly of insect origin, which are likely to represent <it>Drosophila </it>contaminants as a result of chimeric inserts in the original sequencing library. Another 51 predicted genes are shorter than 300 bp and may not constitute real genes. The remaining 33 genes have similarity to known genes and include genes that have tentatively been identified to be involved in transport, DNA binding or regulation, and a variety of other functions. Many of the unique genes have anomalous GC content, suggesting horizontal gene transfer (HGT), with 12 genes displaying a GC content greater than 50% as opposed to the typical 35% GC content found in these genomes and <it>w</it>Mel (Table <tblr tid="T1">1</tblr>).</p>
				<tbl id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>Summary statistics for assemblies of the three new <it>Wolbachia </it>genomes</p>
					</caption>
					<tblbdy cols="5">
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><it>w</it>Ana</p>
							</c>
							<c ca="left">
								<p><it>w</it>Sim</p>
							</c>
							<c ca="left">
								<p><it>w</it>Moj</p>
							</c>
							<c ca="left">
								<p><it>w</it>Mel</p>
							</c>
						</r>
						<r>
							<c cspan="5">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Molecule length (bp)</p>
							</c>
							<c ca="left">
								<p>1,440,650</p>
							</c>
							<c ca="left">
								<p>896,761</p>
							</c>
							<c ca="left">
								<p>86,870</p>
							</c>
							<c ca="left">
								<p>1,267,782</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Scaffolds</p>
							</c>
							<c ca="left">
								<p>329</p>
							</c>
							<c ca="left">
								<p>84</p>
							</c>
							<c ca="left">
								<p>114</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Genes</p>
							</c>
							<c ca="left">
								<p>1837</p>
							</c>
							<c ca="left">
								<p>790</p>
							</c>
							<c ca="left">
								<p>63</p>
							</c>
							<c ca="left">
								<p>1271</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Contigs</p>
							</c>
							<c ca="left">
								<p>464</p>
							</c>
							<c ca="left">
								<p>388</p>
							</c>
							<c ca="left">
								<p>114</p>
							</c>
							<c ca="left">
								<p>1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>GC content (%)</p>
							</c>
							<c ca="left">
								<p>35.4</p>
							</c>
							<c ca="left">
								<p>35.0</p>
							</c>
							<c ca="left">
								<p>34.5</p>
							</c>
							<c ca="left">
								<p>35.2</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p>Average gene length (bp)</p>
							</c>
							<c ca="left">
								<p>608</p>
							</c>
							<c ca="left">
								<p>916</p>
							</c>
							<c ca="left">
								<p>633</p>
							</c>
							<c ca="left">
								<p>855</p>
							</c>
						</r>
					</tblbdy>
					<tblfn>
						<p>The <it>w</it>Sim genome was assembled using the comparative assembler, AMOS-cmp, and scaffolded using Bambus. The <it>w</it>Ana genome was assembled using the Celera Assembler, as described in Materials and methods. Note that the high gene count for <it>w</it>Ana is likely due to fragmentation of individual genes across separate contigs.</p>
					</tblfn>
				</tbl>
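				<p>The GC-content screen mentioned before Table 1 can be illustrated as follows. This is a minimal sketch assuming each gene is available as a plain nucleotide string; the function names, the handling of ambiguous bases and the toy sequences are hypothetical, and only the 50% threshold is taken from the text.</p>
				<p><![CDATA[
```python
def gc_content(sequence):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def flag_high_gc(genes, threshold=50.0):
    """Return the names of genes whose GC content exceeds the threshold (percent)."""
    return [name for name, seq in genes.items() if gc_content(seq) > threshold]


# Toy sequences, invented for illustration only.
example_genes = {
    "typical_gene": "ATGAAATTAGCTATTGATAAAGCATTAGAAGTTATTTAA",
    "high_gc_gene": "ATGGCCGCGCTGGGCACCGGCCTGCGCGAGCTGTGA",
}
print(flag_high_gc(example_genes))
```
]]></p>
				<p>Anomalous composition is only suggestive of horizontal transfer, so genes flagged in this way would still need to be examined individually.</p>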
				<p>Consistent with the observation that novel genes in the new <it>Wolbachia </it>strains tend to be hypothetical proteins, genes present in <it>w</it>Mel that are absent in the <it>w</it>Ana assembly are also predominantly hypothetical proteins. Of the 347 <it>w</it>Mel genes not found in <it>w</it>Ana, 207 were hypothetical proteins, with the next highest category being mobile elements and extrachromosomal elements, with 37 genes. This suggests that as many as 27% of the predicted genes in <it>w</it>Mel could be highly variable.</p>
				<p>Two large gene clusters in <it>W. pipientis w</it>Mel were not identified in the <it>w</it>Sim and <it>w</it>Ana assemblies (Figure <figr fid="F2">2</figr>). This could suggest absence or divergence of these regions. The failure to recover these two regions (A and B) is notable because both contain genes that have been suggested to affect host-endosymbiont interactions <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
				<fig id="F2">
					<title>
						<p>Figure 2</p>
					</title>
					<caption>
						<p>Circular map comparing the <it>w</it>Mel genome with the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj assemblies</p>
					</caption>
					<text>
						<p>Circular map comparing the <it>w</it>Mel genome with the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj assemblies. Ring 1 (outermost ring): forward strand genes; ring 2: reverse strand genes; ring 3: GC-skew plot; ring 4: &#967;<sup>2 </sup>analysis of trinucleotide composition, with peaks indicating atypical regions; ring 5: <it>w</it>Mel genes present in the <it>w</it>Ana assembly; ring 6: <it>w</it>Mel genes present in the <it>w</it>Sim assembly; ring 7: <it>w</it>Mel genes present in the <it>w</it>Moj assembly. Large regions on the <it>w</it>Mel genome that were not recovered in the <it>w</it>Ana or <it>w</it>Sim assemblies are marked on the outside (regions A, B).</p>
					</text>
					<graphic file="gb-2005-6-3-r23-2"/>
				</fig>
				<p>Region A includes the 3'-region of the WO-A phage and the region directly downstream. It includes the interval containing genes WD0289-WD0296, which encode four hypothetical proteins, three ankyrin repeat domain proteins and a conserved hypothetical protein. The absence of WD0289-WD0292 is interesting because it may suggest some variation in the phage 3'-region. Although WD0289-WD0291 is unique to WO-A, a protein homologous to WD0292 has been found in the previously described <it>Wolbachia </it>phage <abbrgrp><abbr bid="B3">3</abbr><abbr bid="B11">11</abbr></abbrgrp>. Variation in the <it>Wolbachia </it>phage could facilitate the introduction of novel genes <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. As ankyrin repeat proteins, WD0291, WD0292, and WD0294 are all of interest because they have been proposed to be involved in host-interaction functions <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. Such variation could provide a means by which the phage could cause different host-interaction phenotypes.</p>
				<p>Region B includes WD0509-WD0514, which encodes a DNA mismatch repair protein MutL-2, a degenerate ribonuclease, a conserved hypothetical protein, two hypothetical proteins and an ankyrin repeat domain protein. This region is of further interest since WD0511-WD0514 is found only in <it>W. pipientis w</it>Mel and not the related sequenced Anaplasmataceae, Rickettsiaceae or &#945;-Proteobacteria. In <it>W. pipientis w</it>Mel, this region is flanked on the 3'-end by an interrupted reverse transcriptase and an IS5 transposase, supporting the hypothesis that it was acquired horizontally. The absence of MutL-2 might not be functionally important since <it>w</it>Mel, <it>w</it>Ana, and <it>w</it>Sim all have a copy of MutL-1.</p>
			</sec>
			<sec>
				<st>
					<p>Evolutionary comparisons</p>
				</st>
				<p>We aligned all genomes to one another to find those sequences shared by all four strains. Because <it>W. pipientis w</it>Moj comprises the smallest sample, we used the 114 sequences from that strain as a query to search the other three strains, and found 90 sequences shared among all strains. We then created four-way multi-alignments for each of these 90 sequences (see Materials and methods). Excluding the large insertions and deletions discussed above, the strains are highly similar, as summarized in Table <tblr tid="T2">2</tblr>.</p>
				<tbl id="T2">
					<title>
						<p>Table 2</p>
					</title>
					<caption>
						<p>Percent identity between nucleotide sequences of the four sequenced strains of <it>Wolbachia</it></p>
					</caption>
					<tblbdy cols="4">
						<r>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p><it>w</it>Mel</p>
							</c>
							<c ca="left">
								<p><it>w</it>Ana</p>
							</c>
							<c ca="left">
								<p><it>w</it>Sim</p>
							</c>
						</r>
						<r>
							<c cspan="4">
								<hr/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>w</it>Mel</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>97.2</p>
							</c>
							<c ca="left">
								<p>97.1</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>w</it>Ana</p>
							</c>
							<c ca="left">
								<p>97.2</p>
							</c>
							<c>
								<p/>
							</c>
							<c ca="left">
								<p>99.8</p>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>w</it>Sim</p>
							</c>
							<c ca="left">
								<p>97.1</p>
							</c>
							<c ca="left">
								<p>99.8</p>
							</c>
							<c>
								<p/>
							</c>
						</r>
						<r>
							<c ca="left">
								<p><it>w</it>Moj</p>
							</c>
							<c ca="left">
								<p>94.9</p>
							</c>
							<c ca="left">
								<p>97.5</p>
							</c>
							<c ca="left">
								<p>97.3</p>
							</c>
						</r>
					</tblbdy>
				</tbl>
				<p>As the table shows, the two most closely related strains are <it>w</it>Ana and <it>w</it>Sim, which are nearly identical at the DNA level. Both <it>w</it>Mel and <it>w</it>Moj are approximately equidistant from these two strains, at just over 97% identity, but are more distant from one another. Note however that because the <it>w</it>Moj sequences are single reads (that is, single-pass sequencing), the error rate in these sequences is substantially higher than in the assembled genomes of the other strains, which in turn may make it appear that <it>w</it>Moj is more divergent.</p>
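				<p>Percent identity of the kind reported in Table 2 can be computed from a pairwise alignment as sketched below. The convention used here, which skips every column containing a gap, is an assumption made for illustration; the text states only that large insertions and deletions were excluded. The function name and the toy alignments are hypothetical.</p>
				<p><![CDATA[
```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignments, invented for illustration only.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # one mismatch in ten columns
print(percent_identity("ACG-TA", "ACGGTT"))          # gap column is ignored
```
]]></p>
				<p>Because every sequencing error in a single-pass read is counted as a mismatch, a measure of this kind will understate the true identity of <it>w</it>Moj to the other strains, as noted above.</p>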
			</sec>
			<sec>
				<st>
					<p>Ankyrin repeat domain proteins</p>
				</st>
				<p>Ankyrin repeat proteins showed considerable variability among the four <it>Wolbachia </it>strains. It has been proposed that ankyrin repeat proteins may influence the host by regulating host cell cycle, regulating host cell division, and interacting with the host cytoskeleton <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>. These genes and their relationship to cell cycle, and therefore reproduction, are likely candidates for involvement in host interactions such as cytoplasmic incompatibility, male killing, parthenogenesis and feminization.</p>
				<p>Four ankyrin repeat proteins in regions A and B, described above, were absent in <it>w</it>Ana and <it>w</it>Sim. Seven new ankyrin repeat proteins were also identified in <it>w</it>Ana, <it>w</it>Sim, and <it>w</it>Moj. To infer relationships among the ankyrin repeat proteins, all the ankyrin repeat-containing proteins greater than 120 amino acids in length were aligned and clustered using ClustalW. The amino-acid sequences were too diverse to permit the construction of a reliable phylogenetic tree; nevertheless, a tree was drawn that clustered similar proteins and allowed for the classification of families of conserved ankyrin repeat domain proteins within the <it>Wolbachia </it>lineage (Figure <figr fid="F3">3</figr>). From this tree, several classes of proteins can be identified that are highly conserved between two or more of these <it>Wolbachia </it>lineages, with greater than 95% similarity at the nucleotide level. In addition, ankyrin repeat domain proteins unique to a particular lineage can be identified. These differences in the complement of ankyrin repeat domain proteins may affect host-endosymbiont interactions.</p>
				<fig id="F3">
					<title>
						<p>Figure 3</p>
					</title>
					<caption>
						<p>Relationship of ankyrin repeat domain proteins between <it>w</it>Mel, <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj</p>
					</caption>
					<text>
						<p>Relationship of ankyrin repeat domain proteins between <it>w</it>Mel, <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj. All the predicted ankyrin repeat proteins with greater than 120 amino acids were aligned and clustered using ClustalW. Nine predicted ankyrin repeat domain proteins (A-I) were found to be conserved among at least <it>w</it>Mel and one other of these <it>Wolbachia </it>species with nucleotide sequence identity &gt; 95% across the entire length of the gene.</p>
					</text>
					<graphic file="gb-2005-6-3-r23-3"/>
				</fig>
			</sec>
			<sec>
				<st>
					<p>Comparison with other obligate intracellular bacteria</p>
				</st>
				<p>The variability of genome content and synteny identified here with <it>Wolbachia </it>is in contrast to that observed for other obligate intracellular bacteria. Comparative analysis of the Chlamydiaceae shows that the genomes of these organisms are highly conserved in terms of content and gene order, with relatively small differences in the genomes <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>. This is despite the fact that the chlamydial genomes sequenced thus far span four distinct species from various hosts and cause different tissue tropism and disease pathology.</p>
				<p>Similarly, rickettsial genomes have a high degree of synteny and gene conservation with the exception of numerous unique sequences in the genome of <it>Rickettsia conorii </it><abbrgrp><abbr bid="B14">14</abbr></abbrgrp>. Although <it>R. conorii </it>maintains synteny with <it>Rickettsia prowazekii </it>and <it>Rickettsia typhi</it>, it has 560 unique genes relative to the other two. In contrast, the sequencing of <it>R. typhi </it>revealed only 24 novel genes.</p>
				<p><it>Wolbachia </it>genomes seem to have little synteny <abbrgrp><abbr bid="B3">3</abbr></abbrgrp> and large variations in genome size and genome content. This may reflect the levels of intraspecies contact <it>in vivo</it>. <it>Wolbachia </it>are abundant in nature, are able to co-infect arthropods <abbrgrp><abbr bid="B15">15</abbr><abbr bid="B16">16</abbr></abbrgrp>, and are propagated by vertical and horizontal transmission <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Phylogenetic analysis of the WO-B phage shows that under conditions of co-infection, <it>Wolbachia </it>from different supergroups will share the same WO-B phage <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>. These factors may promote genetic exchange between <it>Wolbachia </it>species. In addition, the <it>Wolbachia </it>lifestyle of facilitating its own transmission by host reproductive modification may then promote the successful transmission of genetically diverse strains. Other obligate intracellular bacterial genera may find the series of events involving successful co-infection, exchange of genetic information, and then propagation more challenging and therefore less likely.</p>
			</sec>
			<sec>
				<st>
					<p>Horizontal gene transfer</p>
				</st>
				<p>The presence of endosymbionts within host cells, particularly germline cells, may offer opportunities for HGT, although in general such transfer between prokaryotes and eukaryotes is extremely rare <abbrgrp><abbr bid="B18">18</abbr></abbrgrp>. However, a number of studies have clearly documented cases of transfer of mitochondrial DNA into the nuclear genome <abbrgrp><abbr bid="B19">19</abbr></abbrgrp>, in species as diverse as yeast <abbrgrp><abbr bid="B20">20</abbr></abbrgrp>, <it>Arabidopsis thaliana </it><abbrgrp><abbr bid="B21">21</abbr></abbrgrp> and other plants <abbrgrp><abbr bid="B22">22</abbr></abbrgrp>, and human <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>. The mitochondrial organelle itself is widely believed to derive from an ancestral endosymbiont <abbrgrp><abbr bid="B19">19</abbr><abbr bid="B24">24</abbr></abbrgrp>. Although we do not here provide evidence for HGT from <it>Wolbachia </it>to <it>Drosophila</it>, at least one recent study claims that a <it>Wolbachia </it>endosymbiont has transferred genes to the X chromosome of an insect, the adzuki bean beetle <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. The analysis of the <it>w</it>Mel genome examined this question, but did not find any evidence for HGT into the <it>D. melanogaster </it>host <abbrgrp><abbr bid="B3">3</abbr></abbrgrp>.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st>
			<p>The discovery of these three new genomes demonstrates how powerful the public release of raw sequencing data can be. Although none of these projects had as its goal the sequencing of bacterial endosymbionts, we now have as a result three partial genomes - one nearly complete - of this biologically important species. The differences between these genomes and the completed <it>w</it>Mel strain demonstrate extensive genome rearrangement and divergence among these <it>Wolbachia </it>endosymbionts. And although it is a small sample, when taken together the presence of these three new genomes indicates that <it>Wolbachia </it>endosymbionts appear to be quite common in the <it>Drosophila </it>lineage. Multiple future <it>Drosophila </it>sequencing projects are planned, several of which are already underway, as are projects to sequence other invertebrates, many of which may host <it>Wolbachia </it>or other endosymbionts. Our results suggest that new screening methods, such as those described here, may yield unexpected discoveries from the data in the Trace Archive.</p>
		</sec>
		<sec>
			<st>
				<p>Materials and methods</p>
			</st>
			<p>We downloaded from the Trace Archive at NCBI <abbrgrp><abbr bid="B1">1</abbr></abbrgrp> the following numbers of raw sequences from each <it>Drosophila </it>species: 2,772,509 sequences from <it>D. ananassae</it>; 2,445,065 from <it>D. mojavensis</it>; 2,214,248 from <it>D. simulans</it>; 2,061,010 from <it>D. yakuba</it>; 3,359,782 from <it>D. virilis</it>; 2,590,703 from <it>D. pseudoobscura</it>; and 3,663,352 from <it>D. melanogaster</it>. For each project, we downloaded sequences, quality values, and ancillary data (containing clone-mate information, clone insert lengths, and sometimes trimming parameters), comprising approximately 2-3 gigabytes (GB) of compressed data per genome.</p>
			<p>For each genome, we used the nucmer program from the MUMmer package <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp> to search the complete genome of <it>W. pipientis w</it>Mel against the files containing the sequences. We pulled out any single sequence ('read') with at least one 30-bp exact match to <it>w</it>Mel, and with an extended match that spanned at least 65 bp. We then retrieved the 'clone mates' of each sequence: most of the reads in whole-genome sequencing projects are obtained via a double-ended shotgun method, meaning that both ends of each clone insert are sequenced. The Trace Archive contains a link to the clone mate for each read; we used this information to extract any mates that were not contained in our original screen. For example, the <it>D. ananassae </it>data yielded approximately 5,000 additional reads when we pulled in the mates from the original set.</p>
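			<p>As an illustration only, the screening and mate-retrieval steps described above can be sketched in Python. This is not the code used in the study and it does not parse nucmer output: the Match record, the function names and the mate_of dictionary are hypothetical, and the sketch assumes that both length cutoffs are applied to the same alignment.</p>
			<p>
```python
from typing import Dict, Iterable, NamedTuple, Set


class Match(NamedTuple):
    """One alignment of a read to the reference (hypothetical record layout)."""
    read_id: str
    longest_exact: int   # length in bp of the longest exact match in the alignment
    aligned_span: int    # length in bp spanned by the extended match


def screen_reads(matches: Iterable[Match],
                 min_exact: int = 30,
                 min_span: int = 65) -> Set[str]:
    """Return IDs of reads with at least one alignment passing both cutoffs."""
    return {m.read_id for m in matches
            if m.longest_exact >= min_exact and m.aligned_span >= min_span}


def add_clone_mates(selected: Set[str], mate_of: Dict[str, str]) -> Set[str]:
    """Add the clone mate of every selected read, when a mate is recorded."""
    mates = {mate_of[read_id] for read_id in selected if read_id in mate_of}
    return selected | mates
```
			</p>
			<p>In this sketch a read is kept as soon as any one of its alignments passes, and mates are added in a second pass so that a mate with no detectable match to the reference is still retained.</p>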
			<p>We then assembled the <it>Wolbachia </it>reads in two different ways: with the Celera Assembler <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>, treating it as a normal (<it>de novo</it>) whole-genome assembly, and with the AMOS-cmp assembler <abbrgrp><abbr bid="B30">30</abbr></abbrgrp>, which assembles a genome by mapping it onto a reference. For the reference genome we used <it>w</it>Mel. We used Celera Assembler on the relatively well-covered <it>w</it>Ana strain; although we ran it on the <it>w</it>Sim reads as well, the sequence coverage was too light to yield a good assembly. The high degree of sequence identity, at 95-100% across most regions that are shared between strains, allowed for an excellent comparative assembly of the <it>w</it>Sim strain with AMOS-cmp.</p>
			<p>The AMOS-cmp assembly of <it>w</it>Sim contains 388 contigs plus another 241 singleton reads, covering 896,761 bp (see Table <tblr tid="T1">1</tblr>). The largest contig contains 16,701 bp. Note that AMOS-cmp produces contigs but not scaffolds. The contigs can easily be aligned to the reference genome to produce scaffolds, with the caveat that any rearrangements will invalidate such scaffolding information. To avoid such problems, we ordered and oriented the contigs separately with Bambus <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>, a stand-alone genome scaffolding program, using only the clone-mate information from the original shotgun data. Bambus created 84 multi-contig scaffolds that joined together 273 of the 388 contigs, with the largest scaffold containing 50,851 bp and spanning (including estimated gaps) 54,207 bp.</p>
			<p>For <it>w</it>Ana, when we compared the <it>de novo </it>and comparative assemblies, we observed that there were multiple rearrangements in the <it>w</it>Ana genome as compared to <it>w</it>Mel. Our conclusion was that a comparative assembly, which relies on the genome structure of the reference, may be less accurate than a <it>de novo </it>assembly in the presence of extensive rearrangements, so we used the latter for our analysis.</p>
			<p>The <it>w</it>Ana assembly presented special challenges because of what appear to be a large number of rearrangements and polymorphisms within the sequences. The number of <it>Wolbachia </it>reads provided very deep coverage, which in principle should have produced a scaffold that covered nearly the entire genome. However, a large number of clone-mate links were inconsistent with one another, indicating that the reads may have been drawn from a population in which many of the individuals had genome rearrangements with respect to one another. We also found locations spanning hundreds of nucleotides where four or five individual reads had one nucleotide and the same number had a different nucleotide. These polymorphisms made it difficult to create many consistent large scaffolds. We created multiple assemblies in which we removed many of the inconsistent links, and eventually settled on the assembly presented here as the best representative of the genome possible given the diversity in the data. The <it>w</it>Ana assembly has three large scaffolds of 460 kb, 157 kb, and 121 kb respectively, with all remaining scaffolds less than 20 kb in length. We also include a list of all the individual sequences, including those not incorporated into contigs, in our Additional data files.</p>
			<p>To annotate the resulting sets of contigs, we used Glimmer <abbrgrp><abbr bid="B32">32</abbr><abbr bid="B33">33</abbr></abbrgrp> to make initial gene calls and BLAST <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> to search those calls against a comprehensive protein database. Regions with no gene calls were searched as well in all six reading frames using Blastx.</p>
			<p>All the predicted genes in <it>w</it>Ana, <it>w</it>Sim, and <it>w</it>Moj were searched against <it>w</it>Mel using Blastn. The results of these searches were used to determine which genes are absent from the <it>w</it>Ana, <it>w</it>Sim, and <it>w</it>Moj assemblies. Genes with DNA sequence matches of at least 80% identity over at least 80% of the length of the shorter gene were considered conserved and are plotted in Figure <figr fid="F2">2</figr>. Regions A and B in Figure <figr fid="F2">2</figr> were identified in this manner. To identify the unique genes in the <it>w</it>Ana, <it>w</it>Sim, and <it>w</it>Moj assemblies, all predicted proteins were searched against the <it>w</it>Mel proteins using Blastp. Proteins in the new genomes were considered unique (or highly divergent) when the best match in <it>w</it>Mel had an E-value greater than 10<sup>-15</sup>.</p>
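			<p>The two classification rules in this paragraph can be expressed as a short Python sketch. It is illustrative only and is not the code used in the study: the function and parameter names are hypothetical, the 80% figures are treated as minimum thresholds, and a protein with no Blastp hit at all is assumed to count as unique.</p>
			<p>
```python
from typing import Optional


def is_conserved(percent_identity: float, aligned_length: int,
                 query_length: int, subject_length: int,
                 min_identity: float = 80.0,
                 min_coverage: float = 0.80) -> bool:
    """Apply the identity and length rule to one Blastn match.

    Coverage is measured against the shorter of the two genes;
    both gene lengths are assumed to be positive.
    """
    shorter = min(query_length, subject_length)
    coverage = aligned_length / shorter
    return percent_identity >= min_identity and coverage >= min_coverage


def is_unique(best_evalue: Optional[float], cutoff: float = 1e-15) -> bool:
    """Flag a protein as unique (or highly divergent).

    best_evalue is the E-value of the best Blastp match, or None when
    there was no match (treated as unique here, which is an assumption).
    """
    return best_evalue is None or best_evalue > cutoff
```
			</p>
			<p>For example, under these assumptions a 1,000-bp gene matching a 1,200-bp gene at 80% identity over 800 bp would be called conserved, because coverage is computed against the shorter gene.</p>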
			<p>To create the multiple alignments of the 90 sequences that were shared by all four organisms, we searched the 114 sequences in <it>w</it>Moj against the <it>w</it>Mel, <it>w</it>Ana, and <it>w</it>Sim genome assemblies, again using nucmer. We used the output of nucmer to extract from each genome the appropriate matching sequence, and we fed the results to the overlapper (hash-overlap) from the AMOS assembler <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> to generate all pairwise sequence alignments.</p>
			<p>All ankyrin repeat domain proteins identified by automated annotation were compiled, and an alignment and tree were constructed using ClustalW <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The ankyrin repeat domain is a degenerate repeat <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>, so no attempt was made to cluster the proteins with the ankyrin repeat motifs removed.</p>
			<p>The whole-genome shotgun assemblies, with annotation, have been deposited at DDBJ/EMBL/GenBank under the project accession AAGB00000000 (<it>w</it>Ana) and AAGC00000000 (<it>w</it>Sim). The versions described in this paper are the first versions, AAGB01000000 and AAGC01000000. The sequences and annotation for <it>w</it>Moj have consecutive accessions AY897435 through AY897548. The unassembled <it>w</it>Moj reads are also available from the Trace Archive and from the Additional data files for this paper.</p>
		</sec>
		<sec>
			<st>
				<p>Additional data files</p>
			</st>
			<p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> contains four tables: the first three list the unique genes in the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj genomes respectively; the fourth lists the Trace Archive identifiers for the 114 reads comprising the <it>w</it>Moj sequences from the <it>D. mojavensis </it>genome project. Additional data file <supplr sid="S2">2</supplr> is a multi-fasta file containing the sequences of the 114 <it>w</it>Moj reads.</p>
			<suppl id="S1">
				<title>
					<p>Additional File 1</p>
				</title>
				<caption>
					<p>Supplementary Tables 1, 2, and 3 listing the unique genes in the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj genomes respectively and Supplementary Table 4 listing the Trace Archive identifiers for the 114 reads comprising the <it>w</it>Moj sequences from the <it>D. mojavensis </it>genome project</p>
				</caption>
				<text>
					<p>Supplementary Tables 1, 2, and 3 listing the unique genes in the <it>w</it>Ana, <it>w</it>Sim and <it>w</it>Moj genomes respectively and Supplementary Table 4 listing the Trace Archive identifiers for the 114 reads comprising the <it>w</it>Moj sequences from the <it>D. mojavensis </it>genome project</p>
				</text>
				<file name="gb-2005-6-3-r23-S1.doc">
					<p>Click here for file</p>
				</file>
			</suppl>
			<suppl id="S2">
				<title>
					<p>Additional File 2</p>
				</title>
				<caption>
					<p>The sequences of the 114 <it>w</it>Moj reads</p>
				</caption>
				<text>
					<p>The sequences of the 114 <it>w</it>Moj reads</p>
				</text>
				<file name="gb-2005-6-3-r23-S2.fast">
					<p>Click here for file</p>
				</file>
			</suppl>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st>
				<p>We thank Hean Koo for help with genome data management, and Herv&#233; Tettelin and Martin Wu for helpful comments on the manuscript. We also thank Agencourt Bioscience, the Washington University Genome Sequencing Center and the NIH for making sequence data publicly available through the NCBI Trace Archive. S.L.S., A.L.D., and M.P. were supported in part by the NIH under grants R01-LM06845 and R01-LM007938 to S.L.S. J.D.H. was supported by funds from National Science Foundation Frontiers in Integrative Biological Research under grant EF-0328363.</p>
			</sec>
		</ack>
		<refgrp>
			<bibl id="B1">
				<title>
					<p>The NCBI Trace Archive</p>
				</title>
				<url>http://www.ncbi.nih.gov/Traces</url>
			</bibl>
			<bibl id="B2">
				<title>
					<p><it>Wolbachia </it>infections are distributed throughout insect somatic and germ line tissues.</p>
				</title>
				<aug>
					<au>
						<snm>Dobson</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Bourtzis</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Braig</snm>
						<fnm>HR</fnm>
					</au>
					<au>
						<snm>Jones</snm>
						<fnm>BF</fnm>
					</au>
					<au>
						<snm>Zhou</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Rousset</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>O'Neill</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Insect Biochem Mol Biol</source>
				<pubdate>1999</pubdate>
				<volume>29</volume>
				<fpage>153</fpage>
				<lpage>160</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0965-1748(98)00119-2</pubid>
						<pubid idtype="pmpid" link="fulltext">10196738</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B3">
				<title>
					<p>Phylogenomics of the reproductive parasite <it>Wolbachia pipientis w</it>Mel: a streamlined genome overrun by mobile genetic elements.</p>
				</title>
				<aug>
					<au>
						<snm>Wu</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Sun</snm>
						<fnm>LV</fnm>
					</au>
					<au>
						<snm>Vamathevan</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Riegler</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Deboy</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Brownlie</snm>
						<fnm>JC</fnm>
					</au>
					<au>
						<snm>McGraw</snm>
						<fnm>EA</fnm>
					</au>
					<au>
						<snm>Martin</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Esser</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Ahmadinejad</snm>
						<fnm>N</fnm>
					</au>
					<etal/>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2004</pubdate>
				<volume>2</volume>
				<fpage>E69</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">368164</pubid>
						<pubid idtype="pmpid" link="fulltext">15024419</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B4">
				<title>
					<p><it>Wolbachia </it>infection frequencies in insects: evidence of a global equilibrium?</p>
				</title>
				<aug>
					<au>
						<snm>Werren</snm>
						<fnm>JH</fnm>
					</au>
					<au>
						<snm>Windsor</snm>
						<fnm>DM</fnm>
					</au>
				</aug>
				<source>Proc R Soc Lond B Biol Sci</source>
				<pubdate>2000</pubdate>
				<volume>267</volume>
				<fpage>1277</fpage>
				<lpage>1285</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1098/rspb.2000.1139</pubid>
						<pubid idtype="pmpid">10972121</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B5">
				<title>
					<p>Long PCR improves <it>Wolbachia </it>DNA amplification: <it>wsp </it>sequences found in 76% of sixty-three arthropod species.</p>
				</title>
				<aug>
					<au>
						<snm>Jeyaprakash</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Hoy</snm>
						<fnm>MA</fnm>
					</au>
				</aug>
				<source>Insect Mol Biol</source>
				<pubdate>2000</pubdate>
				<volume>9</volume>
				<fpage>393</fpage>
				<lpage>405</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2583.2000.00203.x</pubid>
						<pubid idtype="pmpid" link="fulltext">10971717</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B6">
				<title>
					<p><it>Drosophila ananassae </it>and <it>Drosophila mojavensis </it>whole-genome shotgun reads</p>
				</title>
				<aug>
					<au>
						<snm>Smith</snm>
						<fnm>DR</fnm>
					</au>
				</aug>
				<publisher>Beverly, MA: Agencourt Bioscience Corporation</publisher>
				<pubdate>2004</pubdate>
			</bibl>
			<bibl id="B7">
				<title>
					<p><it>Drosophila simulans </it>whole-genome shotgun reads</p>
				</title>
				<aug>
					<au>
						<snm>Wilson</snm>
						<fnm>RK</fnm>
					</au>
				</aug>
				<publisher>St Louis, MO: Washington University Genome Sequencing Center</publisher>
				<pubdate>2004</pubdate>
			</bibl>
			<bibl id="B8">
				<title>
					<p>Expression of cytoplasmic incompatibility in <it>Drosophila simulans </it>and its impact on infection frequencies and distribution of <it>Wolbachia pipientis</it>.</p>
				</title>
				<aug>
					<au>
						<snm>James</snm>
						<fnm>AC</fnm>
					</au>
					<au>
						<snm>Ballard</snm>
						<fnm>JW</fnm>
					</au>
				</aug>
				<source>Evolution Int J Org Evolution</source>
				<pubdate>2000</pubdate>
				<volume>54</volume>
				<fpage>1661</fpage>
				<lpage>1672</lpage>
				<xrefbib>
					<pubid idtype="pmpid">11108593</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B9">
				<title>
					<p><it>Wolbachia </it>infection and cytoplasmic incompatibility in <it>Drosophila </it>species.</p>
				</title>
				<aug>
					<au>
						<snm>Bourtzis</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Nirgianaki</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Markakis</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Savakis</snm>
						<fnm>C</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>1996</pubdate>
				<volume>144</volume>
				<fpage>1063</fpage>
				<lpage>1073</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8913750</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B10">
				<title>
					<p><it>Wolbachia </it>online resource</p>
				</title>
				<url>http://www.wolbachia.sols.uq.edu.au</url>
			</bibl>
			<bibl id="B11">
				<title>
					<p>Bacteriophage WO and virus-like particles in <it>Wolbachia</it>, an endosymbiont of arthropods.</p>
				</title>
				<aug>
					<au>
						<snm>Masui</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Kuroiwa</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Sasaki</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Inui</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kuroiwa</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Ishikawa</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Biochem Biophys Res Commun</source>
				<pubdate>2001</pubdate>
				<volume>283</volume>
				<fpage>1099</fpage>
				<lpage>1104</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1006/bbrc.2001.4906</pubid>
						<pubid idtype="pmpid" link="fulltext">11355885</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B12">
				<title>
					<p>Bacteriophage flux in endosymbionts (<it>Wolbachia</it>): infection frequency, lateral transfer, and recombination rates.</p>
				</title>
				<aug>
					<au>
						<snm>Bordenstein</snm>
						<fnm>SR</fnm>
					</au>
					<au>
						<snm>Wernegreen</snm>
						<fnm>JJ</fnm>
					</au>
				</aug>
				<source>Mol Biol Evol</source>
				<pubdate>2004</pubdate>
				<volume>21</volume>
				<fpage>1981</fpage>
				<lpage>1991</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1093/molbev/msh211</pubid>
						<pubid idtype="pmpid" link="fulltext">15254259</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B13">
				<title>
					<p>Genome sequence of <it>Chlamydophila caviae </it>(<it>Chlamydia psittaci </it>GPIC): examining the role of niche-specific genes in the evolution of the Chlamydiaceae.</p>
				</title>
				<aug>
					<au>
						<snm>Read</snm>
						<fnm>TD</fnm>
					</au>
					<au>
						<snm>Myers</snm>
						<fnm>GS</fnm>
					</au>
					<au>
						<snm>Brunham</snm>
						<fnm>RC</fnm>
					</au>
					<au>
						<snm>Nelson</snm>
						<fnm>WC</fnm>
					</au>
					<au>
						<snm>Paulsen</snm>
						<fnm>IT</fnm>
					</au>
					<au>
						<snm>Heidelberg</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Holtzapple</snm>
						<fnm>E</fnm>
					</au>
					<au>
						<snm>Khouri</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Federova</snm>
						<fnm>NB</fnm>
					</au>
					<au>
						<snm>Carty</snm>
						<fnm>HA</fnm>
					</au>
					<etal/>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>2134</fpage>
				<lpage>2147</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">153749</pubid>
						<pubid idtype="pmpid" link="fulltext">12682364</pubid>
						<pubid idtype="doi">10.1093/nar/gkg321</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B14">
				<title>
					<p>Complete genome sequence of <it>Rickettsia typhi </it>and comparison with sequences of other rickettsiae.</p>
				</title>
				<aug>
					<au>
						<snm>McLeod</snm>
						<fnm>MP</fnm>
					</au>
					<au>
						<snm>Qin</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Karpathy</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Gioia</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Highlander</snm>
						<fnm>SK</fnm>
					</au>
					<au>
						<snm>Fox</snm>
						<fnm>GE</fnm>
					</au>
					<au>
						<snm>McNeill</snm>
						<fnm>TZ</fnm>
					</au>
					<au>
						<snm>Jiang</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Muzny</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Jacob</snm>
						<fnm>LS</fnm>
					</au>
					<etal/>
				</aug>
				<source>J Bacteriol</source>
				<pubdate>2004</pubdate>
				<volume>186</volume>
				<fpage>5842</fpage>
				<lpage>5855</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">516817</pubid>
						<pubid idtype="pmpid" link="fulltext">15317790</pubid>
						<pubid idtype="doi">10.1128/JB.186.17.5842-5855.2004</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B15">
				<title>
					<p>Single and double infections with <it>Wolbachia </it>in the parasitic wasp <it>Nasonia vitripennis</it>: effects on compatibility.</p>
				</title>
				<aug>
					<au>
						<snm>Perrot-Minnot</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Guo</snm>
						<fnm>LR</fnm>
					</au>
					<au>
						<snm>Werren</snm>
						<fnm>JH</fnm>
					</au>
				</aug>
				<source>Genetics</source>
				<pubdate>1996</pubdate>
				<volume>143</volume>
				<fpage>961</fpage>
				<lpage>972</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">8725242</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B16">
				<title>
					<p><it>Wolbachia </it>segregation rate in <it>Drosophila simulans </it>naturally bi-infected cytoplasmic lineages.</p>
				</title>
				<aug>
					<au>
						<snm>Poinsot</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Montchamp-Moreau</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Mercot</snm>
						<fnm>H</fnm>
					</au>
				</aug>
				<source>Heredity</source>
				<pubdate>2000</pubdate>
				<volume>85</volume>
				<fpage>191</fpage>
				<lpage>198</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1046/j.1365-2540.2000.00736.x</pubid>
						<pubid idtype="pmpid">11012722</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B17">
				<title>
					<p>Horizontal transfer of <it>Wolbachia </it>between phylogenetically distant insect species by a naturally occurring mechanism.</p>
				</title>
				<aug>
					<au>
						<snm>Heath</snm>
						<fnm>BD</fnm>
					</au>
					<au>
						<snm>Butcher</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Whitfield</snm>
						<fnm>WG</fnm>
					</au>
					<au>
						<snm>Hubbard</snm>
						<fnm>SF</fnm>
					</au>
				</aug>
				<source>Curr Biol</source>
				<pubdate>1999</pubdate>
				<volume>9</volume>
				<fpage>313</fpage>
				<lpage>316</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0960-9822(99)80139-0</pubid>
						<pubid idtype="pmpid" link="fulltext">10209097</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B18">
				<title>
					<p>Microbial genes in the human genome: lateral transfer or gene loss?</p>
				</title>
				<aug>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Eisen</snm>
						<fnm>JA</fnm>
					</au>
				</aug>
				<source>Science</source>
				<pubdate>2001</pubdate>
				<volume>292</volume>
				<fpage>1903</fpage>
				<lpage>1906</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.1061036</pubid>
						<pubid idtype="pmpid" link="fulltext">11358996</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B19">
				<title>
					<p>The origin and early evolution of mitochondria.</p>
				</title>
				<aug>
					<au>
						<snm>Gray</snm>
						<fnm>MW</fnm>
					</au>
					<au>
						<snm>Burger</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Lang</snm>
						<fnm>BF</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2001</pubdate>
				<volume>2</volume>
				<fpage>reviews1018.1</fpage>
				<lpage>1018.5</lpage>
				<xrefbib>
					<pubid idtype="doi">10.1186/gb-2001-2-6-reviews1018</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B20">
				<title>
					<p>The dual origin of the yeast mitochondrial proteome.</p>
				</title>
				<aug>
					<au>
						<snm>Karlberg</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Canback</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Kurland</snm>
						<fnm>CG</fnm>
					</au>
					<au>
						<snm>Andersson</snm>
						<fnm>SG</fnm>
					</au>
				</aug>
				<source>Yeast</source>
				<pubdate>2000</pubdate>
				<volume>17</volume>
				<fpage>170</fpage>
				<lpage>187</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1002/1097-0061(20000930)17:3&lt;170::AID-YEA25&gt;3.0.CO;2-V</pubid>
						<pubid idtype="pmpid" link="fulltext">11025528</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B21">
				<title>
					<p>Genetic definition and sequence analysis of <it>Arabidopsis </it>centromeres.</p>
				</title>
				<aug>
					<au>
						<snm>Copenhaver</snm>
						<fnm>GP</fnm>
					</au>
					<au>
						<snm>Nickel</snm>
						<fnm>K</fnm>
					</au>
					<au>
						<snm>Kuromori</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Benito</snm>
						<fnm>MI</fnm>
					</au>
					<au>
						<snm>Kaul</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Lin</snm>
						<fnm>X</fnm>
					</au>
					<au>
						<snm>Bevan</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Murphy</snm>
						<fnm>G</fnm>
					</au>
					<au>
						<snm>Harris</snm>
						<fnm>B</fnm>
					</au>
					<au>
						<snm>Parnell</snm>
						<fnm>LD</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>1999</pubdate>
				<volume>286</volume>
				<fpage>2468</fpage>
				<lpage>2474</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.286.5449.2468</pubid>
						<pubid idtype="pmpid" link="fulltext">10617454</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B22">
				<title>
					<p>Repeated, recent and diverse transfers of a mitochondrial gene to the nucleus in flowering plants.</p>
				</title>
				<aug>
					<au>
						<snm>Adams</snm>
						<fnm>KL</fnm>
					</au>
					<au>
						<snm>Daley</snm>
						<fnm>DO</fnm>
					</au>
					<au>
						<snm>Qiu</snm>
						<fnm>YL</fnm>
					</au>
					<au>
						<snm>Whelan</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Palmer</snm>
						<fnm>JD</fnm>
					</au>
				</aug>
				<source>Nature</source>
				<pubdate>2000</pubdate>
				<volume>408</volume>
				<fpage>354</fpage>
				<lpage>357</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1038/35042567</pubid>
						<pubid idtype="pmpid" link="fulltext">11099041</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B23">
				<title>
					<p>Continued colonization of the human genome by mitochondrial DNA.</p>
				</title>
				<aug>
					<au>
						<snm>Ricchetti</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Tekaia</snm>
						<fnm>F</fnm>
					</au>
					<au>
						<snm>Dujon</snm>
						<fnm>B</fnm>
					</au>
				</aug>
				<source>PLoS Biol</source>
				<pubdate>2004</pubdate>
				<volume>2</volume>
				<fpage>E273</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">515365</pubid>
						<pubid idtype="pmpid" link="fulltext">15361937</pubid>
						<pubid idtype="doi">10.1371/journal.pbio.0020273</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B24">
				<title>
					<p>Gene transfer from organelles to the nucleus: how much, what happens, and why?</p>
				</title>
				<aug>
					<au>
						<snm>Martin</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Herrmann</snm>
						<fnm>RG</fnm>
					</au>
				</aug>
				<source>Plant Physiol</source>
				<pubdate>1998</pubdate>
				<volume>118</volume>
				<fpage>9</fpage>
				<lpage>17</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1104/pp.118.1.9</pubid>
						<pubid idtype="pmpid" link="fulltext">9733521</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B25">
				<title>
					<p>Genome fragment of <it>Wolbachia </it>endosymbiont transferred to X chromosome of host insect.</p>
				</title>
				<aug>
					<au>
						<snm>Kondo</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Nikoh</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Ijichi</snm>
						<fnm>N</fnm>
					</au>
					<au>
						<snm>Shimada</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Fukatsu</snm>
						<fnm>T</fnm>
					</au>
				</aug>
				<source>Proc Natl Acad Sci USA</source>
				<pubdate>2002</pubdate>
				<volume>99</volume>
				<fpage>14280</fpage>
				<lpage>14285</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">137875</pubid>
						<pubid idtype="pmpid" link="fulltext">12386340</pubid>
						<pubid idtype="doi">10.1073/pnas.222228199</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B26">
				<title>
					<p>Alignment of whole genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Kasif</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Fleischmann</snm>
						<fnm>RD</fnm>
					</au>
					<au>
						<snm>Peterson</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>2369</fpage>
				<lpage>2376</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148804</pubid>
						<pubid idtype="pmpid" link="fulltext">10325427</pubid>
						<pubid idtype="doi">10.1093/nar/27.11.2369</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B27">
				<title>
					<p>Fast algorithms for large-scale genome alignment and comparison.</p>
				</title>
				<aug>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Phillippy</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Carlton</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2002</pubdate>
				<volume>30</volume>
				<fpage>2478</fpage>
				<lpage>2483</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">117189</pubid>
						<pubid idtype="pmpid" link="fulltext">12034836</pubid>
						<pubid idtype="doi">10.1093/nar/30.11.2478</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B28">
				<title>
					<p>Versatile and open software for comparing large genomes.</p>
				</title>
				<aug>
					<au>
						<snm>Kurtz</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>Phillippy</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Smoot</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Shumway</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Antonescu</snm>
						<fnm>C</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Genome Biol</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>R12</fpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">395750</pubid>
						<pubid idtype="pmpid" link="fulltext">14759262</pubid>
						<pubid idtype="doi">10.1186/gb-2004-5-2-r12</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B29">
				<title>
					<p>A whole-genome assembly of <it>Drosophila</it>.</p>
				</title>
				<aug>
					<au>
						<snm>Myers</snm>
						<fnm>EW</fnm>
					</au>
					<au>
						<snm>Sutton</snm>
						<fnm>GG</fnm>
					</au>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Dew</snm>
						<fnm>IM</fnm>
					</au>
					<au>
						<snm>Fasulo</snm>
						<fnm>DP</fnm>
					</au>
					<au>
						<snm>Flanigan</snm>
						<fnm>MJ</fnm>
					</au>
					<au>
						<snm>Kravitz</snm>
						<fnm>SA</fnm>
					</au>
					<au>
						<snm>Mobarry</snm>
						<fnm>CM</fnm>
					</au>
					<au>
						<snm>Reinert</snm>
						<fnm>KH</fnm>
					</au>
					<au>
						<snm>Remington</snm>
						<fnm>KA</fnm>
					</au>
					<etal/>
				</aug>
				<source>Science</source>
				<pubdate>2000</pubdate>
				<volume>287</volume>
				<fpage>2196</fpage>
				<lpage>2204</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1126/science.287.5461.2196</pubid>
						<pubid idtype="pmpid" link="fulltext">10731133</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B30">
				<title>
					<p>Comparative genome assembly.</p>
				</title>
				<aug>
					<au>
						<snm>Pop</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Phillippy</snm>
						<fnm>A</fnm>
					</au>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Brief Bioinform</source>
				<pubdate>2004</pubdate>
				<volume>5</volume>
				<fpage>237</fpage>
				<lpage>248</lpage>
				<xrefbib>
					<pubid idtype="pmpid" link="fulltext">15383210</pubid>
				</xrefbib>
			</bibl>
			<bibl id="B31">
				<title>
					<p>Hierarchical scaffolding with Bambus.</p>
				</title>
				<aug>
					<au>
						<snm>Pop</snm>
						<fnm>M</fnm>
					</au>
					<au>
						<snm>Kosack</snm>
						<fnm>DS</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Genome Res</source>
				<pubdate>2004</pubdate>
				<volume>14</volume>
				<fpage>149</fpage>
				<lpage>159</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">314292</pubid>
						<pubid idtype="pmpid" link="fulltext">14707177</pubid>
						<pubid idtype="doi">10.1101/gr.1536204</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B32">
				<title>
					<p>Microbial gene identification using interpolated Markov models.</p>
				</title>
				<aug>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Kasif</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1998</pubdate>
				<volume>26</volume>
				<fpage>544</fpage>
				<lpage>548</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">147303</pubid>
						<pubid idtype="pmpid" link="fulltext">9421513</pubid>
						<pubid idtype="doi">10.1093/nar/26.2.544</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B33">
				<title>
					<p>Improved microbial gene identification with GLIMMER.</p>
				</title>
				<aug>
					<au>
						<snm>Delcher</snm>
						<fnm>AL</fnm>
					</au>
					<au>
						<snm>Harmon</snm>
						<fnm>D</fnm>
					</au>
					<au>
						<snm>Kasif</snm>
						<fnm>S</fnm>
					</au>
					<au>
						<snm>White</snm>
						<fnm>O</fnm>
					</au>
					<au>
						<snm>Salzberg</snm>
						<fnm>SL</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1999</pubdate>
				<volume>27</volume>
				<fpage>4636</fpage>
				<lpage>4641</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">148753</pubid>
						<pubid idtype="pmpid" link="fulltext">10556321</pubid>
						<pubid idtype="doi">10.1093/nar/27.23.4636</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B34">
				<title>
					<p>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.</p>
				</title>
				<aug>
					<au>
						<snm>Altschul</snm>
						<fnm>SF</fnm>
					</au>
					<au>
						<snm>Madden</snm>
						<fnm>TL</fnm>
					</au>
					<au>
						<snm>Schaffer</snm>
						<fnm>AA</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>J</fnm>
					</au>
					<au>
						<snm>Zhang</snm>
						<fnm>Z</fnm>
					</au>
					<au>
						<snm>Miller</snm>
						<fnm>W</fnm>
					</au>
					<au>
						<snm>Lipman</snm>
						<fnm>DJ</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>1997</pubdate>
				<volume>25</volume>
				<fpage>3389</fpage>
				<lpage>3402</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">146917</pubid>
						<pubid idtype="pmpid" link="fulltext">9254694</pubid>
						<pubid idtype="doi">10.1093/nar/25.17.3389</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B35">
				<title>
					<p>Multiple sequence alignment with the Clustal series of programs.</p>
				</title>
				<aug>
					<au>
						<snm>Chenna</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Sugawara</snm>
						<fnm>H</fnm>
					</au>
					<au>
						<snm>Koike</snm>
						<fnm>T</fnm>
					</au>
					<au>
						<snm>Lopez</snm>
						<fnm>R</fnm>
					</au>
					<au>
						<snm>Gibson</snm>
						<fnm>TJ</fnm>
					</au>
					<au>
						<snm>Higgins</snm>
						<fnm>DG</fnm>
					</au>
					<au>
						<snm>Thompson</snm>
						<fnm>JD</fnm>
					</au>
				</aug>
				<source>Nucleic Acids Res</source>
				<pubdate>2003</pubdate>
				<volume>31</volume>
				<fpage>3497</fpage>
				<lpage>3500</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="pmcid">168907</pubid>
						<pubid idtype="pmpid" link="fulltext">12824352</pubid>
						<pubid idtype="doi">10.1093/nar/gkg500</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
			<bibl id="B36">
				<title>
					<p>The folding and design of repeat proteins: reaching a consensus.</p>
				</title>
				<aug>
					<au>
						<snm>Main</snm>
						<fnm>ER</fnm>
					</au>
					<au>
						<snm>Jackson</snm>
						<fnm>SE</fnm>
					</au>
					<au>
						<snm>Regan</snm>
						<fnm>L</fnm>
					</au>
				</aug>
				<source>Curr Opin Struct Biol</source>
				<pubdate>2003</pubdate>
				<volume>13</volume>
				<fpage>482</fpage>
				<lpage>489</lpage>
				<xrefbib>
					<pubidlist>
						<pubid idtype="doi">10.1016/S0959-440X(03)00105-2</pubid>
						<pubid idtype="pmpid" link="fulltext">12948778</pubid>
					</pubidlist>
				</xrefbib>
			</bibl>
		</refgrp>
	</bm>
</art>

