Assembly of the Candida albicans genome into sixteen supercontigs aligned on the eight chromosomes
1 Biotechnology Research Institute, National Research Council of Canada, Montreal, Quebec, H4P 2R2, Canada
2 University of Minnesota, Minneapolis, MN, 55455, USA
3 Broad Institute of MIT and Harvard, Cambridge, MA, USA
4 Wellcome Trust Sanger Institute, Hinxton, CB10 1SA, UK
5 Paseo Grande, Moraga, CA 94556, USA
6 Research Center for Pathogenic Fungi and Microbial Toxicoses, Chiba University, Chiba, 260-8673, Japan
Genome Biology 2007, 8:R52 doi:10.1186/gb-2007-8-4-r52
Published: 9 April 2007
The 10.9× genomic sequence of Candida albicans, the most important human fungal pathogen, was published in 2004. Because this fungus is diploid, carries an extensive degree of heterozygosity, and lacks a complete sexual cycle, Assembly 19 consisted of 412 supercontigs, of which 266 made up a haploid set. However, the sequences of the individual chromosomes were not determined.
A total of 183 Assembly 19 supercontigs, representing 98.4% of the sequence, were assigned to individual chromosomes that had been purified by pulsed-field gel electrophoresis and hybridized to DNA microarrays. Nine Assembly 19 supercontigs were found to contain markers from two different chromosomes. Assembly 21 contains the sequence of each of the eight chromosomes; it was determined using a synteny analysis with preliminary versions of the Candida dubliniensis genome assembly, bioinformatics, a sequence tagged site (STS) map of overlapping fosmid clones, and an optical map. The STS map was used to determine the orientation and order of the contigs on each chromosome, the placement of telomeres, and the placement of repeat regions too large to be covered by a single sequence read, such as the ribosomal DNA cluster and the major repeat sequence. Sequence gaps were closed by PCR amplification and sequencing of the products. Comparing the overall assembly with the optical map identified some misassembled contigs and gave a size estimate for each chromosome.
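The chromosome-assignment step, including the detection of supercontigs that carry markers from two different chromosomes, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes each microarray marker has already been scored as hybridizing to exactly one purified chromosome, and the function name, supercontig IDs, and chromosome labels are hypothetical.

```python
from collections import defaultdict

# Hypothetical input record: (supercontig_id, chromosome_label) for one marker
# that hybridized to DNA from one purified chromosome.
MarkerHit = tuple[str, str]


def assign_supercontigs(
    hits: list[MarkerHit],
) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Assign each supercontig to a chromosome.

    Supercontigs whose markers all point to one chromosome are assigned to it;
    those with markers from more than one chromosome are returned separately
    as candidate misassemblies (cf. the nine such Assembly 19 supercontigs).
    """
    # Collect the set of chromosomes hit by each supercontig's markers.
    chroms_by_contig: dict[str, set[str]] = defaultdict(set)
    for contig, chrom in hits:
        chroms_by_contig[contig].add(chrom)

    assigned: dict[str, str] = {}
    conflicting: dict[str, set[str]] = {}
    for contig, chroms in chroms_by_contig.items():
        if len(chroms) == 1:
            # Unambiguous: every marker maps to the same chromosome.
            assigned[contig] = next(iter(chroms))
        else:
            # Markers from two or more chromosomes: flag for inspection.
            conflicting[contig] = set(chroms)
    return assigned, conflicting
```

For example, with hypothetical hits `[("sc_a", "chr1"), ("sc_a", "chr1"), ("sc_b", "chr2"), ("sc_c", "chr3"), ("sc_c", "chr5")]`, `sc_a` and `sc_b` are assigned, while `sc_c` is flagged because its markers hybridized to two chromosomes. Treating any cross-chromosome marker as a conflict, rather than taking a majority vote, keeps possible misassemblies visible for follow-up mapping.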
Assembly 21 reveals an ancient chromosome fusion, a number of small internal duplications followed by inversions, and a subtelomeric arrangement that includes a new gene family, the TLO genes. Correlations between the chromosomal positions of gene family members and their relatedness imply a novel mechanism of dispersion. The sequences of the individual chromosomes of C. albicans raise interesting biological questions about gene family creation and dispersion, subtelomere organization, and chromosome evolution.