Research (Open Access)

Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

Dale Muzzey1, Katja Schwartz2, Jonathan S Weissman1* and Gavin Sherlock2*

Author Affiliations

1 Department of Cellular and Molecular Pharmacology, California Institute for Quantitative Biomedical Research, and Howard Hughes Medical Institute, University of California, San Francisco, San Francisco, CA 94158, USA

2 Department of Genetics, Stanford University, Stanford, CA 94305, USA


Genome Biology 2013, 14:R97 doi:10.1186/gb-2013-14-9-r97

Published: 11 September 2013



Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions, the yeast is diploid, yet despite its clinical relevance, the respective sequences of its two homologous chromosomes have not been completely resolved.


We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the catalog of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. First, we see striking clustering of indels, concentrated primarily in repeat sequences in promoters. Second, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure in most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate and provides a general framework for interpreting repeat abundance in species ranging from bacteria to humans.
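To illustrate why phasing helps allele-specific expression measurements, here is a minimal sketch (not the authors' pipeline). It assumes that RNA-seq read counts at each heterozygous site have already been assigned to haplotype A or B using the phased assembly, and that the sites are far enough apart that no read is counted at two sites. Under those assumptions, counts from all sites in a gene can be summed before testing. The exact two-sided binomial test against a 50:50 null is a stand-in statistic chosen for illustration. The function names (`pooled_allelic_ratio`, `binom_two_sided_p`, `discordant_sites`) are hypothetical.

```python
from math import comb


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: total probability of outcomes
    no more likely than the observed count k out of n reads."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


def pooled_allelic_ratio(site_counts):
    """site_counts: list of (reads_hapA, reads_hapB), one tuple per phased
    polymorphic site in a gene (hypothetical input format). Because phasing
    tells us which allele at every site belongs to haplotype A, the counts
    can simply be summed across sites."""
    a = sum(ca for ca, _ in site_counts)
    b = sum(cb for _, cb in site_counts)
    if a + b == 0:
        raise ValueError("no informative reads")
    return a, b, a / (a + b)


def discordant_sites(site_counts):
    """Simple cross-check: indices of sites whose allelic bias points
    opposite to the pooled bias (candidate mapping or phasing errors)."""
    _, _, pooled = pooled_allelic_ratio(site_counts)
    pooled_sign = (pooled > 0.5) - (pooled < 0.5)
    flagged = []
    for idx, (ca, cb) in enumerate(site_counts):
        site_sign = (ca > cb) - (ca < cb)
        if site_sign != 0 and site_sign == -pooled_sign:
            flagged.append(idx)
    return flagged


if __name__ == "__main__":
    sites = [(6, 2), (7, 3), (5, 1)]  # toy counts, not real data
    for ca, cb in sites:
        print("single site p =", round(binom_two_sided_p(ca, ca + cb), 3))
    a, b, ratio = pooled_allelic_ratio(sites)
    print("pooled ratio =", ratio, "pooled p =", round(binom_two_sided_p(a, a + b), 4))
    print("discordant sites:", discordant_sites(sites))
```

With these toy counts, no single site reaches p < 0.05 on its own (p ≈ 0.289, 0.344, 0.219). The pooled 18:6 split gives p ≈ 0.023, and no site disagrees with the pooled direction. This illustrates, under the stated assumptions, how combining phased sites can raise sensitivity while per-site agreement offers a basic consistency check.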


The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution.

Keywords: Haplotype; Phasing; Indel; Microsatellite; Homopolymer; Repeat