Open Access Research

Codon usage patterns in Nematoda: analysis based on over 25 million codons in thirty-two species

Makedonka Mitreva1*, Michael C Wendl1, John Martin1, Todd Wylie1, Yong Yin1, Allan Larson2, John Parkinson3, Robert H Waterston4 and James P McCarter1,5

Author Affiliations

1 Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA

2 Department of Biology, Washington University, St. Louis, Missouri 63130, USA

3 Hospital for Sick Children, Toronto, and Departments of Biochemistry/Medical Genetics and Microbiology, University of Toronto, M5G 1X8, Canada

4 Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA

5 Divergence Inc., St Louis, Missouri 63141, USA


Genome Biology 2006, 7:R75  doi:10.1186/gb-2006-7-8-r75

Published: 14 August 2006



Codon usage has direct utility in the molecular characterization of species and is also a marker for molecular evolution. To understand codon usage within the diverse phylum Nematoda, we analyzed a total of 265,494 expressed sequence tags (ESTs) from 30 nematode species. The full genomes of Caenorhabditis elegans and C. briggsae were also examined. In total, 25,871,325 codons were analyzed, and a comprehensive codon usage table was generated for all 32 species. For 24 of these organisms, this is the first codon usage table available.
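At its core, a codon usage table is a tally of codons read in frame. The sketch below assumes that each input sequence has already been trimmed to its coding region and starts at the first codon position. Raw ESTs would first need open reading frames or peptide predictions, and that step is not shown. The function name codon_usage_table is hypothetical. The sketch skips partial trailing codons and codons that contain ambiguous bases, and it reports both raw counts and frequency per thousand codons, a common layout for such tables.

```python
from itertools import product

# All 64 codons over the DNA alphabet, in TCAG order (TTT, TTC, TTA, ..., GGG).
CODONS = ["".join(p) for p in product("TCAG", repeat=3)]


def codon_usage_table(sequences):
    """Tally codons across coding sequences and return (counts, per_thousand).

    Assumption: every sequence is already in frame, starting at codon position 1.
    Trailing partial codons and codons containing ambiguous bases (e.g. N) are skipped.
    """
    counts = dict.fromkeys(CODONS, 0)
    for seq in sequences:
        seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
        usable = len(seq) - len(seq) % 3      # drop an incomplete final codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in counts:              # ambiguous codons are not keys
                counts[codon] += 1
    total = sum(counts.values())
    per_thousand = {c: (1000 * n / total if total else 0.0) for c, n in counts.items()}
    return counts, per_thousand


counts, per_thousand = codon_usage_table(["ATGGCTGCAAAATAA", "atggccNNNtag"])
print(counts["ATG"], counts["GCT"], sum(counts.values()))  # prints: 2 1 8
print(per_thousand["ATG"])  # prints: 250.0
```

Stop codons are kept in the tally here, as many published codon usage tables do. Drop them from CODONS if only sense codons are wanted.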


Codon usage similarity in Nematoda usually persists across the breadth of a genus but then diminishes rapidly, even within each clade. Globodera, Meloidogyne, Pristionchus, and Strongyloides have the most highly derived patterns of codon usage. The major factor affecting differences in codon usage between species is the GC content of coding sequences, which ranges in nematodes from 32% to 51%. Coding GC content, measured as GC3 (the GC content at third codon positions), also explains much of the observed variation in the effective number of codons, a measure of codon bias (R = 0.70), and even accounts for differences in amino acid frequency. Codon usage is also affected by neighboring nucleotides (N1 context). Coding GC content correlates strongly with estimated noncoding genomic GC content (R = 0.92). Examination of abundant EST clusters in five species identified candidate optimal codons that may be preferred in highly expressed transcripts.
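The two quantities related in the paragraph above can be computed from a codon-to-count mapping. These are GC3 and Wright's (1990) effective number of codons (Nc). Nc ranges from 20, when each amino acid uses a single codon, to 61, when all synonymous codons are used equally. This is a minimal sketch that assumes the standard genetic code. The function names gc3 and effective_number_of_codons are hypothetical. The study's exact GC3 definition and Nc implementation may differ; some analyses, for instance, restrict GC3 to synonymous sites (GC3s).

```python
from itertools import product

CODONS = ["".join(p) for p in product("TCAG", repeat=3)]
# Standard genetic code (NCBI translation table 1) in the same TCAG codon order;
# "*" marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Amino acids per degeneracy class (number of synonymous codons) in the standard code.
CLASS_SIZES = {2: 9, 3: 1, 4: 5, 6: 3}


def gc3(counts):
    """Fraction of G or C at the third position over all codons in `counts`."""
    total = sum(counts.values())
    gc = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc / total if total else float("nan")


def effective_number_of_codons(counts):
    """Wright's (1990) effective number of codons from a codon -> count mapping."""
    # Group synonymous codon counts by amino acid; Met, Trp and stops offer no choice.
    families = {}
    for codon, aa in GENETIC_CODE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(counts.get(codon, 0))

    # Codon homozygosity F for each amino acid, collected by degeneracy class.
    f_by_class = {}
    for family in families.values():
        n = sum(family)
        if n < 2:
            continue  # F is undefined with fewer than two observations
        sum_sq = sum((x / n) ** 2 for x in family)
        f_by_class.setdefault(len(family), []).append((n * sum_sq - 1) / (n - 1))
    mean_f = {k: sum(fs) / len(fs) for k, fs in f_by_class.items()}

    # Isoleucine is the only 3-fold class; if it is missing, use the mean of F2 and F4.
    if 3 not in mean_f and 2 in mean_f and 4 in mean_f:
        mean_f[3] = (mean_f[2] + mean_f[4]) / 2
    if any(mean_f.get(k, 0) <= 0 for k in CLASS_SIZES):
        raise ValueError("too few codons to estimate F for every degeneracy class")

    nc = 2 + sum(size / mean_f[k] for k, size in CLASS_SIZES.items())
    return min(nc, 61.0)  # Nc is bounded above by the 61 sense codons


uniform = {codon: 10 for codon in CODONS}
print(gc3(uniform), effective_number_of_codons(uniform))  # prints: 0.5 61.0
```

With per-species GC3 and Nc values in hand, a Pearson correlation like the R values reported above could be computed with statistics.correlation (Python 3.10 and later). This sketch does not assume whether the study correlated values per species or per gene.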


Evolutionary models indicate that total genomic GC content, probably the product of directional mutation pressure, drives codon usage rather than the converse, a conclusion that is supported by examination of nematode genomes.
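For background, Wright (1990) derived the expected effective number of codons under a standard null model. In that model, synonymous codon choice is shaped only by mutational bias in base composition, summarized by the GC content at synonymous third positions, written s here. The curve is included as context for how base composition alone constrains codon bias. It is not the evolutionary model used in this study.

```latex
\mathrm{Nc}_{\mathrm{exp}}(s) = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}
```

For example, s = 0.5 gives 2 + 0.5 + 29/0.5 = 60.5, close to the maximum of 61. An AT-biased s = 0.2 gives 2.2 + 29/0.68, or about 44.8. Genes whose observed Nc falls well below this curve are usually interpreted as showing codon bias beyond what base composition alone predicts.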