A genome-wide view of mutation rate co-variation using multivariate analyses
- Equal contributors
1 Center for Medical Genomics, Penn State University, University Park, PA 16802, USA
2 Integrative Biosciences Program, Penn State University, University Park, PA 16802, USA
3 Department of Statistics, Penn State University, 505A Wartik Laboratory, University Park, PA 16802, USA
4 Department of Biology, Penn State University, 305 Wartik Laboratory, University Park, PA 16802, USA
Genome Biology 2011, 12:R27 doi:10.1186/gb-2011-12-3-r27
Published: 22 March 2011
While the abundance of available sequenced genomes has led to many studies of regional heterogeneity in mutation rates, the co-variation among rates of different mutation types remains largely unexplored, hindering a deeper understanding of mutagenesis and genome dynamics. Here, using primate and rodent genomic alignments, we apply two multivariate analysis techniques (principal components and canonical correlations) to investigate the structure of rate co-variation for four mutation types (nucleotide substitutions, small insertions, small deletions and microsatellite mutability) and simultaneously explore their associations with multiple genomic features at different genomic scales and phylogenetic distances.
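The principal components step can be illustrated with a minimal Python/NumPy sketch. It runs PCA on a matrix with one row per genomic window and one column per mutation type. This is not the authors' Galaxy tool: the simulated data, the column order and the choice to z-score each rate before decomposition are assumptions made for illustration. When several rates share a common driver, most of their variance falls on the first component, and the loadings of the co-varying rates on that component share the same sign.

```python
import numpy as np


def pca(rates):
    """PCA of a (windows x mutation types) rate matrix.

    Each column is z-scored first (an assumption of this sketch), so the
    decomposition is of the correlation matrix and rates on different
    absolute scales contribute equally.
    Returns the fraction of variance explained by each component and the
    loadings (one column per component), both sorted by decreasing variance.
    """
    z = (rates - rates.mean(axis=0)) / rates.std(axis=0, ddof=1)
    corr = np.cov(z, rowvar=False)           # correlation matrix of the rates
    eigvals, eigvecs = np.linalg.eigh(corr)  # ascending order for symmetric input
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals / eigvals.sum(), eigvecs


# Simulated data: 1,000 hypothetical windows in which substitutions,
# insertions and deletions share one latent driver, while the
# microsatellite column is drawn independently.
rng = np.random.default_rng(0)
n = 1000
shared = rng.normal(size=n)
rates = np.column_stack([
    shared + 0.3 * rng.normal(size=n),  # substitutions
    shared + 0.3 * rng.normal(size=n),  # insertions
    shared + 0.3 * rng.normal(size=n),  # deletions
    rng.normal(size=n),                 # microsatellites
])

frac, loadings = pca(rates)
print("variance explained:", frac.round(2))
print("PC1 loadings:", loadings[:, 0].round(2))
```

In this toy setting the first component captures the shared substitution/insertion/deletion signal, while the independent microsatellite column loads on a separate component.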
We observe a consistent, largely linear co-variation among rates of nucleotide substitutions, small insertions and small deletions, with some non-linear associations detected among these rates on chromosome X and near autosomal telomeres. This co-variation appears to be shaped by a common set of genomic features, some previously investigated and some novel to this study (nuclear lamina binding sites, methylated non-CpG sites and nucleosome-free regions). Strong non-linear relationships are also detected among genomic features near the centromeres of large chromosomes. Microsatellite mutability co-varies with other mutation rates at finer scales, but not at 1 Mb, and shows varying degrees of association with genomic features at different scales.
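Canonical correlation analysis asks a related question: which linear combination of mutation rates is most strongly correlated with which linear combination of genomic features. The sketch below computes canonical correlations with a QR decomposition followed by an SVD, a standard numerically stable formulation. It is an illustration under stated assumptions, not the implementation used in the study; both the rate matrix and the three "features" are simulated, and any biological meaning attached to the feature columns would be hypothetical.

```python
import numpy as np


def cca(X, Y):
    """Canonical correlation analysis via QR decomposition and SVD.

    X: (n, p) matrix, e.g. mutation rates per window.
    Y: (n, q) matrix, e.g. genomic features per window.
    Both must have full column rank. Returns the k = min(p, q) canonical
    correlations in decreasing order and coefficient matrices a (p x k),
    b (q x k); the i-th canonical variates are Xc @ a[:, i] and Yc @ b[:, i],
    where Xc, Yc are the column-centred inputs.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, Rx = np.linalg.qr(Xc)  # orthonormal basis for the column space of Xc
    Qy, Ry = np.linalg.qr(Yc)
    U, s, Vt = np.linalg.svd(Qx.T @ Qy)  # singular values = canonical correlations
    k = min(X.shape[1], Y.shape[1])
    a = np.linalg.solve(Rx, U[:, :k])
    b = np.linalg.solve(Ry, Vt.T[:, :k])
    return np.clip(s[:k], 0.0, 1.0), a, b


# Simulated data for 2,000 hypothetical windows: three features, of which
# only a combination of the first two drives a shared rate component.
rng = np.random.default_rng(1)
n = 2000
features = rng.normal(size=(n, 3))
driver = features[:, 0] - 0.5 * features[:, 1]
rates = np.column_stack([
    driver + 0.3 * rng.normal(size=n),  # substitutions
    driver + 0.3 * rng.normal(size=n),  # insertions
    driver + 0.3 * rng.normal(size=n),  # deletions
    rng.normal(size=n),                 # microsatellites (unrelated here)
])

corrs, a, b = cca(rates, features)
print("canonical correlations:", corrs.round(2))
print("feature weights, first pair:", b[:, 0].round(2))
```

Here the first canonical pair recovers the feature combination that drives the shared rate component (weights roughly proportional to 1 and -0.5 on the first two features), while the remaining canonical correlations stay close to zero. This is the kind of pattern that would suggest a common set of features shaping several mutation rates at once.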
Our results allow us to speculate about the role of different molecular mechanisms, such as replication, recombination, repair and local chromatin environment, in mutagenesis. The software tools developed for our analyses are available through Galaxy, an open-source genomics portal, to facilitate the use of multivariate techniques in future large-scale genomics studies.