Codon sites under positive selection are over-represented in gene regions encoding intrinsically disordered regions (IDRs) of proteins. (a) The ratio of positive to negative sites is higher in IDRs than in regions of regular protein structure. The ratio of positive to negative sites is shown for protein regions predicted to have α-helical (α), β-sheet (β) or intrinsically disordered (IDR) protein conformation. The P-value shows the significance of the difference between the ratio associated with IDRs and the ratios associated with regions of regular structure (a χ² test was used to test the null hypothesis that the ratios do not differ between protein conformation classes). (b) The proportion of codons under selection is higher in IDRs for positively selected sites but not for negatively selected sites. Annotations are as for (a). Differences between the frequencies of negative sites in regions of different protein conformation were not significant. (c) The ratio of positive to negative sites is higher in long IDRs than in structured protein domains. The ratio of positive to negative sites is shown for protein regions within known protein domains (PDB dom) or predicted intrinsically disordered protein regions of at least 30 residues in length (IDR ≥30). The frequencies of positively selected codons in IDR ≥30 and PDB dom are 0.0055 and 0.0011, respectively, while the equivalent frequencies for negatively selected codons are 0.0728 and 0.0750, respectively. (d) Codons under positive selection are significantly more frequent in IDRs than expected from an empirically generated random distribution of selected sites. The panels show empirical frequency distributions (histograms) predicted for a random distribution of positively and negatively selected sites within protein regions with intrinsically disordered structure (IDR), β-sheet and α-helix conformation, generated by 10,000 randomization trials.
Upward-pointing arrowheads mark the median of each distribution, and downward-pointing arrowheads mark the observed number of selected sites. The ratio of the observed number of sites to the median of the random distribution is shown in the upper right corner of each panel. The ratio is significantly different from unity in all cases (P ≤ 10⁻³) except for negative sites in α-helical regions.
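The randomization comparison in (d) can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes that selected sites are redistributed uniformly over all codons, and the function name `randomization_test`, the toy labels and the two-sided empirical P-value with a +1 correction are all hypothetical choices made for illustration.

```python
import random
from statistics import median


def randomization_test(labels, observed_sites, target, trials=10_000, seed=0):
    """Compare the observed count of selected sites in `target` regions
    with counts obtained by placing the same number of sites at random.

    labels         -- per-codon conformation label (hypothetical encoding)
    observed_sites -- indices of codons called as selected
    target         -- label of the region class of interest, e.g. "IDR"
    """
    rng = random.Random(seed)
    is_target = [lab == target for lab in labels]
    n_sites = len(observed_sites)
    observed = sum(is_target[i] for i in observed_sites)

    # Assumption: each trial places n_sites uniformly at random, without
    # replacement, over all codons. The original study may have used a
    # different scheme (for example, shuffling within genes).
    null_counts = []
    for _ in range(trials):
        sample = rng.sample(range(len(labels)), n_sites)
        null_counts.append(sum(is_target[i] for i in sample))

    med = median(null_counts)
    # Two-sided empirical P-value with a +1 correction so it is never zero.
    ge = sum(c >= observed for c in null_counts)
    le = sum(c <= observed for c in null_counts)
    p_value = min(1.0, 2 * (min(ge, le) + 1) / (trials + 1))
    ratio = observed / med if med else float("inf")
    return {"observed": observed, "median": med, "ratio": ratio, "p_value": p_value}


# Toy data, invented for illustration only: 1,000 codons, 50 selected sites,
# 40 of which fall in the IDR class.
labels = ["IDR"] * 300 + ["alpha"] * 400 + ["beta"] * 300
observed_sites = list(range(40)) + list(range(300, 310))
result = randomization_test(labels, observed_sites, target="IDR")
print(result)
```

In this sketch the reported `ratio` corresponds to the value shown in the upper right corner of each panel (observed count divided by the median of the random distribution), and `median` and `observed` correspond to the positions marked by the upward- and downward-pointing arrowheads.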
Nilsson et al. Genome Biology 2011 12:R65 doi:10.1186/gb-2011-12-7-r65