Promoter-based model of methylation-expression relationships at regulatory sites. (A) A typical gene-CpG sample pair, showing the methylation of a promoter-based CpG site in relation to the expression levels of its linked gene across cell types. (B) A machine-learning algorithm (SVM-MAP) was trained to distinguish true gene-CpG pairs from a 50-fold excess of false (randomized) pairs. Over repeated rounds of training and test sets, the algorithm optimized parameters of linear (Pearson coefficient) and monotonic (Spearman) correlations to provide the best discrimination between true and false pairs, producing a general model for methylation-transcription relationships in gene promoters. Based on its fit to the learned model, a score was assigned to each gene-CpG pair. (C) Rates of successful gene-promoter pairing as a function of thresholds on the scores (null expectation = 50%). At score ≥0.85, the model successfully paired 87.2% of the genes to their actual promoters (dashed lines).
Aran et al. Genome Biology 2013 14:R21 doi:10.1186/gb-2013-14-3-r21
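The two correlation features named in panel (B) can be illustrated with a minimal sketch. This is not the SVM-MAP procedure itself: it only computes the Pearson and Spearman coefficients for one gene-CpG pair across cell types, and it builds false pairs by shuffling expression values, which is an assumption made here for illustration. The function names and the example values are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson (linear) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman (monotonic) correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def pair_features(methylation, expression):
    """Feature pair (Pearson, Spearman) for one gene-CpG pair."""
    return pearson(methylation, expression), spearman(methylation, expression)


def randomized_pairs(methylation, expression, n_false=50, seed=0):
    """Features for n_false false pairs.

    Assumption: a false pair is simulated by shuffling expression values
    across cell types; the source only says the false pairs were randomized.
    """
    rng = random.Random(seed)
    features = []
    for _ in range(n_false):
        shuffled = list(expression)
        rng.shuffle(shuffled)
        features.append(pair_features(methylation, shuffled))
    return features


# Hypothetical values: one CpG site and its linked gene across five cell types.
methylation = [0.9, 0.8, 0.6, 0.3, 0.1]
expression = [1.0, 2.0, 4.5, 8.0, 12.0]

true_features = pair_features(methylation, expression)
false_features = randomized_pairs(methylation, expression)
```

In this toy example expression falls as methylation rises, so both coefficients of the true pair are strongly negative, while the shuffled pairs scatter across the range. A classifier such as the one in panel (B) would then be trained on features of this kind.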