Table 2 

Top features in CoreBoost 

| Classifier type | Features |
| --- | --- |
| CpG, P versus U | Log-likelihood ratios from third-order Markov chain, log-likelihood ratios from TSS weight matrix |
| | GC-box score, weighted score of transcription factor NFY, weighted energy score at position +1 |
| | Weighted score of transcription factor YY1, TATA score, weighted score of transcription factor ELK1 |
| | MTE score, weighted score of transcription factor CREB |
| CpG, P versus D | Log-likelihood ratios from third-order Markov chain, GC-box score |
| | Weighted score of transcription factor NFY |
| | Log-likelihood ratios from TSS weight matrix |
| | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | Log-likelihood ratios from transcription factor ELK1, frequency of G+C |
| | Log-likelihood ratios from transcription factor YY1, TATA score, frequency of G |
| Non-CpG, P versus U | Correlation between vector of energy scores and empirical average energy profile |
| | Log-likelihood ratios from third-order Markov chain, TATA score |
| | Difference between the energy score around positions -25 and +1 and the average from surroundings |
| | Weighted energy at position +1 |
| | Proportion of Inr and GC-box pair within 10 bp of observed distance, Inr score |
| Non-CpG, P versus D | Correlation between vector of energy scores and empirical average energy profile, TATA score |
| | Log-likelihood ratios from third-order Markov chain |
| | Weighted energy at position +1 |
| | Correlation between vector of flexibility scores and empirical average flexibility profile, Inr score |
| | Difference between the flexibility score around position +1 and the average from surroundings, GC-box score |



bp, base pairs; D, immediate downstream sequence; P, promoter; TSS, transcription start site; U, immediate upstream sequence. 

Zhao et al. Genome Biology 2007 8:R17 doi:10.1186/gb-2007-8-2-r17