Verification of protein-protein interaction predictions against a set of reliable interactions. Protein pairs in the hidden set of a ten-fold cross-validation are ranked by their predicted interaction probabilities (green, red, and black curves for InSite with Prosite, InSite with Pfam, and naïve Bayes, respectively). Each point on a curve corresponds to a different probability threshold and therefore to a different number of predicted interactions. The X-axis shows the number of pairs that are predicted to interact but are not among the reliable interactions. The Y-axis shows the number of reliable interactions that are predicted to interact. Where shown, the blue and mustard curves are for pairs ranked by Gavin et al.'s and Krogan et al.'s scores, respectively. (a) Predictions for all protein pairs in our data set. InSite with Pfam performs better than InSite with Prosite, which in turn performs better than the naïve Bayes model. All three models integrate multiple data sets and therefore achieve higher coverage than methods that use a single assay. The cross and circle mark the accuracies for interacting pairs from Ito et al.'s and Uetz et al.'s yeast two-hybrid assays, respectively. (b) Predictions restricted to pairs in Gavin et al.'s assay, providing a direct comparison of our predicted probability with Gavin et al.'s confidence score on the same set of protein pairs. (c) Predictions restricted to pairs in Krogan et al.'s assay, providing a direct comparison of our predicted probability with Krogan et al.'s confidence score on the same set of protein pairs.
Wang et al. Genome Biology 2007 8:R192 doi:10.1186/gb-2007-8-9-r192
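The curve construction described in the caption can be illustrated with a minimal sketch. This is not the authors' code: the function name `threshold_curve`, the toy protein names, and the scores are hypothetical, and pairs with tied scores are assumed to be added together at a single threshold. For each threshold, the sketch counts the predicted pairs that are outside the reliable set (X value) and the predicted pairs that are inside it (Y value).

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = FrozenSet[str]  # unordered protein pair, e.g. frozenset({"A", "B"})


def threshold_curve(
    scores: Dict[Pair, float], reliable: Set[Pair]
) -> List[Tuple[int, int]]:
    """Return one (x, y) point per distinct score threshold.

    x = predicted pairs NOT in the reliable set
    y = predicted pairs that ARE in the reliable set
    """
    # Rank pairs from highest to lowest predicted probability.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    points: List[Tuple[int, int]] = []
    not_reliable = 0
    in_reliable = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][1]
        # Assumption: all pairs tied at this score are predicted together.
        while i < len(ranked) and ranked[i][1] == threshold:
            if ranked[i][0] in reliable:
                in_reliable += 1
            else:
                not_reliable += 1
            i += 1
        points.append((not_reliable, in_reliable))
    return points


# Hypothetical toy data, for illustration only.
toy_scores = {
    frozenset({"A", "B"}): 0.9,
    frozenset({"A", "C"}): 0.8,
    frozenset({"B", "C"}): 0.8,
    frozenset({"C", "D"}): 0.3,
}
toy_reliable = {frozenset({"A", "B"}), frozenset({"B", "C"})}

points = threshold_curve(toy_scores, toy_reliable)
print(points)  # [(0, 1), (1, 2), (2, 2)]
```

A curve that rises faster along the Y-axis for a given X value recovers more reliable interactions at the same number of unverified predictions, which is the sense in which one method is described as better than another in panel (a).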