Transcripts in a GeneChip-type microarray are represented by multiple independent short oligonucleotide probes. One widely used approach is to compute a model-based unified expression index for each transcript, which is subsequently used for comparative data analysis. An alternative approach is to analyze the data at the probe level. A good understanding of the effect of the number of probe-pairs included, at the different statistical thresholds used for selection, should aid optimal selection of differentials. A test dataset with known differentials was used to study this property in comparisons involving two datasets.
A response surface was plotted by formulating an equation that captures the effect of varying the thresholds for the number of probe-pairs and for the t-statistic on the true positives and false positives identified. The resulting response surface indicates that a wide range of probe-pair and t-statistic combinations yield comparable results. The topology of the surface was used to define one form of additive cost-based approach, involving t and the number of probe-pairs used, to determine the optimum threshold that achieves a good balance of true positives and false positives when comparing two datasets at the probe level. In addition, a data scaling approach was used to study the impact of a selected threshold on the number of false negatives among differentials of differing magnitude in a given dataset.
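As an illustration only, the threshold scan described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the ResurfP equation itself: the function names, the Welch form of the per-probe-pair t-statistic, the simulated data (11 probe-pairs per transcript, 3 replicates per dataset, 20 spiked differentials) and the cost (false negatives plus weighted false positives) are all hypothetical stand-ins for quantities the text does not specify.

```python
import numpy as np

def probe_t_stats(a, b):
    """Welch t-statistic for every probe-pair.

    a, b: arrays of shape (replicates, transcripts, probe_pairs).
    The Welch form is an assumption; the text only says "t-statistic".
    """
    na, nb = a.shape[0], b.shape[0]
    va = a.var(axis=0, ddof=1)
    vb = b.var(axis=0, ddof=1)
    return (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(va / na + vb / nb)

def call_differentials(t, t_cut, k_min):
    """Flag a transcript when at least k_min probe-pairs have |t| >= t_cut."""
    return (np.abs(t) >= t_cut).sum(axis=1) >= k_min

def scan_thresholds(t, truth, t_grid, k_grid, fp_weight=1.0):
    """Tabulate TP, FP and a hypothetical additive cost over a threshold grid."""
    rows = []
    for t_cut in t_grid:
        for k_min in k_grid:
            called = call_differentials(t, t_cut, k_min)
            tp = int(np.sum(called & truth))
            fp = int(np.sum(called & ~truth))
            fn = int(np.sum(~called & truth))
            # Assumed cost: missed differentials plus weighted false positives.
            cost = fn + fp_weight * fp
            rows.append((t_cut, k_min, tp, fp, cost))
    return rows

# Simulated test dataset with known differentials (all values are assumptions).
rng = np.random.default_rng(0)
n_rep, n_tx, n_pp = 3, 200, 11
truth = np.zeros(n_tx, dtype=bool)
truth[:20] = True
a = rng.normal(8.0, 0.3, size=(n_rep, n_tx, n_pp))
b = rng.normal(8.0, 0.3, size=(n_rep, n_tx, n_pp))
b[:, truth, :] += 1.0  # shift the known differentials on the log scale

t = probe_t_stats(a, b)
rows = scan_thresholds(t, truth, np.arange(2.0, 8.5, 0.5), range(1, n_pp + 1))
best = min(rows, key=lambda r: r[4])
print("t_cut=%.1f k_min=%d TP=%d FP=%d cost=%.1f" % best)
```

Plotting the TP and FP columns of `rows` against `t_cut` and `k_min` gives a surface of the kind described; raising `fp_weight` shifts the minimum-cost point toward stricter thresholds.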
The results indicate that this response-surface-assisted approach (termed ResurfP) would be effective in determining optimal data-specific thresholds for the number of probe-pairs used and for the t-statistic when analyzing differentials between two datasets using probe-level data.