PCA was used to visually assess the major sources of variation in the expression data. In each of the four panels, each data point represents one sample; there are 25 samples in total. (a) PCA applied to chromosome 21 genes. The x-axis represents the first PC (accounting for 41% of the variance) and the y-axis represents the second PC (accounting for 21.2%). The plot is based on expression values for all 253 probe sets assigned to chromosome 21. It shows that the largest source of variability was tissue/cell type, with the first two PCs together accounting for 62.2% of the variance in the data. (b) PCA applied to chromosome 21 genes. The x-axis corresponds to the third PC and the y-axis to the second PC. The third PC, accounting for 17.2% of the variance in the data, separated trisomic from euploid samples on the basis of gene expression. (c) PCA applied to non-chromosome 21 genes. The first two PCs (x- and y-axes), computed from expression values for genes assigned to all other chromosomes, also showed that the largest source of variance was tissue (77.4% of total variance), similar to the results in panel a. (d) PCA applied to non-chromosome 21 genes. The x- and y-axes correspond to the third and second PCs, respectively. In contrast to panel b, the third PC (6.9% of total variance) failed to separate trisomic from euploid samples. The ellipsoids extend three standard deviations from the centroid of each tissue group. Data points correspond to samples (red, Down syndrome; blue, euploid) within a group (cerebrum: diamond symbols and green ellipsoid; cerebellum: square symbols and blue ellipsoid; astrocyte: triangle symbols and red ellipsoid; heart: hexagon symbols and orange ellipsoid).
Mao et al. Genome Biology 2005 6:R107 doi:10.1186/gb-2005-6-13-r107
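The per-component percentages quoted in the caption are the fraction of total variance explained by each PC. As a minimal sketch of how such values can be obtained, the Python example below runs PCA via singular value decomposition on a samples-by-probe-sets matrix. It is not the authors' analysis: the function name `pca`, the synthetic data, the group sizes, and the effect sizes are all hypothetical, and only the matrix dimensions (25 samples, 253 probe sets) are taken from the caption.

```python
import numpy as np


def pca(expr, n_components=3):
    """PCA of a (samples x probe sets) matrix via SVD.

    Returns the sample scores on the leading components and the
    fraction of total variance explained by each of them.
    """
    # Center each probe set (column) so PCs describe variation around the mean.
    centered = expr - expr.mean(axis=0)
    # Singular values squared are proportional to the variance along each PC.
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Sample coordinates along each PC (what a PCA scatter plot displays).
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Hypothetical synthetic data: 25 samples x 253 probe sets, as in the caption.
rng = np.random.default_rng(0)
n_samples, n_probes = 25, 253
tissue = np.repeat(np.arange(4), [7, 6, 6, 6])   # assumed group sizes
trisomic = np.arange(n_samples) % 2 == 0         # assumed trisomy labels

tissue_profiles = rng.normal(0.0, 2.0, size=(4, n_probes))  # large tissue effect
expr = (
    tissue_profiles[tissue]                      # tissue-specific expression
    + 0.5 * trisomic[:, None]                    # small uniform trisomy shift
    + rng.normal(0.0, 0.3, size=(n_samples, n_probes))  # measurement noise
)

scores, explained = pca(expr, n_components=3)
for i, frac in enumerate(explained, start=1):
    print(f"PC{i}: {100 * frac:.1f}% of variance")
```

Plotting `scores[:, 0]` against `scores[:, 1]`, or `scores[:, 2]` against `scores[:, 1]`, would give scatter plots with the same axes as panels a and b. Which component captures a given factor depends on the data, so the synthetic example is not expected to reproduce the percentages reported in the caption.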