Clustering analysis of SAGE data using a Poisson approach

Li Cai1, Haiyan Huang2,5, Seth Blackshaw3,6, Jun S Liu4, Connie Cepko3 and Wing H Wong2,4*

Author Affiliations

1 Department of Research Computing, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115, USA

2 Department of Biostatistics, Harvard School of Public Health, 66 Huntington Avenue, Boston, MA 02115, USA

3 Department of Genetics, Harvard Medical School, 77 Avenue Louis Pasteur, Boston, MA 02115, USA

4 Department of Statistics, Harvard University, Science Center, 1 Oxford Street, Cambridge, MA 02138, USA

5 Current address: Department of Statistics, University of California, Berkeley, 367 Evans Hall, Berkeley, CA 94720, USA

6 Current address: Department of Neuroscience, Johns Hopkins University School of Medicine, 773 N Broadway Ave, Baltimore, MD 21287, USA

Genome Biology 2004, 5:R51  doi:10.1186/gb-2004-5-7-r51

Published: 29 June 2004


Serial analysis of gene expression (SAGE) data have been poorly exploited by clustering analysis owing to the lack of appropriate statistical methods that consider their specific properties. We modeled SAGE data using Poisson statistics and developed two Poisson-based distances. Their application to simulated and experimental mouse retina data shows that the Poisson-based distances are more appropriate and reliable for analyzing SAGE data than other commonly used distances or similarity measures, such as Pearson correlation or Euclidean distance.
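To make the contrast with Euclidean distance concrete, here is a minimal Python sketch. The abstract does not give the form of the two Poisson-based distances, so the code assumes one plausible choice: a chi-square-style deviation of a tag's observed counts from the counts expected under a Poisson model in which the tag's total is spread across libraries according to a shared profile. The function names (`pooled_profile`, `poisson_chi2`, `euclidean`) and the example counts are hypothetical and are not taken from the paper.

```python
import math


def pooled_profile(tags):
    """Estimate a shared expression profile from a group of tags.

    Each tag is a list of counts, one per SAGE library. The profile is the
    fraction of all pooled counts that falls in each library (sums to 1).
    """
    n_libs = len(tags[0])
    column_sums = [sum(tag[t] for tag in tags) for t in range(n_libs)]
    total = sum(column_sums)
    return [s / total for s in column_sums]


def poisson_chi2(counts, profile):
    """Chi-square-style deviation of one tag from a profile (assumed form).

    Under the assumed Poisson model the expected count in library t is
    (tag total) * profile[t]; for a Poisson variable the variance equals
    the mean, so each squared deviation is scaled by the expected count.
    """
    total = sum(counts)
    dist = 0.0
    for y, p in zip(counts, profile):
        expected = total * p
        # With a pooled profile, expected == 0 implies y == 0, so skip it.
        if expected > 0:
            dist += (y - expected) ** 2 / expected
    return dist


def euclidean(a, b):
    """Plain Euclidean distance between two count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical tags with the same expression pattern across three
# libraries but a tenfold difference in abundance.
tag_a = [10, 20, 30]
tag_b = [100, 200, 300]
profile = pooled_profile([tag_a, tag_b])

print(poisson_chi2(tag_a, profile), poisson_chi2(tag_b, profile))
print(euclidean(tag_a, tag_b))
```

In this toy case both tags fit the pooled profile almost exactly, so the Poisson-style deviation is essentially zero for each, while the Euclidean distance between them is large simply because one tag is more abundant. That is the kind of magnitude sensitivity a count-aware distance is meant to avoid.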