A k-mean-directions Algorithm for Fast Clustering of Data on the Sphere

Maitra, Ranjan; Ramler, Ivan

File

2010_MaitraR_kMeanDirections.pdf (724.94 KB)

Date

2010-01-01

Organizational Unit

Statistics

Abstract

A k-means-type algorithm is proposed for efficiently clustering data constrained to lie on the surface of a p-dimensional unit sphere, or data that have been standardized to mean zero and unit variance, such as the standardized profiles that arise when Euclidean distance is used to cluster time-series gene expression data under a correlation metric. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results from a detailed series of experiments show excellent performance, even with very large datasets. The methodology is applied to the budding yeast mitotic cell division cycle dataset of Cho et al. [Molecular Cell (1998), 2, 65–73]. Because the entire dataset has not been analyzed previously, our analysis provides insight into the complete set of genes acting in concert and differentially. We also apply our methodology to the submitted abstracts of oral presentations made at the 2008 Joint Statistical Meetings (JSM) to identify groups of similar topics. The identified groups are both interpretable and distinct, and the methodology offers a possible automated tool for efficiently scheduling parallel sessions of presentations at professional meetings.
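The abstract does not spell out the algorithm's steps. As a rough illustration only, the following is a minimal Lloyd-style spherical k-means sketch in Python (standard library only). It assumes cosine similarity as the assignment criterion and the normalized within-cluster sum as each cluster's mean direction. It is not the authors' k-mean-directions implementation, and it omits the paper's initialization and number-of-clusters methods. The names `k_mean_directions` and `standardize` are hypothetical. `standardize` centers a profile and scales it to unit length; for two such profiles the squared Euclidean distance equals 2(1 - r), where r is their Pearson correlation, which is why correlation-based clustering of standardized profiles reduces to clustering on a sphere.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a nonzero vector to unit length (project it onto the unit sphere)."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def standardize(profile):
    """Hypothetical helper: center a profile and scale it to unit length.

    For two standardized profiles a and b, |a - b|^2 = 2 * (1 - r),
    where r is the Pearson correlation of the original profiles.
    """
    mean = sum(profile) / len(profile)
    return normalize([x - mean for x in profile])


def k_mean_directions(data, k, n_iter=100, seed=0):
    """Lloyd-style spherical k-means sketch (not the authors' algorithm).

    Each point is assigned to the mean direction with the largest cosine
    similarity; each mean direction is then recomputed as the normalized
    sum of its members. Returns (labels, centers).
    """
    rng = random.Random(seed)
    points = [normalize(x) for x in data]
    # Initialize with k distinct data points chosen at random.
    centers = [list(c) for c in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(n_iter):
        new_labels = [max(range(k), key=lambda j: dot(x, centers[j])) for x in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if not members:
                continue  # keep the previous center if a cluster empties
            total = [sum(col) for col in zip(*members)]
            if dot(total, total) > 0:  # skip a degenerate zero resultant
                centers[j] = normalize(total)
    return labels, centers
```

For example, `k_mean_directions([[1, 0.1, 0], [1, -0.1, 0], [0, 1, 0.1], [0.1, 1, 0]], 2)` should place the first two vectors in one cluster and the last two in the other. Comparing cosines rather than Euclidean distances is equivalent for unit vectors, since |x - c|^2 = 2 - 2 x·c.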

Comments

This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics in 2010, available online: http://www.tandf.com/10.1198/jcgs.2009.08155.

Copyright

2010

Collections

Publications