Campus Units

Statistics

Document Type

Article

Publication Version

Submitted Manuscript

Publication Date

1-2009

Journal or Book Title

IEEE/ACM Transactions on Computational Biology and Bioinformatics

Volume

6

Issue

1

First Page

144

Last Page

157

DOI

10.1109/TCBB.2007.70244

Abstract

Clustering data sets is a challenging problem that arises in a wide array of applications. Partition-optimization approaches, such as k-means or expectation-maximization (EM) algorithms, are suboptimal: they find solutions in the vicinity of their initialization. This paper proposes a staged approach to specifying initial values: first find a large number of local modes, then obtain representatives from the most separated ones. Results on test experiments are excellent. We also provide a detailed comparative assessment of the suggested algorithm against many initialization approaches commonly used in the literature. Finally, the methodology is applied to two data sets: diurnal microarray gene expression data and industrial releases of mercury.
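The abstract describes the staged idea only in outline: find many local modes, then take representatives from the most separated ones. The Python sketch below is a simplified illustration under stated assumptions, not the paper's algorithm. Stage 1 assumes that the centers returned by many short k-means runs with an inflated number of clusters (`inflate * k`) can serve as local modes; stage 2 swaps in a greedy farthest-point (maximin) rule to pick `k` well-separated modes. The function names `lloyd`, `sse`, and `staged_init`, and all parameter values, are hypothetical.

```python
import numpy as np


def lloyd(X, centers, n_iter=100):
    """Run Lloyd's k-means iterations from the given starting centers."""
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center (squared Euclidean).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each center moves to the mean of its members;
        # a center with no members keeps its previous position.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        converged = np.allclose(new, centers)
        centers = new
        if converged:
            break
    # Final assignment consistent with the returned centers.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, d2.argmin(axis=1)


def sse(X, centers):
    """Within-cluster sum of squares when each point uses its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return float(d2.min(axis=1).sum())


def staged_init(X, k, n_starts=20, inflate=4, short_iter=10, seed=0):
    """Hypothetical two-stage initializer; a simplification, not the paper's exact steps."""
    rng = np.random.default_rng(seed)
    # Stage 1: many short k-means runs with inflate*k clusters, each seeded at
    # randomly chosen data points; the resulting centers serve as local modes.
    modes = np.vstack([
        lloyd(X, X[rng.choice(len(X), size=inflate * k, replace=False)], short_iter)[0]
        for _ in range(n_starts)
    ])
    # Stage 2 (swapped-in rule): greedy farthest-point (maximin) selection of
    # k modes, starting from the mode closest to the average of all modes.
    first = int(np.argmin(((modes - modes.mean(axis=0)) ** 2).sum(axis=1)))
    chosen = [first]
    min_d2 = ((modes - modes[first]) ** 2).sum(axis=1)
    while len(chosen) < k:
        nxt = int(np.argmax(min_d2))  # mode farthest from all modes chosen so far
        chosen.append(nxt)
        min_d2 = np.minimum(min_d2, ((modes - modes[nxt]) ** 2).sum(axis=1))
    return modes[chosen], modes


# Usage on synthetic data: two well-separated Gaussian blobs in 2-D.
rng = np.random.default_rng(1)
true_centers = np.array([[0.0, 0.0], [20.0, 0.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in true_centers])
init, modes = staged_init(X, k=2)
centers, labels = lloyd(X, init)  # partition optimization from the staged start
print(np.round(centers, 2))
```

The final `lloyd` call is the partition-optimization step that the initialization feeds. Because Lloyd's iterations never increase the within-cluster sum of squares, `sse(X, centers)` can be no larger than `sse(X, init)`; the staged start only controls which local optimum the iterations reach.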

Comments

This is a manuscript of an article from IEEE/ACM Transactions on Computational Biology and Bioinformatics 6 (2009): 144, doi: 10.1109/TCBB.2007.70244. Posted with permission. Copyright 2009 IEEE.

Copyright Owner

IEEE

Language

en

File Format

application/pdf
