Campus Units: Statistics
Document Type: Article
Publication Version: Submitted Manuscript
Publication Date: June 2009
Journal or Book Title: Biometrics
Volume: 65
Issue: 2
First Page: 341
Last Page: 352
DOI: 10.1111/j.1541-0420.2008.01064.x
Abstract
A new methodology is proposed for clustering datasets in the presence of scattered observations. Scattered observations are defined as observations unlike any other, so traditional approaches that force them into groups can lead to erroneous conclusions. Our suggested approach is a scheme which, under the assumption of homogeneous spherical clusters, iteratively builds cores around the cluster centers, groups points within each core, and identifies points outside all cores as scatter. In the absence of scatter, the algorithm reduces to k-means. We also provide methodology to initialize the algorithm and to estimate the number of clusters in the dataset. Results in experimental situations show excellent performance, especially when clusters are elliptically symmetric. The methodology is applied to the analysis of the United States Environmental Protection Agency's Toxic Release Inventory reports on industrial releases of mercury for the year 2000.
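The core-building idea in the abstract can be illustrated with a simplified sketch. This is not the authors' algorithm: it assumes a single fixed, user-supplied core radius (the hypothetical parameter `radius`) shared by all clusters, whereas the paper constructs the cores itself, and it omits the paper's initialization and its estimation of the number of clusters. Each point is assigned to its nearest center only if it lies within `radius`; otherwise it is labeled scatter (`-1`), and centers are recomputed from core points alone. With an infinite radius nothing is flagged as scatter and the loop is ordinary Lloyd-style k-means. The function name `core_kmeans` is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance (spherical clusters are assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def core_kmeans(points: Sequence[Point], centers: Sequence[Point],
                radius: float, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Hypothetical sketch: k-means with fixed-radius cores.

    Returns the final centers and one label per point; -1 marks scatter.
    Labels come from the last assignment step of the loop.
    """
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center, but only if inside its core.
        labels = []
        for p in points:
            d = [_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=d.__getitem__)
            labels.append(j if d[j] <= radius else -1)
        # Update step: each center becomes the mean of its core points only.
        new_centers = []
        for j, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centers.append(c)  # keep an empty cluster's center unchanged
        if new_centers == centers:  # converged: assignments will not change
            break
        centers = new_centers
    return centers, labels
```

For example, with two tight groups near (0, 0) and (10, 10), one far-off point at (50, -40), starting centers (0, 0) and (10, 10), and `radius=3.0`, the far-off point is labeled `-1` and the centers settle at (0.5, 0.5) and (10.5, 10.5). Passing `radius=math.inf` instead forces that point into a cluster, which is the behavior the abstract warns against.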
Copyright Owner: The International Biometric Society
Copyright Date: 2008
Language: English
File Format: application/pdf
Recommended Citation
Maitra, Ranjan and Ramler, Ivan Peter, "Clustering in the Presence of Scatter" (2009). Statistics Publications. 83.
https://lib.dr.iastate.edu/stat_las_pubs/83
Comments
This is the peer-reviewed version of the following article: Maitra, R., and Ramler, I. Clustering in the presence of scatter. Biometrics, 65: 341-352, which has been published in final form at http://dx.doi.org/10.1111/j.1541-0420.2008.01064.x. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for self-archiving.