Preserving nearest neighbor consistency in cluster analysis

Date
2009-01-01
Authors
Lee, Jong-seok
Advisor
Sigurdur Olafsson
Organizational Unit
Industrial and Manufacturing Systems Engineering
The Department of Industrial and Manufacturing Systems Engineering teaches the design, analysis, and improvement of the systems and processes in manufacturing, consulting, and service industries by application of the principles of engineering. The Department of General Engineering was formed in 1929. In 1956 its name changed to Department of Industrial Engineering. In 1989 its name changed to the Department of Industrial and Manufacturing Systems Engineering.
Department
Industrial and Manufacturing Systems Engineering
Abstract

Finding cluster structure in data involves two main tasks: identifying the number of natural clusters and grouping the objects in a reasonable way. To achieve good results on both, a measure of clustering goodness must be established before any related study begins, because it helps fix a definition of a cluster, which can otherwise be ambiguous when individuals hold different opinions about it. In this research we use the compactness and the connectivity of clusters as our goodness measures. The former has been regarded as one of the most important properties to be achieved in a clustering task, whereas the latter, which we consider a significant factor, has received less attention. Since we believe that each is important in its own right, we employ both to better estimate the number of clusters and to cluster objects. A new estimation method produces a set of promising estimates by measuring compactness and connectivity on clustered datasets that resemble the original data but carry some amount of perturbation, and then determines a single optimal number by a majority voting scheme. The connectivity measure newly introduced in our research is also used as an objective in clustering objects. We propose a new clustering algorithm, named CNCLUST, that works to optimize connectivity. The proposed algorithm is a greedy heuristic that resembles the single linkage method, but it is distinguished by the fact that it first considers the local compactness of objects and later incorporates it into global connectivity. We conducted numerical experiments to evaluate the performance of the proposed methods on simulated datasets and a real dataset. The results appear promising.
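
The abstract does not define the connectivity measure or the voting procedure in detail, so the following is only an illustrative sketch under stated assumptions, not the thesis's method. It assumes that connectivity can be approximated by nearest neighbor consistency (the fraction of objects whose nearest neighbor shares their cluster label, echoing the title) and that majority voting picks the most frequent estimate, breaking ties toward the smaller number. The function names `nn_consistency` and `majority_vote` and the toy data are hypothetical.

```python
import math
from collections import Counter


def nn_consistency(points, labels):
    """Fraction of points whose nearest neighbor has the same cluster label.

    Assumed stand-in for a connectivity measure; not taken from the thesis.
    """
    n = len(points)
    agree = 0
    for i in range(n):
        # Nearest neighbor of point i among all other points (Euclidean distance).
        j = min(
            (k for k in range(n) if k != i),
            key=lambda k: math.dist(points[i], points[k]),
        )
        agree += labels[i] == labels[j]
    return agree / n


def majority_vote(estimates):
    """Most frequent estimate; ties go to the smallest value (an assumption)."""
    counts = Counter(estimates)
    best = max(counts.values())
    return min(k for k, c in counts.items() if c == best)


# Toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
good = [0, 0, 0, 1, 1, 1]  # labels that follow the two groups
bad = [0, 1, 0, 1, 0, 1]   # labels that cut across the groups

score_good = nn_consistency(points, good)
score_bad = nn_consistency(points, bad)

# Hypothetical estimates of the cluster count from five perturbed datasets.
chosen_k = majority_vote([2, 3, 2, 2, 4])
```

In this sketch a labeling that respects the two groups scores higher than one that cuts across them, and the vote returns the estimate that appears most often among the perturbed runs.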

Copyright
2009