Preserving nearest neighbor consistency in cluster analysis

Lee, Jong-seok

Date

2009-01-01

Advisor

Sigurdur Olafsson

Organizational Unit

Industrial and Manufacturing Systems Engineering

The Department of Industrial and Manufacturing Systems Engineering teaches the design, analysis, and improvement of systems and processes in the manufacturing, consulting, and service industries by applying the principles of engineering. The department was formed in 1929 as the Department of General Engineering. It was renamed the Department of Industrial Engineering in 1956 and the Department of Industrial and Manufacturing Systems Engineering in 1989.

Abstract

Finding cluster structure in data involves two main tasks: identifying the number of natural clusters and grouping the objects in a reasonable way. Doing either well requires a measure of clustering quality to be settled before any related study begins, because such a measure pins down a definition of "cluster" that is otherwise ambiguous, since different people may understand it differently. In this research we use the compactness and the connectivity of clusters as our quality measures. Compactness has long been regarded as one of the most important properties a clustering should achieve, whereas connectivity, which we consider a significant factor, has received less attention. Because we believe both are individually important, we use both to better estimate the number of clusters and to cluster objects. The new estimation method clusters datasets that resemble the original data but contain some perturbation, measures compactness and connectivity on the results to produce a set of promising estimates, and then selects a single optimal number by majority voting. The connectivity measure newly introduced in this research also serves as the objective when clustering objects: we propose a new clustering algorithm, named CNCLUST, that works to optimize connectivity. CNCLUST is a greedy heuristic that resembles the single linkage method, but it differs in that it first considers the local compactness of objects and later incorporates it into global connectivity. We conducted numerical experiments on simulated datasets and a real dataset to evaluate the performance of the proposed methods. The results appear promising.
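As an illustration only, the sketch below shows the general shape of the estimation idea described in the abstract: cluster perturbed copies of the data, score each candidate number of clusters, and take a majority vote. Every piece is a hypothetical stand-in, not the thesis's method. The connectivity score `nn_consistency` is simply the fraction of (point, nearest neighbour) pairs that share a cluster; the perturbation is assumed to be small Gaussian jitter; the clustering is plain single linkage (not CNCLUST); and the decision rule, which keeps the largest number of clusters whose consistency stays above a threshold, is a crude compactness/connectivity trade-off chosen for this example. All function names and parameters are hypothetical.

```python
import math
import random
from collections import Counter


def nearest_neighbours(points, i, n_neighbors):
    """Indices of the n_neighbors points closest to points[i] (Euclidean)."""
    others = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in others[:n_neighbors]]


def nn_consistency(points, labels, n_neighbors=3):
    """Hypothetical connectivity score: fraction of (point, neighbour) pairs
    that share a cluster label. 1.0 means no point is cut off from any of its
    nearest neighbours."""
    agree = total = 0
    for i in range(len(points)):
        for j in nearest_neighbours(points, i, n_neighbors):
            agree += labels[i] == labels[j]
            total += 1
    return agree / total


def single_linkage(points, n_clusters):
    """Plain single-linkage clustering: merge components along the shortest
    edges (Kruskal-style, with union-find) until n_clusters remain."""
    n = len(points)
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i in range(n) for j in range(i + 1, n)
    )
    components = n
    for _, i, j in edges:
        if components <= n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    roots = [find(i) for i in range(n)]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {r: c for c, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def estimate_k(points, max_k, threshold=0.95):
    """Hypothetical rule: the largest k whose single-linkage clustering keeps
    nn_consistency >= threshold."""
    best = 1
    for k in range(1, max_k + 1):
        if nn_consistency(points, single_linkage(points, k)) >= threshold:
            best = k
    return best


def vote_number_of_clusters(points, max_k, n_perturbations=11, sigma=0.05, seed=0):
    """Estimate k on Gaussian-jittered copies of the data, then majority-vote."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_perturbations):
        noisy = [tuple(x + rng.gauss(0.0, sigma) for x in p) for p in points]
        estimates.append(estimate_k(noisy, max_k))
    return Counter(estimates).most_common(1)[0][0]


if __name__ == "__main__":
    # Three tight, well-separated groups of four points each.
    centres = [(0, 0), (10, 0), (0, 10)]
    offsets = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)]
    data = [(cx + dx, cy + dy) for cx, cy in centres for dx, dy in offsets]
    print(vote_number_of_clusters(data, max_k=6))  # 3
```

Finer partitions tend to be more compact, so taking the largest connectivity-preserving number of clusters favours compactness until a split starts separating points from their nearest neighbours. The threshold of 0.95 and the jitter scale are arbitrary choices for the example.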

Copyright

2009

Collections

Theses and Dissertations