Degree Type

Dissertation

Date of Award

2011

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Arka P. Ghosh

Second Advisor

Ranjan Maitra

Abstract

A separability index quantifying the degree of difficulty in a hard clustering problem is proposed under the assumption of a multivariate Gaussian distribution for each group. We first define a preliminary index and explore its properties both theoretically and numerically. We then adjust this index so that the final refinement is also interpretable in terms of the Adjusted Rand Index between a true grouping and its hypothetical idealized clustering, taken as a surrogate of clustering complexity. Our derived index is used to develop a data-simulation algorithm that generates samples according to a prescribed value of the index. This algorithm is particularly useful for systematically generating datasets with varying degrees of clustering difficulty, which we use to evaluate the performance of different clustering algorithms. The index is also shown to be useful in providing a summary of the distinctiveness of classes in grouped datasets.

DOI

https://doi.org/10.31274/etd-180810-1463

Copyright Owner

Anna Dagmar Peterson

Language

en

Date Available

2012-04-30

File Format

application/pdf

File Size

105 pages
