Campus Units

Statistics

Document Type

Article

Publication Version

Accepted Manuscript

Publication Date

2010

Journal or Book Title

Journal of Computational and Graphical Statistics

Volume

19

Issue

2

First Page

354

Last Page

376

DOI

10.1198/jcgs.2009.08054

Abstract

A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful in the context of evaluating the performance of clustering algorithms. Our suggested approach involves deriving and calculating the exact overlap between every cluster pair, measured in terms of their total probability of misclassification, followed by guided simulation of Gaussian components satisfying the prespecified overlap characteristics. The algorithm is illustrated in two and five dimensions using contour plots and parallel distribution plots, respectively, the latter of which we introduce and develop to display mixture distributions in higher dimensions. We also study properties of the algorithm and variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.

Comments

This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Computational and Graphical Statistics in January 2012, available online: http://www.tandf.com/10.1198/jcgs.2009.08054.

Copyright Owner

American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America

Language

en

File Format

application/pdf
