Degree Type

Dissertation

Date of Award

2007

Degree Name

Doctor of Philosophy

Department

Theses & dissertations (Interdisciplinary)

Major

Bioinformatics and Computational Biology

First Advisor

Patrick S. Schnable

Second Advisor

Daniel A. Ashlock

Abstract

Motivation. Clustering has become an integral part of microarray data analysis and interpretation. It helps reduce the volume of information generated by a microarray experiment to a level at which biologists can generate hypotheses. There is a danger, however, that artifacts induced by clustering methods can cause misinterpretation of the data. A clustering method that accurately captures the natural structure of the data would be a useful tool for biologists seeking to discover the biological meaning buried in the data. To this end, a new clustering algorithm, called K-means multiclustering, is introduced. The method can avoid the artifacts induced by distance or similarity metrics by amalgamating the results of many K-means clusterings.

Results. The multiclustering algorithm is a model-free clustering method. It is found to be reliable and consistent in capturing the underlying data structure, with accuracy that is competitive with model-based clustering and superior to other methods on synthetic microarray data generated in a manner consistent with the hypothesis of model-based clustering. The algorithm has a high level of immunity to artifacts introduced by the metric used to measure the distance between data points. It can successfully cluster data sets that are designed to have different shapes and variation and that cannot be correctly clustered by traditional clustering methods. The cut plot computed by this method is a simple and useful summary of the data structure. The method can also generate a detailed view of how the clusters form, revealing the underlying hierarchical structure of the data set.

DOI

https://doi.org/10.31274/rtd-180813-17131

Publisher

Digital Repository @ Iowa State University, http://lib.dr.iastate.edu/

Copyright Owner

Ling Guo

Language

en

Proquest ID

AAI3274875

OCLC Number

183874744

ISBN

9780549154662

File Format

application/pdf

File Size

126 pages
