Genex: a conditional independence based hybrid model for the analysis of gene expression data

Date
2006-01-01
Authors
Dudala, Kalyan
Department
Genetics
Abstract

Gene expression microarrays have produced a vast pool of data that is still not being used to its full potential. While current methods measure the change in a gene's expression in response to a set of conditions with considerable reliability, relationships between genes are usually left unexamined because of the high dimensionality of this data type. Broadly speaking, two major types of exploratory analysis are conducted on such relationships. The first is the category of exploratory clustering algorithms. Pioneered by Michael Eisen in 1998, this category includes the software Cluster, which performs a hierarchical clustering analysis on the basis of pairwise correlations. While useful because of its ease of interpretation and user-friendly software, Cluster does not take higher-order relationships into account and can therefore be misleading. The second category is that of network models. Commonly used models are Bayesian networks and several types of Gaussian models. Network models take higher-order relationships into account and, in general, improve the signal-to-noise ratio. The potential drawback is the complexity of the visual representation, which makes interpretation extremely difficult. Because the results are not forced into a dendrogram structure but are represented as points in multivariate space, it can be extremely challenging to draw useful inferences in the absence of explicit a priori information. We build a hybrid model that attempts to combine the key features of both types of approaches. We construct a hierarchical dendrogram from a conditional independence network model, retaining the ease of interpretation inherent in clustering algorithms while preserving the benefits of a network model, namely the consideration of higher-order relationships and the improvement of the signal-to-noise ratio. Presently limited to datasets of about 500 genes, the approach is probably most useful for smaller microarrays conducted after a key set of significantly expressed genes has been identified from a genome-wide microarray experiment.
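
The general idea of deriving a dendrogram from conditional independence relationships, rather than from raw pairwise correlations, can be illustrated with a minimal sketch. This is not the Genex algorithm, whose details the abstract does not give. It assumes, purely for illustration, that edge strength between two genes is the weakest absolute correlation left after conditioning on any single third gene (first-order partial correlation), and that the resulting distances are clustered by average linkage. The function names, the toy profiles, and the distance definition are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_ij, r_ik, r_jk):
    """First-order partial correlation of i and j given k."""
    denom = math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))
    # Assumption: treat a degenerate conditioning gene as explaining the pair away.
    return (r_ij - r_ik * r_jk) / denom if denom > 1e-12 else 0.0


def ci_distances(profiles):
    """Assumed distance: 1 - weakest |correlation| over all single-gene conditionings."""
    genes = list(profiles)
    r = {(a, b): pearson(profiles[a], profiles[b])
         for a in genes for b in genes if a != b}
    dist = {}
    for a, b in combinations(genes, 2):
        strength = abs(r[a, b])
        for k in genes:
            if k not in (a, b):
                pc = partial_corr(r[a, b], r[a, k], r[b, k])
                strength = min(strength, abs(pc))
        dist[frozenset((a, b))] = 1.0 - strength
    return dist


def average_linkage(genes, dist):
    """Agglomerative clustering; returns merges as (left, right, height)."""
    def d(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(g,) for g in genes]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda p: d(*p))
        merges.append((c1, c2, d(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical toy expression profiles (six conditions per gene).
profiles = {
    "gA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "gB": [1.2, 1.9, 3.3, 3.8, 5.1, 6.2],
    "gC": [0.9, 2.2, 2.7, 4.3, 4.8, 6.1],
    "gD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
dist = ci_distances(profiles)
merges = average_linkage(list(profiles), dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

Each printed row is one merge of the dendrogram, with the height at which the two groups join. The pairwise loop over conditioning genes scales roughly cubically with the number of genes, which is one reason a sketch like this would only be practical for a few hundred genes; a real analysis would more likely use NumPy and SciPy for the linear algebra and the linkage step.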

Copyright
2006-01-01