Degree Type

Thesis

Date of Award

2010

Degree Name

Master of Science

Department

Computer Science

First Advisor

Jin Tian

Abstract

Bayesian networks are widely used in data mining tasks for probabilistic inference and causal modeling [Pearl (2000), Spirtes et al. (2001)]. Learning the best Bayesian network structure is known to be NP-hard [Chickering (1996)]. Moreover, the single best Bayesian network structure does not always give a good approximation of the actual underlying structure, because in many domains the number of high-scoring models is large.

In this thesis, we propose that learning the top-k Bayesian network structures and model averaging over these k networks gives a better approximation of the underlying model. The posterior probability of any hypothesis of interest is computed by averaging over the top-k Bayesian network models. The proposed techniques are applied to flow cytometric data to make causal inferences in human cellular signaling networks. The causal inferences made about the human T-cell protein signaling model by this method are compared with inferences made by other learning techniques proposed earlier [Sachs et al. (2005)]. We also study and compare the classification accuracy of the top-k networks with that of the single MAP network.
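The averaging step described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes each of the k networks comes with a log score proportional to log P(G, D) and a flag saying whether the hypothesis of interest (for example, the presence of a particular edge) holds in that network. The function name `topk_feature_posterior` and its parameters are hypothetical.

```python
import math


def topk_feature_posterior(log_scores, has_feature):
    """Approximate P(f | D) by averaging over the k best-scoring networks.

    log_scores  -- log P(G, D) (up to a shared constant) for each network
    has_feature -- True where the hypothesis f holds in that network
    Both arguments are assumptions made for this sketch.
    """
    if len(log_scores) != len(has_feature) or not log_scores:
        raise ValueError("need one flag per network and at least one network")

    # Subtract the maximum log score before exponentiating so that very
    # negative log scores do not underflow to zero.
    best = max(log_scores)
    weights = [math.exp(s - best) for s in log_scores]

    # Normalize only over the k networks: this is the top-k approximation
    # to the full posterior, which would sum over all structures.
    total = sum(weights)
    supporting = sum(w for w, h in zip(weights, has_feature) if h)
    return supporting / total


# Hypothetical scores for three networks; the edge appears in the first two.
posterior = topk_feature_posterior([-100.0, -100.5, -102.0], [True, True, False])
```

Restricting the normalization to the k retained networks means the estimate is exact only if the remaining structures carry negligible posterior mass, which is the premise the abstract's comparison of top-k averaging against the single MAP network examines.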

In summary, this thesis describes:

1. An algorithm for learning the top-k Bayesian network structures.

2. Model averaging based on the top-k networks.

3. Experimental results on the posterior probabilities of the top-k networks.

4. How the top-k Bayesian networks can be applied to learn protein signaling networks, with results of top-k model averaging on the CYTO data.

5. Results on the classification accuracy of the top-k networks.

Copyright Owner

Lavanya Ram

Language

en

Date Available

2012-04-30

File Format

application/pdf

File Size

55 pages
