Master of Science
Bayesian networks are widely used in data mining tasks for probabilistic inference and causal modeling [Pearl (2000), Spirtes et al. (2001)]. Learning the best Bayesian network structure is known to be NP-hard [Chickering (1996)]. Moreover, the single best Bayesian network structure does not always approximate the actual underlying structure well, because in many domains the number of high-scoring models is large.
In this thesis, we propose that learning the top-k Bayesian network structures and model averaging over these k networks gives a better approximation of the underlying model. The posterior probability of any hypothesis of interest is computed by averaging over the top-k Bayesian network models. The proposed techniques are applied to flow cytometric data to make causal inferences in human cellular signaling networks. The causal inferences made about the human T-cell protein signaling model by this method are compared with inferences made by other, previously proposed learning techniques [Sachs et al. (2005)]. We also study and compare the classification accuracy of the top-k networks with that of the single MAP network.
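As a minimal sketch of the averaging step described above (not the thesis's actual implementation), the posterior probability of a hypothesis such as "edge X -> Y is present" can be approximated as the score-weighted fraction of the k networks that contain it. The function name `topk_feature_posterior` and its inputs are hypothetical: `log_scores` is assumed to hold an unnormalized log posterior score for each of the k networks, and `has_feature` marks which networks satisfy the hypothesis.

```python
import math


def topk_feature_posterior(log_scores, has_feature):
    """Approximate P(f | D) by averaging over k scored networks.

    log_scores[i]  : assumed unnormalized log posterior score of network G_i
    has_feature[i] : True if G_i contains the feature f (e.g. an edge X -> Y)
    """
    if not log_scores or len(log_scores) != len(has_feature):
        raise ValueError("need one feature indicator per network score")

    # Subtract the maximum log score before exponentiating so that
    # very negative log scores do not underflow to zero.
    max_log = max(log_scores)
    weights = [math.exp(s - max_log) for s in log_scores]

    # Normalize over the k networks only: this treats the top-k set
    # as if it carried all of the posterior mass.
    total = sum(weights)
    with_feature = sum(w for w, f in zip(weights, has_feature) if f)
    return with_feature / total


# Toy example: three networks with relative scores 0.5, 0.3, 0.2;
# the first and third contain the feature.
example = topk_feature_posterior(
    [math.log(0.5), math.log(0.3), math.log(0.2)],
    [True, False, True],
)
```

Because the weights are renormalized over the k networks alone, the estimate is only as good as the share of total posterior mass those networks capture.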
In summary, this thesis describes:
1. An algorithm for learning the top-k Bayesian network structures.
2. Model averaging based on the top-k networks.
3. Experimental results on the posterior probabilities of the top-k networks.
4. How the top-k Bayesian networks can be applied to learn protein signaling networks, with results of top-k model averaging on the CYTO data.
5. Results on the classification accuracy of the top-k networks.
Ram, Lavanya, "Bayesian model averaging using k-best bayesian network structures" (2010). Graduate Theses and Dissertations. 11879.