Date of Award
Doctor of Philosophy
Growing genomic and phylogenetic data sets offer a unique opportunity to study, analytically and computationally, the relationships among diversifying species. Unfortunately, such data often yield contradictory gene phylogenies because of common yet unobserved evolutionary events, e.g., gene duplication or deep coalescence. Gene tree parsimony (GTP) methods address this issue by reconciling gene phylogenies into one consistent species evolutionary history and by identifying the underlying events. In this study, we not only solve the GTP problem but also propose a new method for selecting gene trees, in order to assist biologists in gaining insight from phylogenetic analysis.
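As a small illustration of the kind of event a reconciliation identifies, the sketch below counts gene duplications implied by a gene tree against a fixed species tree, using the standard least-common-ancestor (LCA) mapping. It is not taken from the thesis: the nested-tuple tree encoding, the convention that gene-tree leaves are labeled by their species, and all function names are assumptions made for this example only.

```python
# Minimal sketch (assumptions: trees are nested tuples, leaves are strings,
# and every gene-tree leaf is labeled with the species it was sampled from).

def leaf_set(tree):
    """Return the set of leaf labels below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def all_clusters(tree):
    """Return the leaf set (cluster) of every node in the tree."""
    result = [leaf_set(tree)]
    if not isinstance(tree, str):
        for child in tree:
            result.extend(all_clusters(child))
    return result


def lca_map(species_clusters, gene_node):
    """Map a gene node to the smallest species cluster containing its species."""
    needed = leaf_set(gene_node)
    return min((c for c in species_clusters if needed <= c), key=len)


def count_duplications(gene_tree, species_tree):
    """Count gene nodes that map to the same species node as one of their children."""
    species_clusters = all_clusters(species_tree)

    def visit(node):
        if isinstance(node, str):
            return 0
        here = lca_map(species_clusters, node)
        is_dup = any(lca_map(species_clusters, child) == here for child in node)
        return int(is_dup) + sum(visit(child) for child in node)

    return visit(gene_tree)


species = (("A", "B"), "C")
congruent_gene_tree = (("A", "B"), "C")
conflicting_gene_tree = (("A", "C"), "B")

print(count_duplications(congruent_gene_tree, species))
print(count_duplications(conflicting_gene_tree, species))
```

Under the duplication cost, a GTP method would search for the species tree that minimizes the sum of such counts over all input gene trees; the sketch only scores one candidate species tree and does not perform that search.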
First, we introduce exact solutions for the intrinsically complex GTP problem. Exact solutions for NP-hard problems like GTP have a long history of improvement, notably for classic problems such as the traveling salesman and knapsack problems. The solutions presented here are built on integer linear programming (ILP) and dynamic programming (DP), techniques widely used to solve problems of similar complexity. We demonstrate the effectiveness of our solutions through simulation analyses and on empirical data sets.
To ensure coherent input data for GTP analysis, and to strengthen the representation of species in the gene trees, we introduce the quasi-biclique (QBC) approach to analyze and condense input data sets. To take advantage of emerging techniques that further describe sequence-host and gene-taxon relations, quasi-bicliques are optimized with respect to weighted edge connectivities and the distribution of missing information. We show that these QBC mining problems are NP-hard, and we describe an ILP formulation capable of finding optimal QBCs in support of GTP analysis. We also investigate the applicability of QBCs to other tasks, such as mining genetic interaction networks, with encouraging results.
Chang, Wen-Chieh, "Phylogeny reconciliation under gene tree parsimony" (2012). Graduate Theses and Dissertations. 12857.