Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Computer Science

First Advisor

David Fernandez-Baca


Despite the unprecedented outpouring of molecular sequence data in phylogenetics, the current understanding of the tree of life is still incomplete. The widespread applications of phylogenies, ranging from drug design to biodiversity conservation, repeatedly remind us of the need for more accurate and inclusive phylogenies. My thesis addresses some of the underlying challenges, by presenting theoretical and empirical results, as well as algorithms for a range of phylogenetic optimization problems.

In the first part of this thesis, I develop a heuristic method for the NP-hard unrooted Robinson-Foulds (RF) supertree problem, and show that it yields more accurate supertrees than those obtained from Matrix Representation with Parsimony (MRP) and rooted RF heuristic. In the second, I present an RF distance measure based approach (MulRF) for inferring a species tree from the input multi-copy gene trees, through a generalization of RF distance to multi-labeled trees. Through simulation, I show that this approach, which is independent of gene tree discordance mechanisms, produces more accurate species trees than existing methods when incongruence is caused by gene tree error, duplications and losses, and/or lateral gene transfer. Next, I perform a simulation study to evaluate the performance of Gene Tree Parsimony (GTP) under duplication and duplication and loss cost models and compare it to MulRF method. The objective is to study the effects of various types of sampling (e.g., gene tree and sequence sampling), gene tree error, and duplication and loss rates on the accuracy of the phylogenetic estimates by GTP and MulRF. Next, I present efficient error correction algorithms for gene tree reconciliation based on duplication, duplication and loss, and deep coalescence. In the end, I present NP-completeness proofs for two problems whose complexity was previously unknown.


Copyright Owner

Ruchi Chaudhary



File Format


File Size

119 pages