Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Computer Science


Computer Science

First Advisor

Xiaoqiu Huang


Constructing a phylogenetic tree for a set of species from their genome sequences is very important for understanding the evolutionary history of those species. Rapid improvements in DNA sequencing technology have generated sequence data for a huge number of similar isolates with a wide range of single nucleotide polymorphism (SNP) rates, where the SNP rate among some isolates can be thousands of times lower than among others. Genome sequences of this kind are difficult for existing methods, because the subtrees (or clades) consisting of species or isolates with very low SNP rates may have a very low level of resolution, and their evolutionary history may not be accurately represented. Identifying the informative columns in the alignment, which contain important variations in the genomes of those species, is important in constructing their evolutionary history. Here we describe a method for selecting informative regions for a set of isolates, based on the observation that the likelihoods of informative columns are sensitive to changes in the tree topology. We show that these informative columns increase the correctness of the phylogenies constructed for closely related isolates. We then address the generalized version of this problem by developing a hierarchical approach to phylogeny construction. In this method, the construction is performed at multiple levels, where at each level, groups of isolates with similar levels of similarity are identified and their phylogenetic trees are constructed. We also detect those multiple levels of similarity in an automated manner. Our results show that this new hierarchical approach is much more efficient, and sometimes more accurate, than existing approaches that build the phylogenetic tree with maximum likelihood from the whole alignment for all the isolates.
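
The observation that the likelihoods of informative columns are sensitive to changes in the tree topology can be illustrated with a small sketch. This is not the method developed in the thesis. It is a toy example under stated assumptions: four taxa, the Jukes-Cantor (JC69) substitution model, one fixed branch length `t` on every edge, and a hypothetical `threshold` for calling a column informative. All function names are illustrative only.

```python
import math

BASES = "ACGT"

def jc69(t):
    """JC69 probabilities (same base, different base) along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e

def partial(node, column, t):
    """Felsenstein pruning: conditional likelihood of the data below node per state.

    A leaf is an int index into column; an internal node is a tuple of children.
    Assumption: every edge has the same length t.
    """
    if isinstance(node, int):
        return [1.0 if column[node] == b else 0.0 for b in BASES]
    same, diff = jc69(t)
    result = [1.0] * 4
    for child in node:
        child_l = partial(child, column, t)
        for i in range(4):
            result[i] *= sum(
                (same if i == j else diff) * child_l[j] for j in range(4)
            )
    return result

def site_loglik(tree, column, t=0.1):
    """Log-likelihood of one alignment column on one tree (uniform base frequencies)."""
    return math.log(sum(0.25 * x for x in partial(tree, column, t)))

def sensitivity(column, trees, t=0.1):
    """Spread of the column's log-likelihood across candidate topologies."""
    lls = [site_loglik(tree, column, t) for tree in trees]
    return max(lls) - min(lls)

# The three possible topologies for four taxa, written as nested tuples.
QUARTETS = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

def informative_columns(alignment, trees=QUARTETS, threshold=1e-6, t=0.1):
    """Indices of columns whose likelihood changes when the topology changes.

    threshold is a hypothetical cutoff chosen for this illustration.
    """
    ncols = len(alignment[0])
    columns = ["".join(seq[c] for seq in alignment) for c in range(ncols)]
    return [c for c, col in enumerate(columns) if sensitivity(col, trees, t) > threshold]

if __name__ == "__main__":
    toy_alignment = ["AAAA", "AAAC", "ACAA", "ACCC"]  # one sequence per taxon
    print(informative_columns(toy_alignment))
```

In this toy alignment, the constant column and the column where a single taxon differs have the same likelihood under all three topologies, so they are not selected, while the two columns that split the taxa two against two are. A realistic analysis would use estimated branch lengths, a richer substitution model, and many more taxa.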

Copyright Owner

Anindya Das



File Format


File Size

96 pages