Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Computer Science


Computer Science

First Advisor

Oliver Eulenstein


A complex system is a group or organization in which many components interact to perform functions that a single component cannot. The study of the evolution of complex systems can reveal the pattern of interactions, the dependencies between entities, the impacts on the system’s environment, etc. In bacterial genomes, a gene block can be thought of as a complex system. Gene blocks are genes that are co-located on chromosomes, whose evolutionary conservation is essential and useful in phylogenetic and functional studies. Conservation of gene blocks across several species indicates that some of the gene blocks are operons, which are gene blocks that are transcribed together into an mRNA under the control of a single promoter. Previously, an event- based model was introduced to study the evolution of gene blocks in bacteria. It was proven to be practical and efficient in examining the formation of gene block. In this dissertation research, I present three studies on the evolution of gene blocks in bacteria using the event-based model.In the first study, we introduced the formalization of the event-based model and used it to study the problem of finding orthologous gene blocks in bacteria. Unfortunately, the problem is NP-Hard, which means it is unlikely to have an efficient algorithm to solve the problem exactly. Additionally, it is hard to approximate the problem within a ratio of log2(n)/2 ≃ .72 ln n where n is the number of genes in the reference gene blocks. Fortunately, it is possible to approximate it within a ratio ln n using a variant of the greedy algorithm of the Set Cover problem. To evaluate the optimality of this method, we formulated the problem as an Integer Linear Program. Using a data set from my previous paper, we compared my method with the state-of-the-art method and showed that mine was more efficient and optimal. In the second study, we explored the evolution of the homologous gene blocks under the event- based model. Using the maximum parsimony approach, we developed two heuristic algorithms to reconstruct the ancestral state of gene blocks of a group of taxa given a reference gene block. We applied these two methods to study the evolution of 55 operons in Escherichia coli and 83 operons in Bacillus subtilis. From the results, we discovered several plausible intermediate states of gene blocks which might be fully or partially functional. From the reconstruction of the ancestral states of gene block, we formed hypotheses on how the gene blocks evolved, and what external forces could have impacted on the course of the evolution. In the third study, we extended the heuristic algorithm from the second study to trace the evolution of bacterial gibberellin biosynthetic operon. Gibberellin hormones are ubiquitous regulators of growth and developmental processes in vascular plants. Although homologous operons encode GA biosynthetic enzymes in both rhizobia and phytopathogens, there is a clear functional distinction between them. Using the reconstruction algorithm, we characterized losses, gains, and duplications of the gibberellin operon genes within alpha-proteobacteria rhizobia and revealed a complex history of horizontal gene transfer of individual genes and clusters of genes. In conclusion, the event-based model provides a powerful tool to study the evolution of gene blocks in bacteria. Although the problems are NP-hard, heuristic approaches can provide meaningful results within a reasonable time. In the future work section, I present directions to extend our studies, as well as problems they might solve.


Copyright Owner

Huy Nguyen



File Format


File Size

111 pages