Constructing liberal and conservative supertrees and exact solutions for reduced consensus problems

Dong, Jianrong

Constructing liberal and conservative supertrees and exact solutions for reduced consensus problems

File

Dong_iastate_0097E_12537.pdf (672.01 KB)

Date

2012-01-01

Authors

Dong, Jianrong

Advisor

David Fernandez-Baca

Altmetrics

Organizational Units

Organizational Unit

Computer Science

Computer Science—the theory, representation, processing, communication and use of information—is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through; 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research, and 3. educational agendas, and sustained commitment to graduating leaders for academia, industry and government.

History
The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. Faculty were composed of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building which now houses the Computer Science department, then simply called the Computer Science building, was completed. Later it was named Atanasoff Hall. Throughout the 1980s to present, the department expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence
1969-present

Related Units

College of Liberal Arts and Sciences (parent college)

Department

Computer Science

Abstract

This thesis studies two different approaches to extracting information from collections of phylogenetic trees: supertrees and reduced consensus. Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former.

We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page. The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.

Generalizations of the strict and loose consensus methods to the supertree setting, recently introduced by McMorris and Wilkinson, are studied. The supertrees these methods produce are conservative in the sense that they only preserve information (in the form of splits) that is supported by at least one the input trees and that is not contradicted by any of the input trees. Alternative, equivalent, formulations of these supertrees are developed. These are used to prove the NP-completeness of the underlying optimization problems and to give exact integer linear programming solutions. For larger data sets, a divide and conquer approach is adopted, based on the structural properties of these supertrees. Experiments show that it is feasible to solve problems with several hundred taxa and several hundred trees in a reasonable amount of time.

A rogue taxon in a collection of phylogenetic trees is one whose position varies drastically from tree to tree. The presence of such taxa can greatly reduce the resolution of the consensus tree (e.g., the majority-rule or strict consensus) for a collection. The reduced consensus approach aims to identify rogue taxa and to produce more informative consensus trees. Given a collection of phylogenetic trees over the same leaf set, the goal is to find a set of taxa whose removal maximizes the number of internal edges in the consensus tree of the collection. This problem is proven to be NP-hard for strict and majority-rule consensus. We describe exact integer linear programming formulations for computing reduced strict, majority and loose consensus trees. In experimental tests, our exact solutions show significant improvement over heuristic methods on several problem instances.

Copyright

Sun Jan 01 00:00:00 UTC 2012

Collections

Theses and Dissertations

Full item page