Technical Report Number
Computer Applications, Data, Software
The problem of mining collections of trees to identify common patterns, called frequent subtrees (FSTs), arises often when trying to make sense of the results of phylogenetic analysis. FST mining generalizes the well-known maximum agreement subtree problem. Here we present EvoMiner, a new algorithm for mining frequent subtrees in collections of phylogenetic trees. EvoMiner is an Apriori-like level-wise method that uses a novel phylogeny-specific constant-time candidate generation scheme, an efficient fingerprinting-based technique for the downward-closure operation, and a support counting step based on lowest common ancestors that requires neither costly subtree operations nor database traversal. As a result of these techniques, our algorithm achieves speed-ups of 100-fold or more in the best cases over Phylominer, another algorithm for mining phylogenetic trees. EvoMiner can also work in vertical mining mode, which uses less memory at the expense of speed.
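To make the notion of a frequent subtree concrete, the following minimal Python sketch counts support for the smallest informative patterns, rooted triples (three leaves with one closest pair), across a small collection of trees. It is an illustration only and is not EvoMiner's candidate generation, fingerprinting, or support counting scheme. It assumes rooted, leaf-labeled trees written as nested tuples, and it finds lowest common ancestors by naively walking parent pointers rather than with a constant-time structure. All function names (`index_tree`, `lca_depth`, `triple_topology`, `frequent_triples`) and the sample trees are hypothetical.

```python
from collections import Counter
from itertools import combinations


def index_tree(tree):
    """Return (parent, depth) maps for a rooted tree given as nested tuples of leaf names."""
    parent, depth = {}, {}
    next_id = [0]  # counter used to give internal nodes unique names

    def visit(node, par, d):
        if isinstance(node, str):
            name = node
        else:
            name = ("internal", next_id[0])
            next_id[0] += 1
        parent[name] = par
        depth[name] = d
        if not isinstance(node, str):
            for child in node:
                visit(child, name, d + 1)

    visit(tree, None, 0)
    return parent, depth


def lca_depth(parent, depth, a, b):
    """Depth of the lowest common ancestor of a and b (naive parent-pointer walk)."""
    while a != b:
        if depth[a] >= depth[b]:
            a = parent[a]
        else:
            b = parent[b]
    return depth[a]


def triple_topology(parent, depth, a, b, c):
    """Return 'x,y|z' if x and y are closest among the three leaves, else None (unresolved)."""
    pairs = {(a, b): c, (a, c): b, (b, c): a}
    depths = {p: lca_depth(parent, depth, *p) for p in pairs}
    deepest = max(depths.values())
    closest = [p for p, d in depths.items() if d == deepest]
    if len(closest) != 1:
        return None  # all three LCAs coincide: the triple is unresolved in this tree
    (x, y) = closest[0]
    return f"{x},{y}|{pairs[(x, y)]}"


def frequent_triples(trees, min_support):
    """Map each rooted triple to its count, keeping those found in >= min_support of the trees."""
    counts = Counter()
    for tree in trees:
        parent, depth = index_tree(tree)
        leaves = sorted(n for n in parent if isinstance(n, str))
        for a, b, c in combinations(leaves, 3):
            pattern = triple_topology(parent, depth, a, b, c)
            if pattern is not None:
                counts[pattern] += 1
    return {p: n for p, n in counts.items() if n >= min_support * len(trees)}


# Three hypothetical trees over the taxa A, B, C, D.
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]
print(frequent_triples(trees, 1.0))  # {'A,B|D': 3}
```

In this toy collection only the triple with A and B closest and D as the outgroup appears in every tree. A level-wise miner in the Apriori style would, in general, build larger candidate patterns from frequent smaller ones; the sketch stops at size three and does not attempt that step.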