Technical Report Number
The problem of recognizing motifs in biological data has been well studied, and numerous algorithms, both exact and approximate, have been proposed to address it. We strongly believe that open availability and easy accessibility of quality implementations of such algorithms are critical to the research community, so that researchers can directly reproduce and build on results from other studies rather than reinvent the wheel. It is also important for an implementation to be as generic as possible, so that any researcher can extend it with minimal effort to test a new algorithmic extension or heuristic. With this motivation, we focus on an existing algorithm, PatternBranching, and, to a lesser degree, Yang2004. We analyze these approaches for minor heuristic changes and speed-ups obtained by adjusting certain thresholds, and finally implement the variant in a high-level language (Java) using carefully considered programming practices and generic, extensible interfaces. We also analyze the performance of PatternBranching on a synthetically generated test suite covering a variety of sequence lengths and report the results. Code from this project will be made freely available online to the research community.