Date of Award
Doctor of Philosophy
Electrical and Computer Engineering
This work addresses the parallel de novo assembly of genomic sequences from short sequence reads. Because short reads make read overlaps an unreliable predictor of genomic co-location, a revival of graph-based methods has underpinned the development of short-read assemblers. Although these methods predate short-read technology, their reach has not extended significantly beyond bacterial genomes because of the memory they require. These memory limitations are exacerbated by the high coverage needed to compensate for shorter read lengths. As a result, prior to our work, short-read de novo assembly had been demonstrated only on relatively small genomes of a few million bases. In our work, we advance the field of short-read assembly in several ways. First, we extend models and ideas proposed and tested on small genomes and serial machines to large-scale distributed-memory parallel machines. Second, we present ideas for assembly that are especially suited to reconstructing very large genomes on these machines. Additionally, we present the first assembler that specifically takes advantage of a variable number of fragment sizes, or insert lengths, concurrently when making assembly decisions, while still working well for data with a single insert length.
Benjamin Grant Jackson
Jackson, Benjamin Grant, "Parallel methods for short read assembly" (2009). Graduate Theses and Dissertations. 10704.