Parallel methods for short read assembly

Date
2009-01-01
Authors
Jackson, Benjamin
Major Professor
Advisor
Srinivas Aluru
Department
Electrical and Computer Engineering
Abstract

This work addresses the parallel de novo assembly of genomic sequences from short sequence reads. Because short reads eliminate the reliability of read overlaps in predicting genomic co-location, the development of short-read assemblers has been underpinned by a revival of graph-based methods. Although these methods predate short-read technology, their reach has not extended significantly beyond bacterial genomes because of the memory they require. These memory limitations are exacerbated by the high coverage needed to compensate for shorter read lengths. As a result, prior to our work, short-read de novo assembly had been demonstrated on relatively small genomes of a few million bases. We advance the field of short-read assembly in several ways. First, we extend models and ideas proposed and tested with small genomes on serial machines to large-scale distributed-memory parallel machines. Second, we present ideas for assembly that are especially suited to the reconstruction of very large genomes on these machines. Additionally, we present the first assembler that specifically takes advantage of a variable number of fragment sizes, or insert lengths, concurrently when making assembly decisions, while still working well for data with a single insert length.

Copyright
Thu Jan 01 00:00:00 UTC 2009