Journal or Book Title
Nucleic Acids Research
We introduce a data structure called a superword array for finding quickly matches between DNA sequences. The superword array possesses some desirable features of the lookup table and suffix array. We describe simple algorithms for constructing and using a superword array to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP for computation of overlaps between reads. Experimental results produced by PCAP.REP and PCAP on a whole-genome dataset show that PCAP.REP produced a more accurate and contiguous assembly than PCAP.
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact email@example.com
Huang, Xiaoqiu; Yang, Shiaw-Pyng; Chinwalla, Asif T.; Hillier, LaDeana W.; Minx, Patrick; Mardis, Elaine R.; and Wilson, Richard K., "Application of a superword array in genome assembly" (2006). Computer Science Publications. 29.