Campus Units

Computer Science, Electrical and Computer Engineering

Document Type

Article

Publication Version

Published Version

Publication Date

2003

Journal or Book Title

Genome Research

Volume

13

Issue

9

First Page

2164

Last Page

2170

DOI

10.1101/gr.1390403

Abstract

We describe a whole-genome assembly program named PCAP for processing tens of millions of reads. The PCAP program has several features to address efficiency and accuracy issues in assembly. Multiple processors are used to perform most time-consuming computations in assembly. A more sensitive method is used to avoid missing overlaps caused by sequencing errors. Repetitive regions of reads are detected on the basis of many overlaps with other reads, instead of many shorter word matches with other reads. Contaminated end regions of reads are identified and removed. Generation of a consensus sequence for a contig is based on an alignment of reads in the contig, in which both base quality values and coverage information are used to determine every consensus base. The PCAP program was tested on a mouse whole-genome data set of 30 million reads and a human Chromosome 20 data set of 1.7 million reads. The program is freely available for academic use.

Comments

This article is published as Huang, Xiaoqiu, Jianmin Wang, Srinivas Aluru, Shiaw-Pyng Yang, and LaDeana Hillier. "PCAP: a whole-genome assembly program." Genome Research 13, no. 9 (2003): 2164-2170. doi: 10.1101/gr.1390403. Posted with permission.

Creative Commons License

Creative Commons Attribution-Noncommercial 4.0 License

Copyright Owner

Cold Spring Harbor Laboratory Press

Language

en

File Format

application/pdf
