Computational studies with ESTs: assembly, SNP detection, and applications in alternative splicing

Wang, Jianmin

File

r_3229132.pdf (2.04 MB)

Date

2006-01-01

Authors

Wang, Jianmin

Advisor

Xiaoqiu Huang

Xun Gu

Department

Computer Science

Abstract

Expressed sequence tag (EST) sequences are important in functional genomics studies. To make better use of available EST resources, clustering and assembly are crucial techniques. No current assembly program handles EST data sets with deep coverage well. We describe a deep assembly program named DA. The program keeps the number of differences in each contig alignment under control by correcting differences that are likely due to sequencing errors. Experimental results on 115 clusters from the UniGene database show that DA can handle data sets of deep coverage efficiently. A comparison of the DA consensus sequences with the finished human and mouse genomes indicates that the consensus sequences are of acceptable quality.

EST sequences can be used in single nucleotide polymorphism (SNP) discovery. We describe a computational method for finding common SNPs with allele frequencies in single-pass sequences of deep coverage. The method enhances a widely used program named PolyBayes in several aspects. We present results from our method and PolyBayes on eighteen data sets of human ESTs with deep coverage. The results indicate that our method used almost all single-pass sequences in computing the allele frequencies of SNPs.

EST sequences can also be used to study alternative splicing (AS), which is the most common post-transcriptional event in metazoans. We first developed a pipeline to identify AS forms by comparing alignments between expressed sequences and genomic sequences. We then studied the relationship between AS and gene duplication. We observed that duplicate genes have fewer AS forms than single-copy genes, and we found that the loss of alternative splicing in duplicate genes may occur shortly after the gene duplication. Further analysis of the distribution of alternative splicing in human duplicate pairs showed that alternative splicing evolves asymmetrically after gene duplication. We also compared AS among six species. Through detailed, categorized analyses, we found significant differences in both AS rates and splice forms per gene among the studied species. The difference in AS rate between rice and Arabidopsis is significant enough to lead to a difference in protein diversity between those two species.
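
As a rough illustration of what "allele frequencies in single-pass sequences of deep coverage" refers to, the sketch below counts bases at one column of a set of already-aligned reads and flags the column as a candidate SNP when the second most common base is frequent enough. This is not the method of the thesis and not PolyBayes, both of which model sequencing error; the function names, the toy reads, and the thresholds `min_depth` and `min_minor_freq` are all assumptions made for this example.

```python
from collections import Counter


def column_allele_frequencies(reads, column, min_depth=4):
    """Return {base: frequency} at one alignment column.

    Assumes `reads` are equal-frame aligned strings where '-' marks a gap.
    Returns None when fewer than `min_depth` reads cover the column with
    an A, C, G, or T (hypothetical threshold, not from the thesis).
    """
    bases = [
        read[column].upper()
        for read in reads
        if column < len(read) and read[column].upper() in "ACGT"
    ]
    if len(bases) < min_depth:
        return None
    counts = Counter(bases)
    total = len(bases)
    return {base: n / total for base, n in counts.items()}


def is_candidate_snp(freqs, min_minor_freq=0.1):
    """Flag a column whose second most common base reaches `min_minor_freq`."""
    if freqs is None or len(freqs) < 2:
        return False
    ordered = sorted(freqs.values(), reverse=True)
    return ordered[1] >= min_minor_freq


# Toy alignment (made-up data): column 2 carries G in three reads,
# T in two reads, and a gap in one read.
reads = ["ACGTA", "ACGTA", "ACTTA", "ACTTA", "AC-TA", "ACGTA"]
freqs = column_allele_frequencies(reads, 2)
print(freqs, is_candidate_snp(freqs))
```

A plain count like this treats every base call as equally reliable, which is precisely the weakness that error-aware approaches address; it is shown only to make the quantity being estimated concrete.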

Copyright

2006-01-01

Collections

Theses and Dissertations
