Degree Type
Dissertation
Date of Award
2012
Degree Name
Doctor of Philosophy
Department
Genetics, Development and Cell Biology
Major
Bioinformatics and Computational Biology
First Advisor
Volker Brendel
Second Advisor
Vasant Honavar
Abstract
Next-generation sequencing (NGS) approaches have become one of the most widely used tools in biotechnology. With high-throughput sequencing, researchers can analyze non-model species at an unprecedentedly high resolution. NGS provides fast, deep and cheap sequencing solutions, and it has been used to answer various biological questions. In this thesis, I have developed a set of tools and used them to study several research topics. First, de novo whole-genome assembly is still a very challenging technical task. For eukaryotic genomes, de novo assembly typically requires computational resources with very large memory and fast processors. Instead of trying to assemble the whole genome as done in previous approaches, I focus on efficiently reconstructing the genomic regions related to homologous protein or cDNA sequences. I have developed SRAssembler, a local assembly program that uses an iterative chromosome walking strategy to assemble the loci of interest directly. Second, I used high-throughput RNA sequencing (referred to as RNA-Seq) data to analyze different intron splicing models and their relative frequencies of occurrence. The first mechanism I explored is recursive splicing in large introns. I have implemented a pipeline called RSSFinder, which searches for recursive splice sites confirmed by RNA-Seq data. My study suggests that recursive splicing is prevalent in different species. These predicted recursive sites can also be used to investigate certain diseases associated with abnormal splicing of transcripts. In addition, I have demonstrated the use of RNA-Seq data to decipher the detailed mechanisms involved in splicing and their relationship with transcription. Here I propose mathematical models to estimate the distribution of mRNA splicing intermediates. I evaluated my models with simulated data and an Arabidopsis thaliana dataset. My results indicate that co-transcriptional splicing is widespread in Arabidopsis thaliana.
DOI
https://doi.org/10.31274/etd-180810-1862
Copyright Owner
Hsien-chao Chou
Copyright Date
2012
Language
en
File Format
application/pdf
File Size
115 pages
Recommended Citation
Chou, Hsien-chao, "Local assembly and pre-mRNA splicing analyses by high-throughput sequencing data" (2012). Graduate Theses and Dissertations. 12819.
https://lib.dr.iastate.edu/etd/12819