Date of Award
Doctor of Philosophy
Genetics, Development and Cell Biology
Bioinformatics and Computational Biology
Next-generation sequencing (NGS) has become one of the most widely used tools in biotechnology. With high-throughput sequencing, researchers can analyze non-model species at unprecedented resolution. NGS provides fast, deep, and inexpensive sequencing, and it has been used to answer a wide range of biological questions. In this thesis, I have developed a set of tools and used them to study several research topics. First, de novo whole-genome assembly remains a challenging technical task. For eukaryotic genomes, de novo assembly typically requires computational resources with very large memory and fast processors. Instead of trying to assemble the whole genome, as previous approaches do, I focus on efficiently reconstructing the genomic regions homologous to protein or cDNA sequences of interest. I have developed SRAssembler, a local assembly program that uses an iterative chromosome-walking strategy to assemble the loci of interest directly. Second, I used high-throughput RNA sequencing (referred to as RNA-Seq) data to analyze different intron splicing models and their relative frequencies of occurrence. The first mechanism I explored is recursive splicing in large introns. I have implemented a pipeline called RSSFinder, which searches for recursive splice sites confirmed by RNA-Seq data. My study suggests that recursive splicing is prevalent across different species. The predicted recursive sites can also be used to investigate diseases associated with abnormal splicing of transcripts. In addition, I have demonstrated the use of RNA-Seq data to decipher the detailed mechanisms involved in splicing and their relationship with transcription. I propose mathematical models to estimate the distribution of mRNA splicing intermediates, and I evaluated these models with simulated data and an Arabidopsis thaliana dataset. My results indicate that co-transcriptional splicing is widespread in Arabidopsis thaliana.
Chou, Hsien-chao, "Local assembly and pre-mRNA splicing analyses by high-throughput sequencing data" (2012). Graduate Theses and Dissertations. 12819.