Computational analyses to identify and study cis-regulatory regions in eukaryotes

Sridharan, Krishnakumar

File

Sridharan_iastate_0097E_13194.pdf (1.96 MB)

Date

2012-01-01

Authors

Sridharan, Krishnakumar

Advisor

Volker P. Brendel

Organizational Units

Genetics, Development and Cell Biology

The Department of Genetics, Development, and Cell Biology seeks to teach subcellular and cellular processes, genome dynamics, cell structure and function, and molecular mechanisms of development. It offers a Major in Biology and a Major in Genetics.

History
The Department of Genetics, Development, and Cell Biology was founded in 2005.

Related Units

College of Agriculture and Life Sciences (parent college)
College of Liberal Arts and Sciences (parent college)

Department

Genetics, Development and Cell Biology

Abstract

Transcription is a vital and complex process, and the first step of gene expression in eukaryotes. The initiation of transcription is controlled by a variety of regulatory elements that are found in the promoter regions of genes. The computational study of these cis-regulatory regions and their mechanisms is of great interest for large-scale studies of gene regulation and expression. With the rapidly rising availability of genomes, accurate computational tools are needed to identify and annotate promoter regions from nucleotide sequences. In addition, genome-wide experimental projects that study the transcriptional landscape have produced comprehensive transcript datasets, such as Cap Analysis of Gene Expression (CAGE) and full-length cDNA (fl-cDNA). These datasets can also be mined to derive actionable knowledge about promoter regions.

Previous efforts to identify promoter regions computationally are heavily biased, in both quality and quantity, towards mammalian and insect genomes. The few tools that identify promoter regions in plants are either trained on data from a different kingdom or are overly simplistic and do not use the advances made in mammalian promoter region prediction. There is also an urgent need for tools that can mine pre-existing transcript datasets to derive hypotheses about the complex transcriptional landscapes of eukaryotes.

In this thesis, I have designed two computational tools that can greatly aid studies of the cis-regulatory regions of eukaryotes. To identify promoter regions from nucleotide sequences, I have designed the Promoter Prediction Extractor (ProPEr) tool. This machine learning-based tool is robust and powerful in identifying promoter regions from plant DNA sequences of varying sizes, and is of specific value for relatively less-studied or newly sequenced species.

To analyze and make use of previously produced datasets from public and private 5' profiling studies, I have designed TSRchitect. TSRchitect is an accurate tool that uses transcript datasets such as CAGE and expressed sequence tag (EST)/fl-cDNA to identify promoter regions. TSRchitect is capable of identifying alternative or tissue-specific promoter usage, and shows great potential in comparative studies of regulatory regions across eukaryotes.
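As a rough illustration of how 5' profiling data can point to promoter regions, the sketch below groups the 5' end positions of mapped tags into candidate transcription start regions by simple distance-based clustering. It is not TSRchitect's actual method, which the abstract does not describe; the function name `cluster_tss` and the parameters `max_gap` and `min_tags` are hypothetical, and the input is assumed to be tag positions from one strand of one chromosome.

```python
from collections import Counter


def cluster_tss(positions, max_gap=20, min_tags=2):
    """Group 5' tag positions into candidate start regions.

    Illustrative only: positions closer than max_gap are merged into one
    region, and regions supported by fewer than min_tags tags are dropped.
    """
    counts = Counter(positions)  # number of tags starting at each position

    # Walk the distinct positions in order, starting a new cluster
    # whenever the gap to the previous position exceeds max_gap.
    clusters, current = [], []
    for pos in sorted(counts):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)

    regions = []
    for cluster in clusters:
        total = sum(counts[p] for p in cluster)
        if total < min_tags:
            continue  # too little evidence to call a region
        # Dominant site: most tags, ties broken by the smaller coordinate.
        dominant = max(cluster, key=lambda p: (counts[p], -p))
        regions.append(
            {"start": cluster[0], "end": cluster[-1],
             "tags": total, "dominant": dominant}
        )
    return regions


if __name__ == "__main__":
    tags = [100, 100, 101, 105, 300, 500, 502, 502, 502]
    for region in cluster_tss(tags):
        print(region)
```

Running the same clustering separately per tissue or library and comparing the resulting regions is one simple way to look for the alternative or tissue-specific promoter usage mentioned above.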

ProPEr and TSRchitect can, by themselves or as part of a larger annotation framework, expand our knowledge of the promoter regions of both newly sequenced and model eukaryotic species.

Copyright

2012

Collections

Theses and Dissertations
