Computational analyses to identify and study cis-regulatory regions in eukaryotes

Date
2012-01-01
Authors
Sridharan, Krishnakumar
Advisor
Volker P. Brendel
Organizational Unit
Genetics, Development and Cell Biology

The Department of Genetics, Development, and Cell Biology seeks to teach subcellular and cellular processes, genome dynamics, cell structure and function, and molecular mechanisms of development, and in doing so offers a Major in Biology and a Major in Genetics.

History
The Department of Genetics, Development, and Cell Biology was founded in 2005.

Department
Genetics, Development and Cell Biology
Abstract

Transcription is a vital and complicated process, and it is the first step leading to gene expression in eukaryotes. The initiation of transcription is controlled by a variety of regulatory elements found in the promoter regions of genes. The computational study of these cis-regulatory regions and their mechanisms is of great interest for large-scale studies of gene regulation and expression. With the rapidly rising availability of genomes, accurate computational tools are needed to identify and annotate promoter regions from nucleotide sequences. In addition, genome-wide experimental projects to study the transcriptional landscape have produced comprehensive transcript datasets, such as Cap Analysis of Gene Expression (CAGE) and full-length cDNA (fl-cDNA). These datasets can also be mined to derive actionable knowledge about promoter regions.

Previous efforts to identify promoter regions computationally are heavily biased, in both quality and quantity, towards mammalian and insect genomes. The few tools that identify promoter regions in plants are either trained on data from a different kingdom or are overly simplistic and do not draw on the advances made in mammalian promoter region prediction. There is also an urgent need for tools that can mine pre-existing transcript datasets to derive hypotheses about the complex transcriptional landscapes of eukaryotes.

In this thesis, I have designed two computational tools that can greatly aid studies of cis-regulatory regions in eukaryotes. To identify promoter regions from nucleotide sequences, I have designed the Promoter Prediction Extractor (ProPEr) tool. This machine learning-based tool is robust and powerful in identifying promoter regions from plant DNA sequences of varying sizes, and is of specific value for less-studied or newly sequenced species.

To analyze and utilize previously produced datasets from public and private 5' profiling studies, we have designed TSRchitect. TSRchitect is an accurate tool that utilizes transcript datasets such as CAGE and EST/fl-cDNA to identify promoter regions. TSRchitect is capable of identifying alternative or tissue-specific promoter usage, and shows great potential in comparative studies of regulatory regions across eukaryotes.

ProPEr and TSRchitect can, by themselves or as part of a larger annotation framework, expand our knowledge of the promoter regions of both newly sequenced and model eukaryotic species.

Copyright
2012-01-01