Degree Type

Dissertation

Date of Award

2006

Degree Name

Doctor of Philosophy

Department

Mathematics

Major

Bioinformatics and Computational Biology

First Advisor

Daniel Ashlock

Second Advisor

Dan Voytas

Abstract

This dissertation presents a modular software pipeline that searches collections of RNA sequences for novel RNA motifs. In this case the motifs incorporate elements of primary and secondary structure. The motif search pipeline breaks up sets of RNA sequences into shortened segments of RNA primary sequence. The shortened segments are then folded to obtain low energy secondary structures. The distance estimation module of the pipeline then calculates distances between the folded bricks, and then analyzes the resulting distance matrices for patterns;An initial implementation of the pipeline is applied to synthetic and biological data sets. This implementation introduces a new distance measure for comparing RNA sequences based on structural annotation of the folded sequence as well as a new data analysis technique called non-linear projection. The modular nature of the pipeline is then used to explore the relationships between several different distance measures on random data, synthetic data, and a biological data set consisting of iron response elements. It is shown that the different distance measures capture different relationships between the RNA sequences. The non-linear projection algorithm is used to produce 2-dimensional projections of the distance matrices which are examined via inspection and k-means multiclustering. The pipeline is able to successfully cluster synthetic RNA sequences based only on primary sequence data as well as the iron response elements data set. The dissertation also presents a preliminary analysis of a large biological data set of HIV sequences.

DOI

https://doi.org/10.31274/rtd-180813-17348

Publisher

Digital Repository @ Iowa State University, http://lib.dr.iastate.edu

Copyright Owner

Justin Schonfeld

Language

en

Proquest ID

AAI3217314

File Format

application/pdf

File Size

123 pages

Share

COinS