Date of Award
Doctor of Philosophy
Genetics, Development and Cell Biology
Bioinformatics and Computational Biology
Protein-RNA interactions are essential for many important processes including all phases of protein production, regulation of gene expression, and replication and assembly of many viruses. This dissertation has two related goals: 1) predicting RNA-binding sites in proteins from protein sequence, structure, and conservation information, and 2) characterizing protein-RNA interactions.
We present several machine learning classifiers for predicting RNA-binding sites in proteins based on the protein sequence, protein structure, and conservation information. Our first classifier uses only amino acid sequence information as input and predicts RNA-binding sites with an area under the receiver operating characteristic curve (AUC) of 0.74. Using the neighboring amino acids in the protein structure improves prediction performance over using sequence alone. We show that using evolutionary information in the form of position-specific scoring matrices (PSSMs) provides a further significant improvement in predictions. Finally, we create an ensemble classifier that combines the predictions of the sequence-, structure-, and PSSM-based classifiers and gives the best prediction performance, with an AUC of 0.81.
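As an illustration of how per-residue predictions can be scored and combined, the following minimal sketch computes an AUC with the rank-based (Mann-Whitney) formulation and builds an ensemble by simple averaging. The function names, the toy scores, and the averaging rule are assumptions made for illustration only; the abstract does not state how the dissertation's ensemble combines its component classifiers.

```python
def auc(scores, labels):
    """Area under the ROC curve: the probability that a randomly chosen
    positive residue scores higher than a randomly chosen negative one
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ensemble_average(*score_lists):
    """Combine classifiers by averaging their per-residue scores.
    Averaging is an assumed combination rule, chosen for simplicity."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


if __name__ == "__main__":
    # Hypothetical toy data: 1 = RNA-binding residue, 0 = non-binding residue.
    labels = [1, 1, 0, 0, 1, 0]
    seq_scores = [0.9, 0.4, 0.5, 0.2, 0.6, 0.7]
    struct_scores = [0.8, 0.7, 0.3, 0.4, 0.5, 0.6]
    pssm_scores = [0.7, 0.6, 0.4, 0.1, 0.9, 0.3]

    combined = ensemble_average(seq_scores, struct_scores, pssm_scores)
    for name, scores in [("sequence", seq_scores), ("structure", struct_scores),
                         ("PSSM", pssm_scores), ("ensemble", combined)]:
        print(f"{name:9s} AUC = {auc(scores, labels):.2f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of binding from non-binding residues, which gives a scale for reading the reported values of 0.74 and 0.81.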
We construct the Protein-RNA Interaction Database, PRIDB, a comprehensive collection of all protein-RNA complexes in the PDB. PRIDB focuses on characterizing the molecular interaction at the protein-RNA interface in terms of van der Waals contacts, direct hydrogen bonds, and water-mediated hydrogen bonds. We perform an extensive analysis of the RNA-binding characteristics of a non-redundant dataset of 181 proteins to determine general characteristics of protein-RNA binding sites. We find that the overall interaction propensities for Watson-Crick paired nucleotides and non-Watson-Crick paired nucleotides are very similar, whereas the propensities for amino acids binding to single-stranded nucleotides show more differences. We find that van der Waals contacts are more numerous than hydrogen bonds and that amino acids interact with RNA through their side chain atoms more frequently than through their main chain atoms. We also find that contacts to the RNA base are less frequent than contacts to the RNA backbone.
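To make the notion of an interaction propensity concrete, the sketch below uses one common definition: the frequency of an amino acid among interface residues divided by its frequency among all residues in the dataset. This definition, the function name, and the toy data are assumptions for illustration; the abstract does not specify the exact formula used in the dissertation.

```python
from collections import Counter


def interface_propensity(residues, in_interface):
    """Return {amino acid: propensity}, where propensity is
    (share of interface residues that are this amino acid) /
    (share of all residues that are this amino acid).
    Values above 1 indicate enrichment at the interface."""
    if len(residues) != len(in_interface):
        raise ValueError("residues and in_interface must have equal length")
    total = Counter(residues)
    iface = Counter(r for r, flag in zip(residues, in_interface) if flag)
    n_total = len(residues)
    n_iface = sum(iface.values())
    if n_iface == 0:
        raise ValueError("no interface residues in the input")
    return {
        aa: (iface[aa] / n_iface) / (total[aa] / n_total)
        for aa in total
    }


if __name__ == "__main__":
    # Hypothetical toy chain: one-letter amino acid codes, with a flag
    # marking residues that contact RNA.
    residues = list("RRKAAAGG")
    in_interface = [True, True, True, False, False, False, True, False]
    for aa, value in sorted(interface_propensity(residues, in_interface).items()):
        print(f"{aa}: {value:.2f}")
```

The same ratio can be computed separately for each nucleotide class (Watson-Crick paired, non-Watson-Crick paired, single-stranded) by changing which contacts set the interface flag.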
Together, the prediction and characterization presented in this dissertation have increased our understanding of how proteins and RNA interact.
Terribilini, Michael, "Computational analysis and prediction of protein-RNA interactions" (2008). Graduate Theses and Dissertations. 11688.