Campus Units
Genetics, Development and Cell Biology, Bioinformatics and Computational Biology, Computer Science
Document Type
Article
Publication Version
Published Version
Publication Date
2011
Journal or Book Title
BMC Bioinformatics
Volume
12
First Page
489
DOI
10.1186/1471-2105-12-489
Abstract
Background
RNA-protein interactions (RPIs) play important roles in a wide variety of cellular processes, ranging from transcriptional and post-transcriptional regulation of gene expression to host defense against pathogens. High-throughput experiments to identify RNA-protein interactions are beginning to provide valuable information about the complexity of RNA-protein interaction networks, but are expensive and time-consuming. Hence, there is a need for reliable computational methods for predicting RNA-protein interactions.
Results
We propose RPISeq, a family of classifiers for predicting RNA-protein interactions using only sequence information. Given the sequences of an RNA and a protein as input, RPISeq predicts whether or not the RNA-protein pair interacts. The RNA sequence is encoded as a normalized vector of its ribonucleotide 4-mer composition, and the protein sequence is encoded as a normalized vector of its 3-mer composition, based on a 7-letter reduced alphabet representation. Two variants of RPISeq are presented: RPISeq-SVM, which uses a Support Vector Machine (SVM) classifier, and RPISeq-RF, which uses a Random Forest classifier. On two non-redundant benchmark datasets extracted from the Protein-RNA Interface Database (PRIDB), RPISeq achieved an AUC (Area Under the Receiver Operating Characteristic (ROC) curve) of 0.96 and 0.92. On a third dataset containing only mRNA-protein interactions, the performance of RPISeq was competitive with that of a published method that requires information regarding many different features (e.g., mRNA half-life, GO annotations) of the putative RNA and protein partners. In addition, RPISeq classifiers trained using the PRIDB data correctly predicted the majority (57-99%) of non-coding RNA-protein interactions in NPInter-derived networks from E. coli, S. cerevisiae, D. melanogaster, M. musculus, and H. sapiens.
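The sequence encoding described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not list the seven amino acid groups, so the grouping in `AA_GROUPS` is an assumption (a conjoint-triad-style reduction), as is normalizing each vector by its total k-mer count. The function names `kmer_composition`, `encode_rna`, `encode_protein`, and `encode_pair` are hypothetical. With a 4-letter RNA alphabet and a 7-letter reduced protein alphabet, the resulting feature vector has 4^4 + 7^3 = 256 + 343 = 599 entries.

```python
from itertools import product

RNA_ALPHABET = "ACGU"

# ASSUMPTION: a conjoint-triad-style 7-group reduction of the 20 amino acids.
# The abstract only says "7-letter reduced alphabet"; these groups are illustrative.
AA_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, group in enumerate(AA_GROUPS) for aa in group}
GROUP_ALPHABET = "".join(str(i) for i in range(len(AA_GROUPS)))  # "0123456"


def kmer_composition(seq, k, alphabet):
    """Return the k-mer frequency vector of seq over alphabet, in a fixed order.

    ASSUMPTION: counts are normalized by the total number of valid k-mers.
    k-mers containing symbols outside the alphabet are skipped.
    """
    keys = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]


def encode_rna(rna):
    """Encode an RNA sequence as its 4-mer composition (256 values)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input
    return kmer_composition(rna, 4, RNA_ALPHABET)


def encode_protein(protein):
    """Encode a protein as 3-mer composition over the reduced alphabet (343 values)."""
    # Unknown residues map to "X", which never matches a valid 3-mer.
    reduced = "".join(AA_TO_GROUP.get(aa, "X") for aa in protein.upper())
    return kmer_composition(reduced, 3, GROUP_ALPHABET)


def encode_pair(rna, protein):
    """Concatenate RNA and protein features into one vector for a classifier."""
    return encode_rna(rna) + encode_protein(protein)
```

A vector produced by `encode_pair` could then be passed to any standard SVM or Random Forest implementation; the choice of library and hyperparameters is left open here because the abstract does not specify them.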
Conclusions
Our experiments with RPISeq demonstrate that RNA-protein interactions can be reliably predicted using only sequence-derived information. RPISeq offers an inexpensive method for computational construction of RNA-protein interaction networks, and should provide useful insights into the function of non-coding RNAs. RPISeq is freely available as a web-based server at http://pridb.gdcb.iastate.edu/RPISeq/.
Rights
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright Owner
Muppirala et al.
Copyright Date
2011
Language
en
File Format
application/pdf
Recommended Citation
Muppirala, Usha; Honavar, Vasant G.; and Dobbs, Drena, "Predicting RNA-Protein Interactions Using Only Sequence Information" (2011). Genetics, Development and Cell Biology Publications. 95.
https://lib.dr.iastate.edu/gdcb_las_pubs/95
Included in
Bioinformatics Commons, Computational Biology Commons, Genetics Commons, Genomics Commons
Comments
This article is from BMC Bioinformatics 12 (2011): 489, doi: 10.1186/1471-2105-12-489. Posted with permission.