Predicting Protein-Protein Interaction Sites From Amino Acid Sequence

Date
2002-10-01
Authors
Yan, Changhui
Honavar, Vasant
Dobbs, Drena
Authors
Person
Dobbs, Drena
University Professor Emeritus
Department
Computer Science
Abstract

Changhui Yan, Vasant Honavar and Drena Dobbs
Artificial Intelligence Research Laboratory, Bioinformatics and Computational Biology Graduate Program, Iowa State University, Ames, Iowa 50011
Corresponding author: Changhui Yan (chhyan@iastate.edu)

We describe an approach for computational prediction of protein-protein interaction sites using a support vector machine (SVM) classifier. Interface residues and other surface residues were extracted from 115 proteins derived from a set of 70 heterocomplexes in the Protein Data Bank (PDB). The SVM classifier was trained to predict whether or not a surface residue is located in the interface, based on the identity of the target residue and its 10 sequence neighbors. The effectiveness of the approach was evaluated using 115 leave-one-out cross-validation (jack-knife) experiments. In each experiment, an SVM classifier was trained using a set of 1250 randomly chosen interface residues and an equal number of non-interface residues from 114 of the 115 molecules. The resulting classifier was then used to classify the surface residues of the remaining molecule as interface or non-interface residues. The classifier in each experiment was evaluated in terms of several performance measures. In results averaged over the 115 experiments, interface and non-interface residues were identified with relatively high specificity (71%) and sensitivity (67%), and with a correlation coefficient of 0.29 between predicted and actual class labels, indicating that the method performs substantially better than chance (zero correlation). We also investigated the classifier's performance in terms of overall interaction site recognition. In 80% of the proteins, the classifier recognized the interaction surface by identifying at least half of the interface residues, and in 98% of the proteins, at least 20% of the interface residues were correctly identified.
The success of this approach was confirmed by examination of predicted interfaces in the context of the three-dimensional structures of representative complexes. This study demonstrates that an SVM classifier can be used to predict whether or not a surface residue is an interface residue using amino acid sequence information. Surface residues can be identified from their solvent-accessible surface area (ASA), and computational methods for predicting ASA from sequence have recently improved. The approach described in this paper therefore provides a basis for computational prediction of interaction sites in proteins for which only amino acid sequence information is available.

Keywords: protein-protein interaction; interaction site prediction; interface residues; support vector machine.
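The sequence-window input and the evaluation measures mentioned in the abstract can be illustrated with a minimal sketch. Several details here are assumptions, since the abstract does not specify them: the 10 neighbors are taken as 5 on each side of the target residue, each position is one-hot encoded over the 20 standard amino acids, and sensitivity, specificity, and the correlation coefficient use their common textbook definitions (the paper's own definitions may differ). The function names are hypothetical. The resulting feature vectors could be passed to any SVM implementation; training is not shown.

```python
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)
HALF_WINDOW = 5  # assumption: 5 neighbors per side = target + 10 neighbors


def encode_window(sequence, center, half_window=HALF_WINDOW):
    """One-hot encode the residue at `center` and its sequence neighbors.

    Positions beyond either end of the chain, and non-standard residues,
    are left as all-zero blocks (an assumed padding convention).
    """
    features = []
    for pos in range(center - half_window, center + half_window + 1):
        one_hot = [0] * len(AMINO_ACIDS)
        if 0 <= pos < len(sequence):
            idx = AMINO_ACIDS.find(sequence[pos])
            if idx >= 0:
                one_hot[idx] = 1
        features.extend(one_hot)
    return features


def confusion_counts(actual, predicted):
    """Return (tp, tn, fp, fn) for labels where 1 = interface, 0 = non-interface."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(tp, fn):
    """Fraction of actual interface residues that were predicted as interface."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn, fp):
    """Fraction of actual non-interface residues predicted as non-interface."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def correlation_coefficient(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0 corresponds to chance-level prediction."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With these assumptions, each surface residue maps to a vector of 11 x 20 = 220 binary features. In a leave-one-out setup like the one described above, the measures would be computed on the held-out protein in each of the 115 experiments and then averaged.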
