Technical Report Number
Predicting Protein-Protein Interaction Sites From Amino Acid Sequence

Changhui Yan, Vasant Honavar and Drena Dobbs
Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Graduate Program
Iowa State University
Ames, Iowa 50011

Corresponding author: Changhui Yan
Email address of the corresponding author: firstname.lastname@example.org

Abstract

We describe an approach for computational prediction of protein-protein interaction sites using a support vector machine (SVM) classifier. Interface residues and other surface residues were extracted from 115 proteins derived from a set of 70 heterocomplexes in the Protein Data Bank (PDB). The SVM classifier was trained to predict whether or not a surface residue is located in the interface, based on the identity of the target residue and its 10 sequence neighbors. The effectiveness of the approach was evaluated using 115 leave-one-out (jack-knife) cross-validation experiments. In each experiment, an SVM classifier was trained on a set of 1250 randomly chosen interface residues and an equal number of non-interface residues from 114 of the 115 molecules. The resulting classifier was then used to classify the surface residues of the remaining molecule as interface or non-interface residues, and its predictions were evaluated using several performance measures. In results averaged over the 115 experiments, interface and non-interface residues were identified with relatively high specificity (71%) and sensitivity (67%), and with a correlation coefficient of 0.29 between predicted and actual class labels, indicating that the method performs substantially better than chance (zero correlation). We also investigated the classifier's performance in terms of overall interaction site recognition.
In 80% of the proteins, the classifier recognized the interaction surface by identifying at least half of the interface residues, and in 98% of the proteins, at least 20% of the interface residues were correctly identified. The success of this approach was confirmed by examining the predicted interfaces in the context of the three-dimensional structures of representative complexes. This study demonstrates that an SVM classifier can predict whether or not a surface residue is an interface residue using amino acid sequence information. Because surface residues can be identified from their solvent accessible surface area (ASA), and recent computational methods can predict ASA from sequence, the approach described in this paper provides a basis for predicting interaction sites in proteins for which only amino acid sequence information is available.

Keywords: protein-protein interaction; interaction site prediction; interface residues; support vector machine.
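The per-residue setup summarized in the abstract can be sketched as follows. Each example is a target surface residue together with its 10 sequence neighbors. Each training set is balanced, with 1250 interface and 1250 non-interface residues. Evaluation is leave-one-out over proteins. This is an illustrative sketch under stated assumptions, not the authors' implementation. The one-hot encoding, the even split of the 10 neighbors (5 on each side), the padding symbol `-` for positions beyond the chain ends, and the names `Protein`, `window_features`, `surface_examples`, `balanced_sample`, and `jackknife` are all hypothetical. SVM training itself is not shown. An SVM from any library could be fitted on each training set that `jackknife` yields.

```python
import random
from dataclasses import dataclass
from typing import Dict, Iterator, List, Sequence, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # hypothetical symbol for window positions past either chain end
ALPHABET = AMINO_ACIDS + PAD
INDEX: Dict[str, int] = {symbol: i for i, symbol in enumerate(ALPHABET)}

# (feature vector, label) with label 1 = interface residue, 0 = non-interface residue
Example = Tuple[List[int], int]


@dataclass
class Protein:
    sequence: str
    surface: List[int]   # indices of surface residues (assumed precomputed, e.g. from ASA)
    interface: Set[int]  # subset of surface indices that lie in the interface


def window_features(sequence: str, position: int, half_width: int = 5) -> List[int]:
    """One-hot encode the target residue and half_width neighbors on each side.

    half_width=5 assumes the 10 neighbors are split evenly, giving an 11-residue window.
    """
    features: List[int] = []
    for offset in range(-half_width, half_width + 1):
        i = position + offset
        symbol = sequence[i] if 0 <= i < len(sequence) else PAD
        one_hot = [0] * len(ALPHABET)
        # Nonstandard residues (e.g. 'X') share the padding slot in this sketch.
        one_hot[INDEX.get(symbol, INDEX[PAD])] = 1
        features.extend(one_hot)
    return features


def surface_examples(protein: Protein) -> List[Example]:
    """Build one labeled example per surface residue; buried residues are skipped."""
    return [
        (window_features(protein.sequence, i), int(i in protein.interface))
        for i in protein.surface
    ]


def balanced_sample(examples: Sequence[Example], n_per_class: int,
                    rng: random.Random) -> List[Example]:
    """Randomly draw n_per_class interface and n_per_class non-interface examples."""
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    chosen = rng.sample(positives, n_per_class) + rng.sample(negatives, n_per_class)
    rng.shuffle(chosen)
    return chosen


def jackknife(proteins: List[Protein], n_per_class: int = 1250,
              seed: int = 0) -> Iterator[Tuple[List[Example], List[Example]]]:
    """Yield (balanced training set, held-out test set) once per protein."""
    rng = random.Random(seed)
    for k, held_out in enumerate(proteins):
        # Pool surface residues from every protein except the held-out one.
        pool = [ex for j, p in enumerate(proteins) if j != k
                for ex in surface_examples(p)]
        yield balanced_sample(pool, n_per_class, rng), surface_examples(held_out)
```

With 115 proteins, `jackknife` yields 115 splits, each pairing a balanced training set drawn from the other 114 proteins with all surface residues of the held-out protein. `rng.sample` raises `ValueError` if the pool holds fewer than `n_per_class` examples of either class. This sampling scheme therefore assumes that each training pool contains at least 1250 interface residues and at least 1250 non-interface residues.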
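The abstract reports two kinds of evaluation. The first is per-residue measures averaged over the held-out proteins. The second is per-protein recognition of the interaction surface, meaning the share of proteins in which at least half (or at least 20%) of the interface residues were found. Both can be sketched as below. The sketch assumes that the correlation coefficient is the Matthews correlation coefficient, which equals the Pearson correlation between the 0/1 actual and predicted labels. It also takes sensitivity to be the fraction of actual interface residues predicted as interface. Specificity is omitted. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (actual labels, predicted labels) for one held-out protein; 1 = interface, 0 = non-interface
Labels = Tuple[Sequence[int], Sequence[int]]


def confusion(actual: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) counts for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def sensitivity(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Fraction of actual interface residues that were predicted as interface."""
    tp, _, _, fn = confusion(actual, predicted)
    return tp / (tp + fn) if tp + fn else 0.0


def correlation_coefficient(actual: Sequence[int], predicted: Sequence[int]) -> float:
    """Matthews correlation coefficient; 0.0 means no better than chance.

    Returns 0.0 when any confusion-table margin is empty (a common convention).
    """
    tp, tn, fp, fn = confusion(actual, predicted)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_over_proteins(per_protein: List[Labels], metric) -> float:
    """Average a per-protein metric over all leave-one-out experiments."""
    return sum(metric(a, p) for a, p in per_protein) / len(per_protein)


def fraction_recognized(per_protein: List[Labels], threshold: float) -> float:
    """Fraction of proteins in which at least `threshold` of the interface residues were found."""
    hits = sum(1 for a, p in per_protein if sensitivity(a, p) >= threshold)
    return hits / len(per_protein)
```

Under these assumptions, the site-level results in the abstract correspond to `fraction_recognized(results, 0.5)` of 0.80 and `fraction_recognized(results, 0.2)` of 0.98 over the 115 experiments. The averaged correlation of 0.29 corresponds to `mean_over_proteins(results, correlation_coefficient)`.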