Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Computer Science


Bioinformatics and Computational Biology

First Advisor

Vasant Honavar


Protein-protein interactions play a pivotal role in biological metabolism. They direct many cellular processes, such as signal transduction, DNA replication, and RNA splicing. Identifying protein-protein interaction sites is important for determining protein function, improving protein-protein docking, and rational drug design. Experimental methods for identifying protein-protein interaction sites are time-consuming and costly, which motivates the use of computational methods in this area.

This research consists of three parts:

We have built the Protein-Protein Interface Database (PPIDB), which contains 71,486 protein-protein interfaces extracted from experimentally determined protein complex structures in the current version of the Protein Data Bank. It facilitates the construction of well-characterized datasets of protein-protein interface residues for computational analyses. The database is accessible through a Web interface and a set of Web services.

We have made a comprehensive analysis of dimeric protein-protein interfaces in terms of thirteen physicochemical properties. The results show that interface residues have side chains pointing inward; that interfaces are rougher, tend to be flat, moderately convex or concave, and protrude more relative to non-interface surface residues; and that interface residues tend to be surrounded by hydrophobic neighbors.

We have developed NB PPIPS, a Naive Bayes method for predicting protein-protein interaction sites on protein surfaces. Trained on a non-redundant data set of 2,383 proteins and supplied with sequence, evolutionary, and structural properties, NB PPIPS achieves 60.7% recall and 34.6% precision in 10-fold cross-validation, which greatly improves over a baseline classifier that uses only protein sequence information. We also attempt to apply NB PPIPS in a two-stage prediction of protein-protein interfaces when only the protein sequence is known: modeled protein structures are generated via homology modeling and then fed as inputs into NB PPIPS. The results show that good predictions are obtained only for well-modeled structures. NB PPIPS is implemented as an online server to facilitate its usage. It is accessible at .
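
As a rough illustration of the kind of classifier and metrics described above, the following minimal sketch trains a categorical Naive Bayes model on toy residue data and computes precision and recall for the interface class. It is not the NB PPIPS implementation: the class name, the two feature names, the toy data, and the add-one smoothing are all assumptions made for illustration, and the dissertation's actual features and training procedure are not reproduced here.

```python
import math
from collections import Counter


class TinyNaiveBayes:
    """Categorical Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # counts[c][j][v] = number of class-c rows whose feature j equals v
        self.counts = {c: [Counter() for _ in range(n_features)]
                       for c in self.classes}
        # values[j] = set of distinct values seen for feature j
        self.values = [set() for _ in range(n_features)]
        for row, label in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_score(self, row, c):
        # log P(c) + sum_j log P(x_j | c), assuming conditional independence
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[c] / total)
        for j, v in enumerate(row):
            num = self.counts[c][j][v] + 1
            den = self.class_counts[c] + len(self.values[j])
            score += math.log(num / den)
        return score

    def predict(self, row):
        return max(self.classes, key=lambda c: self.log_score(row, c))


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy residues: (side-chain class, burial state); 1 = interface.
train_rows = [
    ("hydrophobic", "exposed"), ("hydrophobic", "exposed"),
    ("polar", "buried"), ("polar", "exposed"),
    ("hydrophobic", "buried"), ("polar", "buried"),
]
train_labels = [1, 1, 0, 0, 1, 0]

model = TinyNaiveBayes().fit(train_rows, train_labels)
predicted = [model.predict(r) for r in train_rows]
print(precision_recall(train_labels, predicted))
```

In this reading, recall is the fraction of true interface residues that the classifier recovers, and precision is the fraction of predicted interface residues that are correct. A cross-validated evaluation would fit the model on nine folds and score it on the held-out fold, rather than scoring on the training rows as this toy example does.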


Copyright Owner

Feihong Wu



Date Available


File Format


File Size

121 pages