Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Computer Science


Bioinformatics and Computational Biology

First Advisor

Vasant Honavar

Second Advisor

Drena Dobbs


Protein-protein interactions play a central role in the formation of protein complexes and the biological pathways that orchestrate virtually all cellular processes. Three dimensional structures of a complex formed by a protein with one or more of its interaction partners provide useful information regarding the specific amino acid residues that make up the interface between proteins. The emergence of high throughput techniques such as Yeast 2 Hybrid (Y2H) assays has made it possible to identify putative interactions between thousands of proteins (but not the interfaces that form the structural basis of interactions or the structures of protein complexes that result from such interactions). Reliable identification of the specific amino acid residues that form the interface of a protein with one or more other proteins is critical for understanding the structural and physico-chemical basis of protein interactions and their role in key cellular processes, for predicting protein complexes, for validating protein interactions predicted by high throughput methods, for ranking conformations of protein complexes generated by docking, and for identifying and prioritizing drug targets in computational drug design.

However, given the high cost of experimental determination of the structures of protein complexes, there is an urgent need for reliable and fast computational methods for identifying interface residues and/or predicting the structure of a complex formed by a protein of interest with its interaction partners. Given the large and growing gap between the number of known protein sequences and the number of experimentally determined structures, sequence-based methods for predicting protein-protein interfaces are of particular interest. Against this background, we develop HomPPI (, a class of sequence homology based approaches to protein interface prediction. We present two variants of HomPPI: (i) NPS-HomPPI (non-partner-specific HomPPI), which can be used to predict interface residues of a query protein in the absence of knowledge of the interaction partner. NPS-HomPPI is based on the results of a systematic analysis of the conditions under which interface residues of a query protein are conserved among its sequence homologs (and hence can be inferred from the known interface residues in proteins that are sequence homologs of the query protein). Our experiments suggest that when sequence homologs of the query protein can be reliably identified, NPS-HomPPI is competitive with several state-of-the-art interface prediction servers including those that exploit the structure of the query proteins. (ii) PS-HomPPI (partner-specific HomPPI), which can be used to predict the interface residues of a query protein with a specific target protein. PS-HomPPI is based on a systematic analysis of the conditions under which the interface residues that make up the interface between a query protein and its interaction partner are preserved among their homo-interologs, i.e., complexes formed by their respective sequence homologs. To the best of our knowledge, with the exception of protein-protein docking (which is computationally much more expensive than PS-HomPPI), PS-HomPPI is one of the first partner-specific protein-protein interface predictors. Our experiments with PS-HomPPI show that when homo-interologs of a query protein and its putative interaction partner can be reliably identified, the interface predictions generated by PS-HomPPI are significantly more reliable than those generated by NPS-HomPPI.

Protein-Protein Docking offers a powerful approach to computational determination of the 3-dimensional conformation of protein complexes and protein-protein interfaces. However, the reliability of conformations produced by docking is limited by the efficacy of the scoring functions used to select a few near-native conformations from among tens of thousands of possible conformations, generated by docking programs. Against this background, we introduce DockRank, a novel approach to rank docked conformations based on the degree to which the interface residues inferred from the docked conformation match the interface residues predicted by a partner-specific sequence homology based interface predictor PS-HomPPI. We compare, on a data set of 69 docked cases with 54,000 decoys per case, the ranking of conformations produced using DockRank's interface similarity scoring function applied to predicted interface residues obtained from four protein interface predictors: PS-HomPPI, and three NPS interface predictors NPS-HomPPI, PRISE, and meta-PPISP, with the rankings produced by two state-of-the-art energy-based scoring functions ZRank and IRAD. Our results show that DockRank significantly outperforms these ranking methods. Our results that NPS interface predictors (homology based and machine learning-based methods) failed to select near-native conformations that are superior to those selected by DockRank (partner-specific interface prediction based), highlight the importance of the knowledge of the binding partners in using predicted interfaces to rank docked models. The application of DockRank, as a third-party scoring function without access to all the original docked models, for improving ClusPro results on two benchmark data sets of 32 and 56 test cases shows the viability of combining our scoring function with existing docking software. An online implementation of DockRank is available at


Copyright Owner

Li Xue



Date Available


File Format


File Size

156 pages