Degree Type

Thesis

Date of Award

1-1-2006

Degree Name

Master of Science

Major

Computer Science

Abstract

Identification of interface residues involved in protein-protein and protein-DNA interactions is critical for understanding the functions of biological systems. Because experimental identification of interface residues cannot keep pace with the rate at which protein sequences are determined, computational methods that can identify interface residues are urgently needed. In this study, we apply machine-learning methods to identify interface residues, with a focus on methods that use amino acid sequence information alone. We have developed classifiers that identify the residues involved in protein-protein and protein-DNA interactions using a window of primary sequence as input. The classifiers were evaluated on both representative datasets and specific cases of interest using multiple measures. The results show that identifying interface residues from sequence is feasible. We have also explored information beyond primary sequence to improve the performance of sequence-based classifiers. The results show that performance can be improved by using the solvent accessibility and sequence entropy of the target residue as additional inputs. We have developed a database of protein-protein interfaces that consists of all the protein-protein interfaces derived from the Protein Data Bank. This database, for the first time, makes possible the quick and flexible retrieval of interface sets and various interface features. We have systematically analyzed the characteristics of interfaces using the largest dataset available. In particular, we compared interfaces with samples that had the same solvent accessibility as the interfaces. This strategy excludes the effect of solvent accessibility on the distributions of residues, secondary structure, and sequence entropy.
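
As an illustration of the two sequence-derived inputs mentioned above, the following is a minimal sketch, not the thesis's implementation. It assumes a window centered on the target residue and padded at the sequence ends, and it assumes sequence entropy means the Shannon entropy of an alignment column. The function names, the default window size of 9, and the `-` padding and gap character are all hypothetical choices made for this example.

```python
from collections import Counter
from math import log2


def sequence_windows(sequence, window_size=9, pad="-"):
    """Return one window per residue, centered on that residue.

    Assumption: the ends of the sequence are padded with `pad` so that
    every residue, including the first and last, gets a full-length window.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the target is centered")
    half = window_size // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + window_size] for i in range(len(sequence))]


def column_entropy(column, gap="-"):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    Assumption: `column` holds the residues observed at one position
    across a set of aligned homologous sequences.
    """
    residues = [r for r in column if r != gap]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * log2(n / total) for n in counts.values())


if __name__ == "__main__":
    for window in sequence_windows("ACDEF", window_size=3):
        print(window)
    print(column_entropy("AACC"))
```

Each window would then be encoded numerically and labeled according to whether its central residue is an interface residue before being passed to a classifier. A fully conserved column has zero entropy, and more variable positions score higher.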

Copyright Owner

Chanjun Yang

Language

en

OCLC Number

75632716

File Format

application/pdf

File Size

55 pages
