SeqStruct: A New Amino Acid Similarity Matrix Based on Sequence Correlations and Structural Contacts Yields Sequence-Structure Congruence

Date
2018-02-21
Authors
Jia, Kejue
Jernigan, Robert
Person
Jernigan, Robert
Distinguished Professor
Organizational Unit
Bioinformatics and Computational Biology
The Bioinformatics and Computational Biology (BCB) Program at Iowa State University is an interdepartmental graduate major offering outstanding opportunities for graduate study toward the Ph.D. degree in Bioinformatics and Computational Biology. The BCB program involves more than 80 nationally and internationally known faculty—biologists, computer scientists, mathematicians, statisticians, and physicists—who participate in a wide range of collaborative projects.
Department
Biochemistry, Biophysics and Molecular Biology
Bioinformatics and Computational Biology
Abstract

Protein sequence matching does not properly account for some well-known features of protein structures (surface residues being more variable than core residues, and the high packing densities in globular proteins), and it does not yield good matches between the sequences of many proteins known to be close structural relatives. There are now abundant protein sequences and structures to enable major improvements to sequence matching. Here, we use structural frameworks as scaffolds on which to mount the observed sequence correlations, in order to identify the most important correlated parts. The rationale is that protein structures provide the important physical framework for improving sequence matching. Combining the sequence and structure data in this way leads to a simple amino acid substitution matrix that can be readily incorporated into any sequence matching procedure. This enables the incorporation of allosteric information into sequence matching and effectively transforms it from a 1-D to a 3-D procedure. Tests on over 3,000 sequence matches demonstrate a 37% gain in sequence similarity and a 26% reduction in gaps compared with the use of BLOSUM62. Importantly, there are also major gains in the specificity of sequence matching across diverse proteins. Specifically, all known cases where protein structures match but sequences do not match well are resolved.
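
To illustrate what it means for a substitution matrix to be incorporated into a sequence matching procedure, the following is a minimal sketch of global alignment scoring (Needleman-Wunsch with a linear gap penalty) in which the matrix is an ordinary lookup table passed in as an argument. The function names, the four-letter alphabet, the gap penalty, and all score values are hypothetical placeholders chosen for illustration; they are not SeqStruct or BLOSUM62 values, and this is not the authors' implementation.

```python
from typing import Dict, Tuple

# A substitution matrix is modeled as a lookup table over residue pairs.
Matrix = Dict[Tuple[str, str], int]


def make_toy_matrix(alphabet: str, match: int, mismatch: int) -> Matrix:
    """Build a placeholder matrix: one score for identities, one for mismatches.

    The values are illustrative only, not taken from any published matrix.
    """
    return {(a, b): (match if a == b else mismatch)
            for a in alphabet for b in alphabet}


def global_alignment_score(seq_a: str, seq_b: str,
                           matrix: Matrix, gap: int = -4) -> int:
    """Return the optimal global alignment score with a linear gap penalty.

    Only the score is computed (no traceback), using two rows of the
    dynamic-programming table.
    """
    m = len(seq_b)
    # Row 0: aligning an empty prefix of seq_a against prefixes of seq_b.
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(seq_a) + 1):
        curr = [i * gap] + [0] * m
        for j in range(1, m + 1):
            substitute = prev[j - 1] + matrix[(seq_a[i - 1], seq_b[j - 1])]
            gap_in_b = prev[j] + gap       # residue of seq_a aligned to a gap
            gap_in_a = curr[j - 1] + gap   # residue of seq_b aligned to a gap
            curr[j] = max(substitute, gap_in_b, gap_in_a)
        prev = curr
    return prev[m]


if __name__ == "__main__":
    toy = make_toy_matrix("ACDE", match=2, mismatch=-1)
    # A second placeholder matrix that treats C and D as similar residues.
    variant = dict(toy)
    variant[("C", "D")] = variant[("D", "C")] = 1
    print(global_alignment_score("AC", "AD", toy))
    print(global_alignment_score("AC", "AD", variant))
```

Because the matrix enters only through the lookup `matrix[(a, b)]`, swapping one matrix for another changes the scores without changing the alignment procedure itself, which is the sense in which a new matrix can be dropped into existing sequence matching tools.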

Comments

This is a preprint made available through bioRxiv, doi: 10.1101/268904.

Copyright
2018