#### Degree Type

Dissertation

#### Date of Award

1991

#### Degree Name

Doctor of Philosophy

#### Department

Mathematics

#### First Advisor

James L. Cornette

#### Abstract

Using encoding schemes, we study the relation between amino acids and protein structure in linear spaces.

Local protein-structure prediction schemes use the amino acid sequence of a short subchain of a protein to predict the secondary structure of the middle residue of the subchain.

Two relatively reliable, objective and quantitative local prediction schemes are Robson et al.'s information theory method and Qian & Sejnowski's neural network models. The latter achieved 64% accuracy for three-state predictions in 1989, the best prediction performance yet recorded.

We assign to the 20 amino acids, through a one-to-one correspondence, the 20 columns (unit vectors) of the 20 × 20 identity matrix. Then any chain of k amino acids may be represented as a sequence of k unit vectors or as a single composite vector of length 20k. The representation of protein subchains by vectors is called a local encoding scheme.

In the language of local encoding schemes, both the information theory method and the two-layer neural network model, which are 3-state predictors, partition 20k-dimensional space with three planes to distinguish the predicted secondary structures.

We use linear programming as the mathematical tool to construct the partition planes. Our prediction schemes are objective and quantitative, and we make no artificial modifications of the training scheme outputs. For 3-state prediction, we obtain accuracy above 90% on the training set and about 66% on the testing set when predicting about 1/3 of the points in that set, in contrast to Qian & Sejnowski, who obtain about 64% accuracy on both sets. For 2-state prediction, we obtain 91% and 86% accuracy on the training and testing sets, respectively, but the accuracy for correctly predicting an alpha-helical residue is about 50%. According to our experiments, the distribution of points in space is ambiguous, and it is difficult to find planes that perform well on both the training and testing sets.

In addition to the local encoding scheme, a less complicated general scheme is used to study the same problem.
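As an illustration of the local encoding scheme described in the abstract, the following is a minimal Python sketch that maps a subchain of k amino acids to a composite 0/1 vector of length 20k. The alphabetical ordering of one-letter codes in `AMINO_ACIDS` and the function name `encode_subchain` are assumptions made for this example; the abstract only requires that the correspondence between amino acids and unit vectors be one-to-one.

```python
# Assumed ordering of the 20 standard amino acids (one-letter codes).
# Any fixed one-to-one ordering would serve; this one is illustrative only.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_subchain(subchain):
    """Return the composite 0/1 vector of length 20k for a subchain of k residues.

    Each residue contributes one block of 20 entries: a unit vector with a
    single 1 at the index assigned to that amino acid.
    """
    vector = [0] * (20 * len(subchain))
    for position, residue in enumerate(subchain):
        vector[20 * position + INDEX[residue]] = 1
    return vector


# A subchain of k = 3 residues becomes a vector of length 60 with three 1s.
v = encode_subchain("ACD")
```

Each encoded subchain is then a point in 20k-dimensional space, which is the setting in which the abstract describes separating secondary-structure classes with planes.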

#### DOI

https://doi.org/10.31274/rtd-180813-9283

#### Publisher

Digital Repository @ Iowa State University, http://lib.dr.iastate.edu/

#### Copyright Owner

Wei-hua Hsieh

#### Copyright Date

1991

#### Language

en

#### Proquest ID

AAI9212149

#### File Format

application/pdf

#### File Size

175 pages

#### Recommended Citation

Hsieh, Wei-hua, "Study of secondary structure of protein sequences by linear algebra" (1991). *Retrospective Theses and Dissertations*. 9647.

https://lib.dr.iastate.edu/rtd/9647