Machine learning approaches for epitope prediction

El-manzalawy, Yasser

Date

2008-01-01

Advisor

Vasant Honavar

Department

Computer Science

Abstract

The identification and characterization of epitopes in antigenic sequences are critical for understanding disease pathogenesis, identifying potential autoantigens, and designing vaccines and immune-based cancer therapies. Because the number of fully or partially sequenced pathogen genomes is increasing rapidly, experimental epitope mapping alone would be prohibitive in both time and expense. Computational methods that reliably identify potential vaccine candidates (i.e., epitopes that elicit a strong response from both T-cells and B-cells) are therefore highly desirable.

Machine learning offers one of the most cost-effective and widely used approaches to developing epitope prediction tools. We draw on recent advances in machine learning research to build epitope prediction tools with improved predictive performance. First, we introduce two methods, BCPred and FBCPred, for predicting linear B-cell epitopes and flexible-length linear B-cell epitopes, respectively, using string-kernel-based support vector machine (SVM) classifiers. Second, we introduce three scoring matrix methods and show that they are highly competitive with a broad class of machine learning methods, including SVM, in predicting major histocompatibility complex class I (MHC-I) binding peptides. Finally, we formulate the problems of qualitatively and quantitatively predicting flexible-length major histocompatibility complex class II (MHC-II) binding peptides as multiple instance learning and multiple instance regression problems, respectively. Based on this formulation, we introduce MHCMIR, a novel method for predicting MHC-II binding affinity using multiple instance regression.
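To make the idea of a string kernel concrete, the following is a minimal sketch of the k-spectrum kernel, one well-known string kernel that compares two peptides by the k-mers they share. It is an illustration only: the abstract does not state which string kernel BCPred or FBCPred uses, and the function names and the default k = 3 are assumptions made for this example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a peptide sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """k-spectrum kernel: dot product of the two k-mer count vectors.

    Illustrative choice of kernel; not necessarily the one used in the thesis.
    """
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[kmer] for kmer, n in ca.items())


def gram_matrix(peptides, k=3):
    """Pairwise kernel values for a list of peptides."""
    return [[spectrum_kernel(p, q, k) for q in peptides] for p in peptides]
```

A Gram matrix built this way could then be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.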

The development of reliable epitope prediction tools is not feasible without high-quality data sets. Unfortunately, most existing epitope benchmark data sets consist of epitope sequences that share a high degree of similarity with other peptide sequences in the same data set. We demonstrate the pitfalls of using these data sets to evaluate the performance of machine learning approaches to epitope prediction, and we propose a similarity reduction procedure that is more stringent than the similarity reduction methods currently in use.
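As a rough illustration of what similarity reduction means in practice, the sketch below greedily keeps a peptide only if it is sufficiently dissimilar to every peptide already kept. This is a generic filter, not the procedure proposed in the thesis: the ungapped, left-aligned identity measure, the 0.8 threshold, and the function names are all assumptions made for this example.

```python
def identity(a, b):
    """Fraction of matching positions, left-aligned and ungapped.

    Simplifying assumption: no alignment is performed, and the count is
    normalised by the longer sequence.
    """
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def reduce_similarity(peptides, threshold=0.8):
    """Greedy filter: keep a peptide only if its identity to every
    previously kept peptide is below the threshold (hypothetical value)."""
    kept = []
    for p in peptides:
        if all(identity(p, q) < threshold for q in kept):
            kept.append(p)
    return kept
```

Because the filter is greedy, the result depends on input order; a lower threshold gives a more stringent reduction.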

Copyright

2008

Collections

Theses and Dissertations