Date of Award
Doctor of Philosophy
Stephen B. Vardeman
Max D. Morris
A variety of conditional probability models estimate the regression or class probability function for the purpose of prediction or classification. Bayesian mixture models provide flexible prediction and classification methods for modeling local linearities of the regression or class probability function. A hierarchical Bayes Gaussian mixture model is proposed that directly uses data to define a mixture prior for its Gaussian mixture component parameters.
This nonparametric Bayesian mixture model uses the stick-breaking construction of a Dirichlet process model. Prediction and classification come directly from the posterior distribution via Gibbs sampling. Comprehensive simulation studies demonstrate the performance of both the regression and classification methods. On five standard machine learning data sets, the prediction and classification results are competitive with those of local methods.
A generic classification algorithm is also outlined for categorical predictors. If too many categories are present, or if many interaction levels affect the class probability function, no current method reduces bias effectively. The proposed solution is a generic way to characterize the information about the class probability function available in the predictors through likelihood ratio statistics. The proposed classifier relies on random forests to reduce bias by utilizing all of the information in the generated log likelihood ratio features. A simulation study and an application data set demonstrate potential advantages of this classification method for categorical predictors.
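As an illustration of the stick-breaking construction named in the abstract, the following is a minimal sketch of how truncated stick-breaking weights for a Dirichlet process can be drawn. It is not taken from the dissertation: the function name `stick_breaking_weights`, the truncation level `num_components`, and the concentration value `alpha` are assumptions chosen for the example, and the dissertation's own model and Gibbs sampler may differ.

```python
import random


def stick_breaking_weights(alpha, num_components, seed=0):
    """Draw truncated stick-breaking mixture weights (illustrative sketch).

    alpha: Dirichlet process concentration parameter (assumed value).
    num_components: truncation level, a hypothetical choice for this example.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a component
    for k in range(num_components):
        if k == num_components - 1:
            # Truncation: the last component takes whatever stick is left,
            # so the weights sum to one.
            fraction = 1.0
        else:
            # Break off a Beta(1, alpha) fraction of the remaining stick.
            fraction = rng.betavariate(1.0, alpha)
        weights.append(fraction * remaining)
        remaining *= 1.0 - fraction
    return weights


weights = stick_breaking_weights(alpha=2.0, num_components=20)
```

Smaller values of `alpha` concentrate most of the weight on the first few components, while larger values spread it across more components.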
Cory Lee Lanker
Lanker, Cory Lee, "Local prediction and classification techniques for machine learning and data mining" (2015). Graduate Theses and Dissertations. 14404.