Degree Type

Dissertation

Date of Award

2015

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Stephen B. Vardeman

Second Advisor

Max D. Morris

Abstract

A variety of conditional probability models estimate the regression or class probability function for the purpose of prediction or classification. Bayesian mixture models provide flexible prediction and classification methods by modeling local linearities of the regression or class probability function. A hierarchical Bayes Gaussian mixture model is proposed that uses the data directly to define a mixture prior for its Gaussian component parameters.
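
As a rough, self-contained illustration of a data-defined prior (the hyperparameter choices below are assumptions for the sketch, not the dissertation's specification), one can center a Gaussian prior for the mixture component means on the empirical mean and covariance of the data:

import numpy as np

def data_driven_prior(X, kappa=0.1):
    # Center the prior for component means on the data: the prior mean is
    # the empirical mean, and the prior covariance is the empirical
    # covariance inflated by 1/kappa (kappa is an assumed hyperparameter).
    mu0 = X.mean(axis=0)
    Sigma0 = np.cov(X, rowvar=False) / kappa
    return mu0, Sigma0

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                    # toy data
mu0, Sigma0 = data_driven_prior(X)
component_means = rng.multivariate_normal(mu0, Sigma0, size=5)  # prior draws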

This nonparametric Bayesian mixture model uses the stick-breaking construction of a Dirichlet process model. Prediction and classification come directly from the posterior distribution via Gibbs sampling. Comprehensive simulation studies demonstrate the performance of both the regression and classification methods, and five standard machine learning data sets show prediction and classification results competitive with local methods.

A generic classification algorithm is also outlined for categorical predictors. When too many categories are present, or when many interactions affect the class probability function, no current method reduces bias effectively. The proposed solution is a generic way to characterize, through likelihood ratio statistics, the information about the class probability function carried by the predictors. The resulting classifier relies on random forests to reduce bias by using all of the information in the generated log likelihood ratio features. A simulation study and an application data set demonstrate the potential advantages of this classification method for categorical predictors.
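
A minimal sketch of the truncated stick-breaking construction named above, with the truncation level and concentration parameter chosen arbitrarily for illustration (the dissertation's actual sampler is not reproduced here):

import numpy as np

def stick_breaking_weights(alpha, n_components, rng):
    # Truncated stick-breaking construction of Dirichlet process weights:
    # v_k ~ Beta(1, alpha), w_k = v_k * prod_{j<k} (1 - v_j).
    v = rng.beta(1.0, alpha, size=n_components)
    v[-1] = 1.0                                   # close the stick at truncation
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining

rng = np.random.default_rng(1)
w = stick_breaking_weights(alpha=2.0, n_components=20, rng=rng)
assert np.isclose(w.sum(), 1.0)                   # weights form a distribution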
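A hedged sketch of the log likelihood ratio feature idea for categorical predictors: each category is replaced by a smoothed estimate of log[P(x | class 1) / P(x | class 0)], and a random forest is then fit on the resulting features. The smoothing constant and the toy data are assumptions for this sketch:

import numpy as np
from sklearn.ensemble import RandomForestClassifier

def llr_features(X_cat, y, eps=0.5):
    # Replace each categorical column with a smoothed estimate of
    # log P(x | y=1) - log P(x | y=0); eps is an assumed smoothing constant.
    feats = np.empty(X_cat.shape, dtype=float)
    n1_tot, n0_tot = (y == 1).sum(), (y == 0).sum()
    for j in range(X_cat.shape[1]):
        col = X_cat[:, j]
        cats = np.unique(col)
        table = {}
        for c in cats:
            p1 = (np.sum((col == c) & (y == 1)) + eps) / (n1_tot + eps * len(cats))
            p0 = (np.sum((col == c) & (y == 0)) + eps) / (n0_tot + eps * len(cats))
            table[c] = np.log(p1 / p0)
        feats[:, j] = [table[c] for c in col]
    return feats

rng = np.random.default_rng(2)
X = rng.integers(0, 8, size=(500, 4))             # toy categorical predictors
y = ((X[:, 0] % 2) == (X[:, 1] > 3)).astype(int)  # class depends on an interaction
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(llr_features(X, y), y)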

DOI

https://doi.org/10.31274/etd-180810-3955

Copyright Owner

Cory Lee Lanker

Language

en

File Format

application/pdf

Number of Pages

96 pages
