Doctor of Philosophy
In many applications, a response is observed together with potential explanatory variables, or covariates. Regression models, with inference under either the frequentist or the Bayesian paradigm, are often used to model such data. In the frequentist paradigm, model selection is commonly performed by stepwise or all-subsets selection based on the Cp criterion, the Akaike information criterion (AIC), or the Bayesian information criterion (BIC); strategies based on cross-validation are also available. In the Bayesian paradigm, the deviance information criterion (DIC) and posterior model probabilities are the primary tools for model selection. A theme common to these methods is that they consider model performance only at the observed covariate values. In some applications, however, the goal is to predict the response or estimate the mean response over a distribution of explanatory-variable values that differs from the one in the observed data. We propose a new model selection strategy that focuses on estimation or prediction over a user-specified distribution of covariate values. The idea is that, if a model is to be used for inference over a specific portion of the covariate space, that study goal should be allowed to influence the selection procedure. The new methodology and its implementation are presented through examples for linear models under the frequentist and Bayesian paradigms and for generalized linear models under the Bayesian paradigm. Furthermore, under the Bayesian paradigm, the methodology can be modified to protect against predictions that are too high or too low. Finally, simulation studies compare the predictive ability of the new methodology with that of some current methods.
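To make the central idea concrete, the following is a minimal sketch of what selecting a linear model for good estimation over a user-specified covariate distribution might look like. It is not the criterion developed in the thesis, which the abstract does not spell out. The function names (`fit_ols`, `target_score`, `select_model`), the simulated data, and the scoring rule are all assumptions made for illustration. The score averages, over draws from the target covariate distribution, the usual variance of the fitted mean plus a crude squared-bias proxy taken as the gap between the candidate's and the full model's fitted means.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients and (X'X)^{-1} for design matrix X."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    return beta, xtx_inv


def target_score(X, y, X_target, cols, sigma2):
    """Estimated squared error of the fitted mean, averaged over X_target.

    Variance term: sigma2 * x0' (X_s' X_s)^{-1} x0 for the candidate columns.
    Bias proxy (an assumption of this sketch, not the thesis's criterion):
    squared gap between the candidate's and the full model's fitted means.
    """
    cols = list(cols)
    beta_full, _ = fit_ols(X, y)
    beta_sub, xtx_inv_sub = fit_ols(X[:, cols], y)
    Xt_sub = X_target[:, cols]
    # Row-wise quadratic forms x0' A x0 for every target point x0.
    variance = sigma2 * np.einsum("ij,jk,ik->i", Xt_sub, xtx_inv_sub, Xt_sub)
    bias_sq = (Xt_sub @ beta_sub - X_target @ beta_full) ** 2
    return float(np.mean(variance + bias_sq))


def select_model(X, y, X_target):
    """Score every subset of non-intercept columns; column 0 is always kept."""
    n, p = X.shape
    beta_full, _ = fit_ols(X, y)
    resid = y - X @ beta_full
    sigma2 = float(resid @ resid) / (n - p)  # full-model variance estimate
    scores = {}
    for k in range(p):
        for extra in itertools.combinations(range(1, p), k):
            cols = (0, *extra)
            scores[cols] = target_score(X, y, X_target, cols, sigma2)
    best = min(scores, key=scores.get)
    return best, scores


# Simulated data (hypothetical): x2 has a small effect and is observed near 0.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(0.0, 1.0, n)
x2 = rng.normal(0.0, 1.0, n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 2.0 * x1 + 0.1 * x2 + rng.normal(0.0, 1.0, n)

# User-specified target distribution: x2 is shifted away from the observed data.
m = 500
X_target = np.column_stack(
    [np.ones(m), rng.normal(0.0, 1.0, m), rng.normal(3.0, 0.5, m)]
)

best, scores = select_model(X, y, X_target)
for cols, score in sorted(scores.items(), key=lambda item: item[1]):
    print(cols, round(score, 4))
print("selected columns:", best)
```

Passing the observed design matrix itself as `X_target` reduces the score to an in-sample quantity, which is the situation the criteria listed above implicitly address. Changing `X_target` is what lets the study goal influence which model is chosen. The bias proxy here is deliberately simple and tends to overstate the bias, because it also absorbs sampling noise in the difference between the two fits.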
Adam Lee Pintar
Pintar, Adam Lee, "Model selection for good estimation or prediction over a user-specified covariate distribution" (2010). Graduate Theses and Dissertations. 11685.