Model selection for good estimation or prediction over a user-specified covariate distribution

Pintar, Adam

File

Pintar_iastate_0097E_11510.pdf (1.51 MB)

Date

2010-01-01

Authors

Pintar, Adam

Advisor

Huaiqing Wu

Department

Statistics, Iowa State University

Abstract

In many applications, a response is observed together with potential explanatory variables, or covariates. Regression models, with inference under either the frequentist or the Bayesian paradigm, are often used to model such data. In the frequentist paradigm, model selection is often performed by stepwise or all-subsets selection based on the C_p criterion, the Akaike information criterion (AIC), or the Bayesian information criterion (BIC); strategies based on cross-validation are also available. In the Bayesian paradigm, the deviance information criterion (DIC) and posterior model probabilities are the primary tools for model selection. A theme common to these methods is that they consider model performance only at the observed data. In some applications, however, we wish to predict the response or estimate the mean response over a distribution of explanatory-variable values that differs from the one in the observed data. We propose a new model selection strategy that focuses on estimation or prediction over a user-specified distribution of covariate values. The idea is that, if a model is to be used for inference over a specific portion of the covariate space, that study goal should be allowed to influence the selection procedure. The new methodology and its implementation are presented through examples for linear models under the frequentist and Bayesian paradigms and for generalized linear models under the Bayesian paradigm. Furthermore, under the Bayesian paradigm, the methodology can be modified to protect against predictions that are too high or too low. Finally, simulation studies are presented that compare the predictive ability of the new methodology with that of some current methods.
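
The motivation in the abstract can be illustrated with a small simulation. The sketch below is not the selection procedure developed in the thesis; it only shows, under assumed settings, that a model's error at the observed covariate values can differ sharply from its error over a separate target covariate distribution. The true mean function, the noise level, the polynomial candidate models, the two uniform covariate ranges, and the helper names (design, fit_ols, mean_sq_error) are all assumptions chosen for illustration, and the true mean is known only because the data are simulated.

```python
import numpy as np

def design(x, degree):
    """Polynomial design matrix with columns 1, x, ..., x**degree."""
    return np.vander(x, degree + 1, increasing=True)

def fit_ols(x, y, degree):
    """Least-squares coefficients for a polynomial of the given degree."""
    beta, *_ = np.linalg.lstsq(design(x, degree), y, rcond=None)
    return beta

def mean_sq_error(beta, x_eval, true_mean):
    """Average squared error of the fitted mean over the points in x_eval."""
    degree = len(beta) - 1
    fitted = design(x_eval, degree) @ beta
    return float(np.mean((fitted - true_mean(x_eval)) ** 2))

def true_mean(x):
    # Assumed truth, available only because this is a simulation.
    return 1.0 + 2.0 * x + 0.5 * x**2

rng = np.random.default_rng(0)

# Observed covariates and noisy responses (assumed ranges and noise level).
x_obs = rng.uniform(0.0, 1.0, size=50)
y_obs = true_mean(x_obs) + rng.normal(0.0, 0.5, size=50)

# Draws from a hypothetical user-specified target covariate distribution
# that lies outside the observed range.
x_target = rng.uniform(2.0, 3.0, size=5000)

# Compare two candidate models at the observed points and over the target.
results = {}
for degree in (1, 2):
    beta = fit_ols(x_obs, y_obs, degree)
    results[degree] = {
        "observed": mean_sq_error(beta, x_obs, true_mean),
        "target": mean_sq_error(beta, x_target, true_mean),
    }

for degree, errs in results.items():
    print(degree, errs)
```

Comparing the "observed" and "target" entries for each candidate shows how a ranking based only on the observed data may not carry over to the target distribution. In practice the true mean is unknown, so a criterion of this kind has to be estimated from the data.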

Copyright

2010-01-01

Collections

Theses and Dissertations
