Model selection for good estimation or prediction over a user-specified covariate distribution

Date
2010-01-01
Authors
Pintar, Adam
Major Professor
Advisor
Huaiqing Wu
Department
Statistics
Abstract

In many applications it is common to observe a response together with potential explanatory variables, or covariates. Regression models, with inference under either the frequentist or the Bayesian paradigm, are often employed to model such data. To perform model selection in the frequentist paradigm, step-wise or all-subsets selection based on the Cp criterion, the Akaike information criterion (AIC), or the Bayesian information criterion (BIC) is often used. Strategies based on cross-validation are also available. In the Bayesian paradigm, the deviance information criterion (DIC) and posterior model probabilities are the primary tools for model selection. A feature common to these methods is that they assess model performance only at the observed covariate values. However, in some applications we wish to predict the response, or estimate the mean response, over a distribution of explanatory-variable values that differs from the one in the observed data. We propose a new model selection strategy that focuses on estimation or prediction over a user-specified distribution of covariate values. The idea is that, if a model is to be used for inference over a specific portion of the covariate space, that study goal should be allowed to influence the selection procedure. The new methodology and its implementation are presented via examples for linear models under the frequentist and Bayesian paradigms and for generalized linear models under the Bayesian paradigm. Furthermore, under the Bayesian paradigm, the methodology can be modified to protect against predictions that are too high or too low. Finally, simulation studies compare the predictive ability of the new methodology with that of some current methods.
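To make the general idea concrete, the following is a minimal sketch of scoring candidate linear models by how well they estimate the mean response over a user-specified covariate distribution rather than at the observed data. It is not the criterion developed in the thesis, which the abstract does not spell out. The function names (`fit_ols`, `target_weighted_score`, `select_model`), the use of the full model as the reference for a crude plug-in bias estimate, and the simulated data are all assumptions made for illustration.

```python
import itertools
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for a design matrix X (intercept column included)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def target_weighted_score(X, y, cols, X_target):
    """Hypothetical criterion: plug-in squared error of the fitted mean,
    averaged over rows of X_target (draws from the user-specified distribution).

    Assumption: the full model is used as the reference for estimating bias,
    and no correction is made for the noise in that bias estimate.
    """
    n, p = X.shape
    beta_full = fit_ols(X, y)
    resid = y - X @ beta_full
    sigma2 = resid @ resid / (n - p)  # error variance estimated from the full model

    Xs, Ts = X[:, cols], X_target[:, cols]
    beta_s = fit_ols(Xs, y)

    # Variance of the submodel's fitted mean at each target point:
    # sigma2 * t' (Xs' Xs)^{-1} t
    XtX_inv = np.linalg.inv(Xs.T @ Xs)
    var = sigma2 * np.einsum("ij,jk,ik->i", Ts, XtX_inv, Ts)

    # Crude squared-bias estimate: submodel fit minus full-model fit.
    bias2 = (Ts @ beta_s - X_target @ beta_full) ** 2
    return float(np.mean(var + bias2))


def select_model(X, y, X_target):
    """Score every subset that keeps the intercept (column 0); return the best."""
    p = X.shape[1]
    best = None
    for k in range(p):
        for extra in itertools.combinations(range(1, p), k):
            cols = [0, *extra]
            score = target_weighted_score(X, y, cols, X_target)
            if best is None or score < best[0]:
                best = (score, cols)
    return best


# Simulated data (assumed for illustration): x2 has only a weak effect.
rng = np.random.default_rng(0)
n, m = 200, 1000
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = 1.0 + 2.0 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(size=n)

# User-specified target distribution: x1 centered at 3, outside the bulk of the data.
X_target = np.column_stack(
    [np.ones(m), rng.normal(3.0, 0.5, size=m), rng.normal(0.0, 0.1, size=m)]
)

score, cols = select_model(X, y, X_target)
print("selected columns:", cols, "score:", score)
```

Because the score is averaged over `X_target` instead of the observed rows of `X`, changing the target distribution can change which subset is selected, which is the behavior the abstract argues a selection procedure should have.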

Copyright
2010-01-01