Date of Award
Doctor of Philosophy
Jae Kwang Kim
This dissertation addresses three issues that commonly arise in the analysis of survey data, namely small area estimation, multilevel modeling, and item nonresponse, and develops new methods to handle them.
Many large-scale surveys are designed to achieve acceptable reliability for large domains. For small domains, direct estimators are unreliable because the sample sizes are small, and model-based methods are needed. The first topic is motivated by the Conservation Effects Assessment Project (CEAP), a survey intended to quantify soil and nutrient loss on crop fields. Our goal is to predict, in small domains, quantiles of several measures of erosion, which are important parameters in the CEAP. As a general approach to predicting small area quantiles, we develop a modification of the procedure of Jang and Wang (2015) based on a mixed effects quantile regression model; Jang and Wang propose Bayesian methods that use the linearly interpolated generalized Pareto distribution for inference and estimation of the quantile regression coefficients. We apply the procedure to predict county-level quantiles for four types of erosion in Wisconsin. We further extend the proposed method in two directions: to zero-inflated data and to survey data collected under an informative sample design. Both types of data are common in practice.
Clustered data arise in many applications of statistics. Multilevel models, such as generalized linear mixed models, are widely used to analyze clustered data. The cluster size is called informative if it is associated with the cluster-level random effects. When the cluster size is informative, standard maximum likelihood estimators are biased. The problem is difficult to handle in multilevel models because the random effects are unobserved. We propose a new parameter estimation method based on within-cluster resampling, which does not require a correctly specified model for the cluster size.
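To illustrate why an informative cluster size matters, the following minimal sketch applies textbook within-cluster resampling to a simple mean; it is not the estimator proposed in the dissertation. The simulation design (cluster sizes of 8 or 2 depending on the sign of the random effect) and all function names are assumptions made for this example. One observation is drawn from each cluster, the draws are averaged, and the procedure is repeated and averaged over resamples.

```python
import random
import statistics


def simulate_clusters(n_clusters, rng):
    """Simulate clusters whose size depends on the random effect (hypothetical design)."""
    clusters = []
    for _ in range(n_clusters):
        b = rng.gauss(0.0, 1.0)       # cluster-level random effect, mean zero
        size = 8 if b > 0 else 2      # informative: larger clusters have larger effects
        clusters.append([b + rng.gauss(0.0, 1.0) for _ in range(size)])
    return clusters


def pooled_mean(clusters):
    """Naive estimator: average over all observations, ignoring clustering."""
    return statistics.fmean(y for cluster in clusters for y in cluster)


def wcr_mean(clusters, n_resamples, rng):
    """Within-cluster resampling: one observation per cluster, averaged over resamples."""
    estimates = []
    for _ in range(n_resamples):
        draw = [rng.choice(cluster) for cluster in clusters]
        estimates.append(statistics.fmean(draw))
    return statistics.fmean(estimates)


rng = random.Random(0)
clusters = simulate_clusters(2000, rng)
naive = pooled_mean(clusters)
wcr = wcr_mean(clusters, 200, rng)
print(f"pooled mean: {naive:.3f}")
print(f"within-cluster resampling mean: {wcr:.3f}")
```

In this design the mean response for a randomly chosen cluster is zero. The pooled mean is pulled upward because large clusters contribute more observations, whereas the resampled estimate gives each cluster equal weight without modeling the cluster size.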
Item nonresponse is also frequently encountered in practice. Imputation is a popular technique for handling item nonresponse: each missing value is replaced with a plausible value or a set of plausible values. Parametric imputation relies on a parametric imputation model and is therefore less robust to misspecification of that model. Nonparametric imputation is fully robust but is not applicable to high-dimensional data because of the curse of dimensionality. We propose a semiparametric imputation method based on a conditional Gaussian mixture model assumption, which is more flexible than imputation based on a joint Gaussian mixture model (GMM). Through simulation studies, we show that the proposed mixture model approximates the true underlying density function with lower error than the GMM and improves prediction accuracy.
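For context, the sketch below shows how a missing value can be imputed under the joint GMM that serves as the comparison method above; it does not implement the proposed conditional mixture model. It assumes a bivariate two-component mixture with known parameters, and every parameter value and function name is hypothetical. Given an observed x, each component is reweighted by how well it explains x, and y is then drawn from the conditional normal distribution of the selected component.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def conditional_components(x, components):
    """Return (weight, mean, variance) of y given x for each mixture component."""
    raw = [c["weight"] * normal_pdf(x, c["mean_x"], c["var_x"]) for c in components]
    total = sum(raw)
    result = []
    for r, c in zip(raw, components):
        slope = c["cov_xy"] / c["var_x"]
        mean = c["mean_y"] + slope * (x - c["mean_x"])   # conditional mean of y given x
        var = c["var_y"] - slope * c["cov_xy"]           # conditional variance of y given x
        result.append((r / total, mean, var))
    return result


def impute(x, components, rng):
    """Draw one imputed y: pick a component by its conditional weight, then sample."""
    cond = conditional_components(x, components)
    weights = [w for w, _, _ in cond]
    _, mean, var = rng.choices(cond, weights=weights, k=1)[0]
    return rng.gauss(mean, math.sqrt(var))


# Hypothetical mixture parameters, chosen only for illustration.
components = [
    {"weight": 0.5, "mean_x": -2.0, "mean_y": -1.0, "var_x": 1.0, "var_y": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 2.0, "mean_y": 3.0, "var_x": 1.0, "var_y": 1.0, "cov_xy": -0.5},
]

rng = random.Random(1)
cond = conditional_components(-2.0, components)
imputed = [impute(-2.0, components, rng) for _ in range(2000)]
print(cond)
print(sum(imputed) / len(imputed))
```

Drawing several values per missing item, rather than one, gives the "set of plausible values" mentioned above. In practice the mixture parameters would have to be estimated from the observed data, which this sketch leaves out.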
Lee, Danhyang, "Topics on small area estimation, multilevel models, and semiparametric imputation" (2019). Graduate Theses and Dissertations. 17496.