Doctor of Philosophy
Jae Kwang Kim
Survey sampling is regarded as a scientific method of collecting data that represent the target population. Statistical inference using survey data can be improved by incorporating information from existing external data sources. The auxiliary information from other sources can be incorporated at either the design stage or the estimation stage. In some cases, the original survey data can be augmented with extra data. Data integration can be viewed as a missing data problem, and a mass imputation approach can be used to carry it out. By filling in the missing values of the study variable in one sample with imputed values that incorporate information from the other sample, we can obtain an improved estimator that integrates information from the two samples.
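As a minimal illustration of the mass imputation idea described above, the following Python sketch assumes a sample A that observes only an auxiliary variable x (with design weights) and a sample B that observes both x and the study variable y. The function name `mass_imputation_mean`, the simple linear working model, and the unweighted least-squares fit are assumptions made for illustration; they are not the estimators developed in the dissertation.

```python
import numpy as np

def mass_imputation_mean(x_a, w_a, x_b, y_b):
    """Estimate the population mean of y from sample A (x only) and sample B (x, y).

    Hypothetical helper: assumes a linear working model y = b0 + b1 * x.
    """
    x_a = np.asarray(x_a, dtype=float)
    w_a = np.asarray(w_a, dtype=float)
    x_b = np.asarray(x_b, dtype=float)
    y_b = np.asarray(y_b, dtype=float)

    # Fit the working model on sample B, where y is observed
    # (unweighted least squares, an assumption of this sketch).
    design_b = np.column_stack([np.ones_like(x_b), x_b])
    beta, *_ = np.linalg.lstsq(design_b, y_b, rcond=None)

    # Impute y for every unit in sample A, where y is missing.
    design_a = np.column_stack([np.ones_like(x_a), x_a])
    y_imputed = design_a @ beta

    # Weighted mean of the imputed values using sample A's design weights.
    return float(np.sum(w_a * y_imputed) / np.sum(w_a))
```

In this sketch, sample B contributes only the fitted relationship between x and y, while sample A contributes the design weights and the observed x values to which the imputed values are attached.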
This dissertation develops procedures that incorporate auxiliary information or data in three different situations. It consists of three corresponding papers, each of which deals with an aspect of incorporating auxiliary information into survey data in order to gain efficiency in inference.
The first paper considers the propensity score weighting method that incorporates auxiliary information from paradata. Paradata are data about the survey process that are obtained automatically as a by-product of data collection, and they can be used to handle nonresponse bias. The paper considers the conditions necessary to gain efficiency by incorporating auxiliary information from paradata into the propensity score.
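To make propensity score weighting concrete, the sketch below assumes a categorical paradata variable that defines response cells, estimates the response propensity within each cell, and weights respondents by the inverse of that estimate. The function name `psw_mean` and the cell-based propensity model are illustrative assumptions, not the specific method or conditions studied in the first paper.

```python
import numpy as np

def psw_mean(y, responded, cell, design_w):
    """Propensity-score-weighted mean of y under a cell-based response model.

    Hypothetical helper: `cell` stands in for a categorical paradata variable.
    """
    y = np.asarray(y, dtype=float)
    responded = np.asarray(responded, dtype=bool)
    cell = np.asarray(cell)
    w = np.asarray(design_w, dtype=float)

    adj_w = np.zeros_like(w)
    for c in np.unique(cell):
        in_cell = cell == c
        resp = in_cell & responded
        # Estimated response propensity: weighted response rate in the cell.
        p_hat = w[resp].sum() / w[in_cell].sum()
        # Respondents carry their design weight divided by the propensity.
        adj_w[resp] = w[resp] / p_hat

    # Weighted mean over respondents only; nonrespondent y values are never used.
    return float(np.sum(adj_w[responded] * y[responded]) / np.sum(adj_w[responded]))
```

With equal design weights, a cell in which one of four units responds gives that respondent an adjusted weight of 4, so it also represents the three nonrespondents in its cell.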
The second paper introduces a new approach to combining two independent probability samples selected from the same target population. Combining two surveys increases the amount of information about the quantities of interest and enhances the precision of estimation. We introduce a survey data integration method based on the measurement error model approach.
The third paper deals with the integration of two-phase samples, where the two samples can be nested or non-nested. We first present the mass imputation method for two-phase sampling, which can provide an efficient way to combine two samples when one is nested within the other. A special case of non-nested two-phase sampling, in which the second-phase sample is a non-probability sample, is also investigated.
Park, Seho, "Survey data integration using mass imputation" (2018). Graduate Theses and Dissertations. 16761.