Model estimation, identification and inference for next-generation functional data and spatial data

Date
2020-01-01
Authors
Yu, Shan
Major Professor
Advisor
Dan Nettleton
Department
Statistics
Abstract

This dissertation is composed of three research projects focused on model estimation, identification, and inference for next-generation functional data and spatial data.

The first project deals with count or binary responses collected together with spatial covariate information. We introduce a new class of generalized geoadditive models (GGAMs) for spatial data distributed over complex domains. Through a link function, the proposed GGAM assumes that the mean of the discrete response variable depends on additive univariate functions of the explanatory variables and on a bivariate function that adjusts for the spatial effect. We propose a two-stage approach for estimating and making inferences about the components of the GGAM. In the first stage, the univariate components and the geographical component are approximated by univariate polynomial splines and bivariate penalized splines over triangulations, respectively. In the second stage, local polynomial smoothing is applied to the cleaned univariate data to average out the variation of the first-stage estimators. We investigate the consistency of the proposed estimators and the asymptotic normality of the estimators of the univariate components, and we construct a simultaneous confidence band for each univariate component. The performance of the proposed method is evaluated through two simulation studies and an analysis of crash count data from the Tampa-St. Petersburg urbanized area in Florida.
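As an illustration only, the model structure described above can be written as follows. The symbols ($g$ for the link function, $f_k$ for the univariate components, $\alpha$ for the bivariate spatial component, $\Omega$ for the domain) are notation assumed here and may differ from the notation used in the dissertation.

```latex
% Sketch of a generalized geoadditive model (notation assumed, not the author's):
% Y is a count or binary response, x = (x_1, ..., x_p) are explanatory variables,
% and s = (s_1, s_2) is a location in a possibly irregular domain \Omega.
\[
  g\bigl\{ \mathrm{E}(Y \mid \mathbf{x}, \mathbf{s}) \bigr\}
    = \beta_0 + \sum_{k=1}^{p} f_k(x_k) + \alpha(\mathbf{s}),
  \qquad \mathbf{s} \in \Omega .
\]
```

For a count response one might take $g$ to be the log link, and for a binary response the logit link; these are common choices rather than choices confirmed by the abstract.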

In the second project, motivated by recent work on analyzing data from biomedical imaging studies, we consider a class of image-on-scalar regression models for imaging responses and scalar predictors. We propose to use flexible multivariate splines over triangulations to handle the irregular domain of the objects of interest in the images, as well as other characteristics of the images. The proposed estimators of the coefficient functions are proved to be root-$n$ consistent and asymptotically normal under some regularity conditions. We also provide a consistent and computationally efficient estimator of the covariance function. Asymptotic pointwise confidence intervals (PCIs) and data-driven simultaneous confidence corridors (SCCs) for the coefficient functions are constructed. A highly efficient and scalable estimation algorithm is developed. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed method. The proposed method is applied to spatially normalized Positron Emission Tomography (PET) data from the Alzheimer's Disease Neuroimaging Initiative (ADNI).
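To make the image-on-scalar setup concrete, the sketch below fits one regression per pixel by ordinary least squares on simulated data. It is a deliberately simplified baseline and not the dissertation's estimator: it omits the multivariate spline smoothing over triangulations and the confidence corridors. The function name `pixelwise_ols`, the array shapes, and the simulated coefficient images are all assumptions made for illustration.

```python
import numpy as np


def pixelwise_ols(X, Y):
    """Unsmoothed image-on-scalar fit (illustrative baseline only).

    X: (n, p) matrix of scalar predictors, one row per subject.
    Y: (n, m) matrix of images, each flattened to m pixels.
    Returns a (p, m) array holding one coefficient image per predictor.
    """
    # lstsq solves all m pixelwise regressions at once.
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


# Simulated example: an intercept image and one slope image on m pixels.
rng = np.random.default_rng(0)
n, m = 200, 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.vstack([
    np.linspace(0.0, 1.0, m),            # hypothetical intercept image
    np.sin(np.linspace(0.0, np.pi, m)),  # hypothetical slope image
])
Y = X @ true_beta + 0.1 * rng.normal(size=(n, m))

beta_hat = pixelwise_ols(X, Y)
```

A spline-based estimator of the kind described above would replace the free per-pixel coefficients with a basis expansion over the image domain, which borrows strength across neighboring pixels and respects irregular boundaries.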

In the third project, we propose a heterogeneous functional linear model that simultaneously estimates multiple coefficient functions and identifies groups, such that coefficient functions are identical within groups and distinct across groups. By borrowing information from relevant subgroups, our method enhances estimation efficiency while preserving heterogeneity. We use an adaptive fused lasso penalty to shrink subgroup coefficients toward shared common values within each group, and we establish the theoretical properties of the resulting adaptive fused lasso estimators. To improve computational efficiency and incorporate neighborhood information, we propose a graph-constrained adaptive lasso. A highly efficient and scalable estimation algorithm is developed. Monte Carlo simulation studies are conducted to examine the finite-sample performance of the proposed method. The proposed method is applied to a dataset of hybrid maize grain yields from the Genomes to Fields consortium.
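The following sketch shows one way a graph-constrained adaptive fused penalty could be evaluated. It assumes each subgroup's coefficient function is represented by a vector of basis coefficients, that the penalty is a weighted Euclidean norm of pairwise differences over the edges of a graph, and that adaptive weights come from an initial estimate. The function name, the weight formula, and the norm are assumptions for illustration; the dissertation's exact penalty may differ.

```python
import numpy as np


def graph_fused_penalty(B, edges, init, lam, eps=1e-8):
    """Evaluate a graph-constrained adaptive fused penalty (illustrative).

    B:     (G, K) basis coefficients, one row per subgroup.
    edges: list of (i, j) pairs; only these pairs are fused.
    init:  (G, K) initial estimate used to build adaptive weights.
    lam:   nonnegative tuning parameter.
    """
    total = 0.0
    for i, j in edges:
        # Adaptive weight: pairs that look similar in the initial fit
        # are penalized more heavily, encouraging them to merge.
        weight = 1.0 / (np.linalg.norm(init[i] - init[j]) + eps)
        total += weight * np.linalg.norm(B[i] - B[j])
    return lam * total


# Three subgroups on a chain graph 0 - 1 - 2 (hypothetical example).
init = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.5]])
edges = [(0, 1), (1, 2)]
penalty_at_init = graph_fused_penalty(init, edges, init, lam=2.0)

# Fusing subgroups 1 and 2 removes the contribution of edge (1, 2).
fused = np.array([[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]])
penalty_fused = graph_fused_penalty(fused, edges, init, lam=2.0)
```

Restricting the sum to graph edges rather than all pairs reduces the number of penalty terms, which is one plausible reading of how neighborhood information can lower the computational cost.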

Copyright
2020-08-01