Topics in matrix completion and genomic prediction

Date
2018-01-01
Authors
Mao, Xiaojun
Major Professor
Advisor
Song Xi Chen
Daniel S. Nettleton
Department
Statistics
Abstract

This dissertation consists of three projects: two on low-rank modeling for matrix completion and one on genomic prediction that adjusts for spatial effects. A major challenge in matrix completion is that real data are high-dimensional and low-rank, with many missing entries. In the first project (Chapter 2), we propose a column-space-decomposition model that uses additional covariate information. This both improves the prediction of ratings and clarifies how the covariates affect the missingness and the ratings. The proposed estimation method is shown to yield efficient estimators and to be computationally efficient. In the second project (Chapter 3), we are motivated by a general low-rank missing mechanism rather than the specific missing-at-random mechanism assumed in the first project. We consider an additive model with mean effect to estimate the linear predictors, which are then used to estimate the probabilities of observation. To predict ratings under non-uniform missingness, we adopt a weighted objective function and constrain the estimated probabilities to avoid issues with extreme values. We study both the asymptotic convergence rates and the numerical efficiency of the proposed estimators of the probabilities and the ratings.
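
As a rough illustration of the weighted-objective idea in the second project, the sketch below weights each observed entry by the inverse of its estimated observation probability, after bounding those probabilities away from zero so that no single entry dominates. Everything specific here is an assumption made for illustration and is not taken from the dissertation: the function names, the lower bound of 0.05, the nuclear-norm penalty, and the proximal-gradient solver.

```python
import numpy as np

def clip_probabilities(p_hat, lower=0.05):
    """Bound estimated observation probabilities away from zero (assumed bound)."""
    return np.clip(p_hat, lower, 1.0)

def _weights(mask, p_hat, lower):
    # Inverse-probability weights: zero for missing entries, 1/p for observed ones.
    return mask / clip_probabilities(p_hat, lower)

def weighted_loss(X, Y, mask, p_hat, lower=0.05):
    """Inverse-probability-weighted squared error, averaged over all entries."""
    Y0 = np.where(mask > 0, Y, 0.0)  # ignore whatever is stored at missing entries
    return np.sum(_weights(mask, p_hat, lower) * (Y0 - X) ** 2) / Y.size

def svd_soft_threshold(Z, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def complete(Y, mask, p_hat, tau=1.0, lower=0.05, n_iter=200):
    """Minimize 0.5 * sum(w * (X - Y)^2) + tau * ||X||_* by proximal gradient.

    Assumes at least one entry is observed, so the largest weight is positive.
    """
    w = _weights(mask, p_hat, lower)
    Y0 = np.where(mask > 0, Y, 0.0)
    step = 1.0 / w.max()  # the gradient w * (X - Y0) is Lipschitz with constant max(w)
    X = np.zeros(Y.shape, dtype=float)
    for _ in range(n_iter):
        grad = w * (X - Y0)
        X = svd_soft_threshold(X - step * grad, step * tau)
    return X
```

Without the lower bound, an entry with a very small estimated probability would receive a very large weight, which is the extreme-value issue the constraint is meant to avoid.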

In the third project (Chapter 4), we address challenges that arise when phenotypes measured on plants grown in fields are spatially correlated. We focus on a Gaussian random field (GRF) model with an additive covariance matrix structure that incorporates genotype effects, spatial effects, and subpopulation effects. The model predicts phenotypes from a large number of marker genotypes while accounting for the spatial dependence among measurements. We analyze two datasets with the GRF model to show the benefits of adjusting for spatial effects. We also apply the proposed GRF method to help select the best plants with respect to a specific phenotype.
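
One way to picture an additive covariance structure of this kind is sketched below: the covariance of the phenotype vector is a sum of a marker-based genotype term, a distance-based spatial term, a subpopulation term, and independent noise, and the genotype effect is predicted by the usual Gaussian conditional mean. The specific kernels (a centered marker cross-product and an exponential spatial kernel), the known variance components, the sample-mean centering, and all function and parameter names are assumptions for illustration, not the dissertation's specification.

```python
import numpy as np

def additive_covariance(markers, coords, subpop,
                        var_g, var_s, var_p, var_e, range_s):
    """Build Sigma = var_g*K_g + var_s*K_s + var_p*K_p + var_e*I (assumed form).

    markers: (n, m) marker genotype matrix; coords: (n, d) field positions;
    subpop: (n,) subpopulation labels. Variance components are taken as known.
    """
    Z = markers - markers.mean(axis=0)            # center each marker
    K_g = Z @ Z.T / markers.shape[1]              # marker-based relationship matrix
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    K_s = np.exp(-dist / range_s)                 # exponential spatial kernel
    K_p = (subpop[:, None] == subpop[None, :]).astype(float)  # same-subpopulation indicator
    n = len(subpop)
    cov_g = var_g * K_g
    sigma = cov_g + var_s * K_s + var_p * K_p + var_e * np.eye(n)
    return sigma, cov_g

def predict_genotype_effect(y, sigma, cov_g):
    """Conditional mean of the genotype effect given y under the Gaussian model.

    The overall mean is estimated by the sample mean, a simplification.
    """
    resid = y - y.mean()
    return cov_g @ np.linalg.solve(sigma, resid)
```

Under these assumptions, plants could be ranked by the predicted genotype effect, for example with `np.argsort(-g_hat)`, so that the ranking is not driven by where in the field a plant happened to grow.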

Copyright
2018-08-01