Topics in statistical inference for massive data and high-dimensional data

Peng, Liuhua

File

Peng_iastate_0097E_16261.pdf (2.03 MB)

Date

2017-01-01

Authors

Peng, Liuhua

Advisor

Song Xi Chen

Dan Nettleton

Department

Statistics

Abstract

This dissertation consists of three research papers that address three different problems in statistics concerning high-volume datasets. The first paper studies distributed statistical inference for massive data. As data size increases, the computational complexity and feasibility of statistical analyses must be taken into consideration. We investigate the statistical efficiency of the distributed version of a general class of statistics. Distributed bootstrap algorithms are proposed to approximate the distribution of the distributed statistics. These approaches relieve the computational burden of conventional methods while preserving adequate statistical efficiency. The second paper deals with testing the identity and sphericity hypotheses for high-dimensional covariance matrices, with a focus on improving the power of existing methods. By taking advantage of sparsity in the underlying covariance matrices, the power improvement is achieved by using the banding estimator for the covariance matrices, which leads to a significant reduction in the variance of the test statistics. The last paper considers variable selection for high-dimensional data. Distance-based variable importance measures are proposed to rank and select variables while taking dependence structures into consideration. The importance measures are inspired by the multi-response permutation procedure (MRPP) and the energy distance. A backward selection algorithm is developed to discover important variables and to improve the power of the original MRPP for high-dimensional data.
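
For readers unfamiliar with the energy distance mentioned in the abstract, the following is a minimal sketch of its standard sample (V-statistic) form, 2 E|X - Y| - E|X - X'| - E|Y - Y'|, computed with Euclidean distances. It is an illustration only and not the dissertation's variable importance measure; the function names and the toy samples are assumptions introduced here.

```python
import math
from itertools import product


def mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (p, q) with p in a and q in b."""
    total = sum(math.dist(p, q) for p, q in product(a, b))
    return total / (len(a) * len(b))


def energy_distance(x, y):
    """V-statistic estimate of the (squared) energy distance between two samples.

    x and y are sequences of equal-length numeric tuples (hypothetical input format).
    """
    return (
        2.0 * mean_pairwise_distance(x, y)
        - mean_pairwise_distance(x, x)
        - mean_pairwise_distance(y, y)
    )


# Toy two-dimensional samples, made up for illustration.
x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
y = [(3.0, 3.0), (4.0, 3.0), (3.0, 4.0)]

print(energy_distance(x, x))  # identical samples give zero
print(energy_distance(x, y))  # well-separated samples give a positive value
```

The estimate is zero when the two samples coincide and grows as the samples separate, which is the property that makes distances of this kind usable for comparing groups of observations.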

Copyright

2017

Collections

Theses and Dissertations