Topics on high dimensional statistical inference and ANOVA for longitudinal data
Abstract
The first part of this thesis proposes new tests for high dimensional data. Chapter 2 proposes a high dimensional simultaneous test for regression coefficients in linear models. The test assesses the significance of a large number of covariates simultaneously in the so-called "large p, small n" situation, where the conventional F-test is no longer applicable. We derive the asymptotic distribution of the proposed test statistic under the high dimensional null hypothesis and under various alternatives, which allows power evaluations. We further extend the results to linear models with factorial designs. We also evaluate the power of the F-test under very mild dimensionality. Chapter 3 considers a test for high dimensional means under sparsity and dependency. We propose a threshold test statistic designed to detect sparse and faint signals. Its asymptotic distribution is obtained for non-normal and dependent data in the "large p, small n" setting, where the data dimension can grow exponentially fast as the sample size grows. A maximum test, which maximizes the standardized threshold test statistic over a range of thresholds, is also proposed. It is shown that the maximum test attains the optimal detection boundary, in the sense that, asymptotically, all tests are powerless below the boundary.
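As a rough illustration of the thresholding idea in Chapter 3, the following minimal sketch sums the standardized squared sample means that exceed a threshold s. The abstract does not state the exact form of the statistic, so the formula, the function name threshold_statistic, and the toy data are all assumptions made for illustration. The centering and scaling needed for the standardized statistic and the maximum test are not shown.

```python
from statistics import mean, variance


def threshold_statistic(rows, s):
    """Sum n * mean_j^2 / var_j over the coordinates where it exceeds s.

    Assumed, illustrative form only; not necessarily the statistic
    defined in the thesis.
    """
    n = len(rows)
    total = 0.0
    # zip(*rows) transposes the data so each tuple is one coordinate.
    for col in zip(*rows):
        z2 = n * mean(col) ** 2 / variance(col)
        if z2 > s:
            total += z2
    return total


# Hypothetical toy data: 4 observations, 3 coordinates.
# Only the first coordinate has a mean clearly away from zero.
rows = [
    [2.0, 0.1, -0.2],
    [2.2, -0.1, 0.2],
    [1.8, 0.2, -0.1],
    [2.0, -0.2, 0.1],
]

if __name__ == "__main__":
    for s in (0.0, 1.0, 1000.0):
        print(s, threshold_statistic(rows, s))
```

Raising s discards coordinates with weak evidence, so the statistic can only shrink as s grows. That is why the choice of threshold matters and why a test that searches over a range of thresholds is attractive when signals are sparse and faint.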
The second part of this thesis concerns analysis of variance (ANOVA) tests for treatment effects in longitudinal data with missing values. The treatment effects are modelled semiparametrically via a partially linear regression, which is flexible in quantifying the time effects of treatments. Empirical likelihood is employed to formulate model-robust nonparametric ANOVA tests for treatment effects with respect to covariates, the nonparametric time-effect functions, and interactions between covariates and time. The proposed tests can be readily modified for a variety of data and model combinations that encompass parametric, semiparametric and nonparametric regression models, cross-sectional and longitudinal data, and data with or without missing values.