Multiple hypothesis testing and RNA-seq differential expression analysis accounting for dependence and relevant covariates

Date
2018-01-01
Authors
Nguyen, Yet
Major Professor
Advisor
Daniel S. Nettleton
Department
Statistics
Abstract

This dissertation is a collection of four papers on statistical methods for the analysis of high-dimensional data, primarily RNA-seq gene expression data. The first two papers introduce two covariate-selection strategies for RNA-seq analysis. As in any experiment or observational study, covariates may carry information about heterogeneity among the experimental or observational units. Both ignoring relevant covariates and adjusting for irrelevant ones can be detrimental to RNA-seq analysis. We show through simulation that our methods outperform methods that do not perform covariate selection. The third paper develops a parametric bootstrap algorithm for analyzing RNA-seq datasets from repeated measures designs, in which RNA samples are extracted from each experimental unit at multiple time points. Read counts obtained by sequencing samples from the same experimental unit tend to be temporally correlated. Simulation studies show the advantages of our method over alternatives that do not account for correlation among observations within experimental units. Finally, the fourth paper develops a new method to estimate and control the false discovery rate (FDR) when identifying simultaneous signals in two independent experiments. Our procedure generalizes the histogram-based FDR estimation and control procedure for a single experiment proposed by Nettleton et al. (2016) and Liang and Nettleton (2012). We show, both in theory and in simulation, that our method performs well and outperforms existing methods.

Copyright
2018-05-01