Multiple hypothesis testing and RNA-seq differential expression analysis accounting for dependence and relevant covariates

Nguyen, Yet

File

Nguyen_iastate_0097E_17336.pdf (8.9 MB)

Date

2018-01-01

Advisor

Nettleton, Daniel S.

Organizational Units

Statistics

Abstract

This dissertation is a collection of four papers on the development of statistical methods for the analysis of high-dimensional data, mostly RNA-seq gene expression data. In the first two papers, we introduce two covariate-selection strategies for RNA-seq analysis. As in any experiment or observational study, covariates may carry information about heterogeneity among the experimental or observational units used in the investigation, and either ignoring relevant covariates or accounting for irrelevant ones may be detrimental to RNA-seq analysis. We show through simulation that our methods outperform methods that do not take covariate selection into account. In the third paper, we develop a parametric bootstrap algorithm to analyze RNA-seq datasets from repeated measures designs. In such designs, RNA samples are extracted from each experimental unit at multiple time points, and the read counts obtained by sequencing samples from the same experimental unit tend to be temporally correlated. Simulation studies show the advantages of our method over alternatives that do not account for correlation among observations within experimental units. Finally, we develop a new method to estimate and control the false discovery rate (FDR) when identifying simultaneous signals in two independent experiments. Our FDR estimation and control procedure generalizes the histogram-based procedure for a single experiment proposed by Nettleton et al. (2016) and Liang and Nettleton (2012). We show, both theoretically and through simulation, that our method performs well and outperforms other existing methods.
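The single-experiment, histogram-based idea that the fourth paper builds on can be illustrated with a minimal Python sketch. It follows our own reading of an iterative estimator of m0, the number of true null hypotheses, from a histogram of p-values, and then plugs that estimate into the common FDR estimate m0·t / #{p ≤ t}. The function names `estimate_m0`, `estimate_fdr`, and `control_fdr`, the default of 20 bins, and the convergence tolerance are hypothetical choices made for illustration. The sketch does not implement the two-experiment method developed in the dissertation.

```python
from typing import Sequence


def estimate_m0(pvalues: Sequence[float], n_bins: int = 20,
                tol: float = 1e-9, max_iter: int = 100) -> float:
    """Histogram-based estimate of m0, the number of true null hypotheses.

    Null p-values are assumed Uniform(0, 1), so each of the n_bins equal-width
    bins should hold about m0 / n_bins of them. Scanning bins from the left,
    the excess of observed over expected counts (up to the first bin whose
    count does not exceed the expectation) estimates the number of non-null
    hypotheses m1; m0 is updated to m - m1 and the step is repeated.
    """
    m = len(pvalues)
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 is placed in the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1

    m0 = float(m)  # start by assuming every hypothesis is null
    for _ in range(max_iter):
        expected = m0 / n_bins
        excess = 0.0
        for c in counts:
            if c <= expected:
                break
            excess += c - expected
        new_m0 = m - excess
        converged = abs(new_m0 - m0) < tol
        m0 = new_m0
        if converged:
            break
    return m0


def estimate_fdr(pvalues: Sequence[float], threshold: float, m0: float) -> float:
    """Plug-in FDR estimate when rejecting every p-value <= threshold."""
    n_rejected = sum(1 for p in pvalues if p <= threshold)
    return min(1.0, m0 * threshold / max(n_rejected, 1))


def control_fdr(pvalues: Sequence[float], alpha: float, m0: float) -> float:
    """Largest observed p-value whose estimated FDR is at most alpha (0.0 if none)."""
    best = 0.0
    for t in sorted(set(pvalues)):
        if estimate_fdr(pvalues, t, m0) <= alpha:
            best = t
    return best


if __name__ == "__main__":
    # Illustrative data only: 900 evenly spread null p-values and 100 small ones.
    nulls = [(i + 0.5) / 900 for i in range(900)]
    signals = [0.00001 * (k + 1) for k in range(100)]
    pvals = nulls + signals
    m0_hat = estimate_m0(pvals)
    t = control_fdr(pvals, alpha=0.05, m0=m0_hat)
    n_rej = sum(p <= t for p in pvals)
    print(f"m0_hat = {m0_hat:.2f}, threshold = {t:.4f}, rejections = {n_rej}")
```

Rejecting all p-values up to the largest threshold whose estimated FDR stays at or below alpha is the usual plug-in way to turn an FDR estimate into a control procedure. A better estimate of m0 (rather than assuming m0 = m) is what makes this less conservative than treating every hypothesis as null.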

Copyright

May 1, 2018

Collections

Theses and Dissertations