Degree Type

Dissertation

Date of Award

2018

Degree Name

Doctor of Philosophy

Department

Statistics

Major

Statistics

First Advisor

DANIEL S. NETTLETON

Abstract

This dissertation is a collection of four papers on the development of statistical methods for the analysis of high-dimensional data, primarily RNA-seq gene expression data. In the first two papers, we introduce two covariate-selection strategies for RNA-seq analysis. As in any experiment or observational study, covariates may carry information about heterogeneity among the experimental or observational units used in the investigation. Ignoring relevant covariates and accounting for irrelevant ones can both be detrimental to RNA-seq analysis. Simulations show that our methods outperform methods that do not account for covariate selection. In the third paper, we develop a parametric bootstrap algorithm for analyzing RNA-seq datasets from repeated measures designs. In such designs, RNA samples are extracted from each experimental unit at multiple time points. As a result, the read counts obtained by sequencing samples from the same experimental unit tend to be temporally correlated. Simulation studies show the advantages of our method over alternatives that do not account for correlation among observations within experimental units. Finally, we develop a new method to estimate and control the false discovery rate (FDR) when identifying simultaneous signals in two independent experiments. Our FDR estimation and control procedure generalizes the histogram-based procedure for a single experiment proposed by Nettleton et al. (2016) and Liang and Nettleton (2012). We show, both theoretically and through simulation, that our method performs well and outperforms existing methods.

DOI

https://doi.org/10.31274/etd-180810-6056

Copyright Owner

Yet Nguyen

Language

en

File Format

application/pdf

File Size

139 pages
