Degree Type
Dissertation
Date of Award
2017
Degree Name
Doctor of Philosophy
Department
Statistics
Major
Statistics
First Advisor
Dan Nettleton
Second Advisor
Ranjan Maitra
Abstract
This thesis comprises a collection of papers on the analysis of gene expression data, specifically high-throughput RNA-sequencing (RNA-seq) data, with some methods generalizable to other scientific data. We first introduce a method for identifying differentially expressed genes using an empirical-Bayes-type analysis of RNA-seq data that employs efficient computational algorithms. A generalizable method for reparameterization is discussed, and simulation is used to demonstrate its importance to test performance. Next, exact tests for a monotone mean expression pattern are developed and incorporated into an existing pipeline for the analysis of RNA-seq data. The advantages of computing exact $p$-values and of borrowing information across genes are demonstrated. The monotone tests are compared to existing tests and shown to perform favorably, particularly on data for which the monotone hypothesis is appropriate. Finally, we extend existing $k$-means clustering algorithms to accommodate data with missing values and replicates. Among many other uses, clustering is often performed on gene expression patterns as an exploratory or summarizing tool. We show that in many cases the extended algorithms improve upon existing methods without requiring significantly more computation.
DOI
https://doi.org/10.31274/etd-180810-5177
Copyright Owner
Andrew Lithio
Copyright Date
2017
Language
en
File Format
application/pdf
File Size
129 pages
Recommended Citation
Lithio, Andrew, "Statistical methods for estimation, testing, and clustering with gene expression data" (2017). Graduate Theses and Dissertations. 15560.
https://lib.dr.iastate.edu/etd/15560