Statistical methods for estimation, testing, and clustering with gene expression data

Lithio, Andrew

File

Lithio_iastate_0097E_16589.pdf (4.93 MB)

Date

2017-01-01

Authors

Lithio, Andrew

Advisor

Dan Nettleton

Ranjan Maitra

Department

Statistics

Abstract

This thesis comprises a collection of papers on the analysis of gene expression data, specifically high-throughput RNA-sequencing (RNA-seq) data, with some methods generalizable to other scientific data. We first introduce a method for identifying differentially expressed genes using an empirical-Bayes-type analysis of RNA-seq data that employs efficient computational algorithms. We discuss a generalizable method for reparameterization and use simulation to demonstrate its importance to test performance. Next, we develop exact tests for a monotone mean expression pattern and incorporate them into an existing pipeline for the analysis of RNA-seq data. We demonstrate the advantages of computing exact $p$-values and of borrowing information across genes. The monotone tests compare favorably with existing tests, particularly on data where the monotone hypothesis is appropriate. Finally, we extend existing $k$-means clustering algorithms to accommodate data with missing values and replicates. Clustering has many uses and is often applied to gene expression patterns as an exploratory or summarizing tool. We show that, in many cases, the extended algorithms improve upon existing methods without requiring substantially more computation.

Copyright

2017

Collections

Theses and Dissertations
