Degree Type

Dissertation

Date of Award

2017

Degree Name

Doctor of Philosophy

Department

Statistics

Major

Statistics

First Advisor

Dan Nettleton

Second Advisor

Ranjan Maitra

Abstract

This thesis comprises a collection of papers on the analysis of gene expression data, specifically high-throughput RNA-sequencing (RNA-seq) data, with some methods generalizable to other scientific data. We first introduce a method for identifying differentially expressed genes using an empirical-Bayes-type analysis of RNA-seq data that employs efficient computational algorithms. A generalizable method for reparameterization is discussed, and simulation is used to demonstrate its importance to test performance. Next, exact tests for a monotone mean expression pattern are developed and incorporated into an existing pipeline for the analysis of RNA-seq data. The advantages of computing exact $p$-values and of borrowing information across genes are demonstrated. The monotone tests are compared to existing tests and shown to perform favorably, particularly on data for which the monotone hypothesis is appropriate. Finally, we extend existing $k$-means clustering algorithms to accommodate data with missing values and replicates. Among many other uses, clustering is often applied to gene expression patterns as an exploratory or summarizing tool. We show that in many cases the extended algorithms improve upon existing methods without requiring significantly more computation.

Copyright Owner

Andrew Lithio

Language

en

File Format

application/pdf

File Size

129 pages
