Assessing differential expression when the distribution of effect sizes is asymmetric and evaluating concordance of differential expression across multiple gene expression experiments

Orr, Megan

File

Orr_iastate_0097E_13027.pdf (1.74 MB)

Date

2012-01-01

Advisor

Peng Liu

Dan Nettleton

Department

Statistics

Abstract

The emergence and development of gene expression technologies have resulted in an ever-increasing number of high-dimensional data sets available for analysis. The availability of these data sets has prompted much research into methods for statistically analyzing gene expression experiments. Many of these methods focus on identifying genes that are differentially expressed (DE), i.e., that exhibit changes in mean expression levels between treatments, in a single experiment. This dissertation presents novel methods for detecting differential expression in one experiment and proposes methods for analyzing gene expression data from two independent experiments.

Many methods have been proposed for estimating the number of genes that are equivalently expressed (EE), and thus the number of DE genes, in a single gene expression experiment, but many researchers are interested in comparing the results of two independent experiments. Estimating the number of genes that are DE in two independent experiments is generally performed in two steps. First, data from each experiment are analyzed separately, and a list of genes identified as DE is obtained for each experiment. Each list is generally produced by a method that attempts to control the false discovery rate (FDR) at some desired level α. Then, the number of genes common to both lists is used as an estimate of the number of genes DE in both experiments. A major flaw of this method is that the resulting estimates can vary greatly depending on the value of α. Chapter 2 proposes a new method that estimates the number of genes that are DE in both of two independent experiments by analyzing the p-values from the two experiments simultaneously, yielding a single estimate that does not depend on α. Through simulation studies, we show the advantages of our approach. In Chapter 3, we extend the idea of Chapter 2 by proposing a new method for identifying genes that are DE in both of two independent experiments while controlling FDR, and we compare this method to two existing methods. The three methods are compared through simulation studies, which show that the proposed method controls FDR better and provides similar or better power than the existing methods.
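The conventional two-step approach criticized above can be illustrated with a minimal sketch. It is not the method proposed in Chapter 2. It assumes the Benjamini-Hochberg step-up rule as the FDR-controlling method (the abstract does not name one), and the simulated p-values, the gene counts, and the helper names `bh_reject`, `overlap_count`, and `simulate_pvalues` are all hypothetical choices made for illustration.

```python
import random


def bh_reject(pvalues, alpha):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    # Find the largest rank whose p-value falls under its BH threshold.
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank
    return set(order[:k])


def overlap_count(p1, p2, alpha):
    """Two-step estimate: size of the intersection of the two DE lists."""
    return len(bh_reject(p1, alpha) & bh_reject(p2, alpha))


def simulate_pvalues(rng, n_genes, de_genes):
    """Toy p-values (an assumption): uniform for EE genes, skewed small for DE genes."""
    return [rng.random() ** 8 if g in de_genes else rng.random()
            for g in range(n_genes)]


rng = random.Random(0)
n_genes = 1000
de_exp1 = set(range(0, 200))    # hypothetical DE genes in experiment 1
de_exp2 = set(range(100, 300))  # hypothetical DE genes in experiment 2
p1 = simulate_pvalues(rng, n_genes, de_exp1)
p2 = simulate_pvalues(rng, n_genes, de_exp2)

# The same data give a different overlap estimate for each choice of alpha.
counts = {alpha: overlap_count(p1, p2, alpha)
          for alpha in (0.01, 0.05, 0.10, 0.20)}
for alpha, count in counts.items():
    print(f"alpha={alpha:.2f}: {count} genes on both lists")
```

Because the Benjamini-Hochberg rejection sets are nested in α, the overlap count can only grow as α increases, so the two-step estimate is tied to an arbitrary tuning choice rather than to the data alone.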

Chapter 4 proposes a new method for calculating q-values when the distribution of effect sizes in a gene expression experiment is asymmetric. This method first estimates the number of genes that are EE in an experiment based on the distribution of all p-values. The p-values are then split into two subsets based on the signs of their corresponding test statistics, and q-values are calculated separately for each subset. Simulation study results show that the proposed method, when compared to the traditional q-value method, generally provides a better ranking of genes and identifies a higher number of truly DE genes as DE, while still adequately controlling FDR.
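The three steps described above can be sketched as follows. This is a rough illustration under stated assumptions, not the dissertation's implementation: the number of EE genes is estimated with a Storey-type estimator at λ = 0.5, and that estimate is divided equally between the two sign groups. Neither choice is specified in the abstract. The function names and the toy data are hypothetical.

```python
def estimate_m0(pvalues, lam=0.5):
    """Storey-type estimate of the number of EE genes (assumed estimator)."""
    m = len(pvalues)
    m0 = sum(p > lam for p in pvalues) / (1.0 - lam)
    return min(m0, m)


def qvalues(pvalues, m0):
    """q-values for one set of p-values, given a number of true nulls m0."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    q = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is the minimum
    # estimated FDR over all rejection thresholds at or above it.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, m0 * pvalues[i] / rank)
        q[i] = running_min
    return q


def sign_split_qvalues(pvalues, stats, lam=0.5):
    """Estimate m0 from all p-values, then compute q-values per sign group."""
    m0 = estimate_m0(pvalues, lam)
    q = [None] * len(pvalues)
    for in_group in (lambda t: t >= 0, lambda t: t < 0):
        idx = [i for i, t in enumerate(stats) if in_group(t)]
        # Assumption: EE genes are equally likely to have either sign,
        # so each group is assigned half of the estimated m0.
        group_q = qvalues([pvalues[i] for i in idx], m0 / 2.0)
        for i, qi in zip(idx, group_q):
            q[i] = qi
    return q


# Toy data (hypothetical): most strong effects are positive.
pvals = [0.001, 0.002, 0.004, 0.30, 0.60, 0.90, 0.02, 0.55, 0.70, 0.95]
tstats = [3.5, 3.2, 3.0, 1.0, 0.5, 0.1, -2.3, -0.6, -0.4, -0.05]

split_q = sign_split_qvalues(pvals, tstats)
pooled_q = qvalues(pvals, estimate_m0(pvals))
for p, t, qs, qp in zip(pvals, tstats, split_q, pooled_q):
    print(f"p={p:<6} t={t:>5}  split q={qs:.4f}  pooled q={qp:.4f}")
```

In this toy example, splitting by sign lowers the q-values of genes in the direction where most strong effects lie and raises the q-value of the lone small p-value in the opposite direction, which is one way a sign-aware calculation can change the gene ranking.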

Copyright

2012

Collections

Theses and Dissertations
