Assessing differential expression when the distribution of effect sizes is asymmetric and evaluating concordance of differential expression across multiple gene expression experiments

Date
2012-01-01
Authors
Orr, Megan
Major Professor
Advisor
Peng Liu
Dan Nettleton
Department
Statistics
Abstract

The emergence and development of gene expression technologies have resulted in an ever-increasing number of high-dimensional data sets available for analysis. The availability of these data sets has prompted much research into methods for statistically analyzing gene expression experiments. Many of these methods focus on identifying genes that are differentially expressed (DE), i.e., exhibit changes in mean expression levels between treatments, in a single experiment. This dissertation presents novel methods for detecting differential expression in one experiment and proposes methods for analyzing gene expression data from two independent experiments.

Many methods have been proposed for estimating the number of genes that are equivalently expressed (EE), and thus the number of DE genes, in a single gene expression experiment. Many researchers, however, are interested in comparing the results of two independent experiments. Estimating the number of genes that are DE in both of two independent experiments is generally performed in two steps. First, data from each experiment are analyzed separately, and a list of genes identified as DE is obtained for each experiment. Each list is generally produced by a method that attempts to control the false discovery rate (FDR) at some desired level α. Then, the number of genes common to both lists is used as an estimate of the number of genes DE in both experiments. A major flaw of this approach is that the resulting estimates can vary greatly depending on the value of α. Chapter 2 proposes a new method for estimating the number of genes that are DE in both of two independent experiments. The method analyzes the p-values from both experiments simultaneously and yields a single estimate that does not depend on α. Through simulation studies, we show the advantages of our approach. In Chapter 3, we extend the idea of Chapter 2 by proposing a new method for identifying genes that are DE in both of two independent experiments while controlling FDR, and we compare this method to two existing methods. The three methods are compared through simulation studies, which show that the proposed method controls FDR better and provides similar or better power than the existing methods.
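The conventional two-step procedure described above (not the method proposed in Chapter 2) can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the FDR-controlling procedure, so the Benjamini-Hochberg step-up procedure is used here as one common choice; the function names `bh_reject` and `common_de_count` and the p-values are hypothetical and made up for illustration.

```python
import numpy as np

def bh_reject(pvals, alpha):
    """Boolean mask of hypotheses rejected by the Benjamini-Hochberg
    step-up procedure at nominal FDR level alpha (assumed procedure)."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                       # indices of p-values, ascending
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])        # largest rank meeting its threshold
        reject[order[:k + 1]] = True            # reject everything up to that rank
    return reject

def common_de_count(p1, p2, alpha):
    """Step 1: build a DE list per experiment. Step 2: count the overlap."""
    return int(np.sum(bh_reject(p1, alpha) & bh_reject(p2, alpha)))

# Made-up p-values for the same five genes in two experiments.
p1 = [0.001, 0.008, 0.039, 0.041, 0.9]
p2 = [0.002, 0.5, 0.01, 0.6, 0.7]

for alpha in (0.05, 0.10):
    print(alpha, common_de_count(p1, p2, alpha))
```

With these made-up p-values the overlap is 1 gene at α = 0.05 and 2 genes at α = 0.10, which illustrates how the estimate from the two-step procedure moves with the choice of α.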

Chapter 4 proposes a new method for calculating q-values when the distribution of effect sizes in a gene expression experiment is asymmetric. This method first estimates the number of genes that are EE in an experiment based on the distribution of all p-values. Then, the p-values are split into two subsets based on the signs of their corresponding test statistics, and q-values are then calculated separately for each subset. Simulation study results show that the proposed method, when compared to the traditional q-value method, generally provides a better ranking for genes as well as a higher number of truly DE genes identified as DE, while still adequately controlling FDR.
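The three steps in the paragraph above can be sketched roughly as follows. This is not the dissertation's implementation; the abstract leaves several details open, so the sketch makes labeled assumptions: the number of EE genes is estimated with a Storey-type estimator at a fixed tuning value `lam = 0.5`, the estimated EE genes are assumed to split evenly between positive and negative test statistics, and the function names (`estimate_m0`, `qvalues`, `split_sign_qvalues`) are hypothetical.

```python
import numpy as np

def estimate_m0(pvals, lam=0.5):
    """Estimate the number of EE (null) genes from all p-values.
    Assumption: Storey-type estimator #{p > lam} / (1 - lam), capped at m."""
    p = np.asarray(pvals, dtype=float)
    m0 = np.sum(p > lam) / (1.0 - lam)
    return min(float(m0), float(p.size))

def qvalues(pvals, m0):
    """q-values for one set of p-values given a null count m0:
    q_(i) = min over j >= i of m0 * p_(j) / j, capped at 1."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    ratios = m0 * p[order] / np.arange(1, m + 1)
    # Running minimum taken from the largest p-value downward.
    q_sorted = np.minimum.accumulate(ratios[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(q_sorted, 1.0)
    return q

def split_sign_qvalues(pvals, stats, lam=0.5):
    """Estimate m0 from all p-values, split genes by the sign of the test
    statistic, and compute q-values separately within each subset.
    Assumption: each subset is assigned m0 / 2 nulls (capped at its size)."""
    p = np.asarray(pvals, dtype=float)
    t = np.asarray(stats, dtype=float)
    m0 = estimate_m0(p, lam)
    q = np.full(p.size, np.nan)
    for mask in (t >= 0, t < 0):
        if mask.any():
            q[mask] = qvalues(p[mask], min(m0 / 2.0, mask.sum()))
    return q
```

The even split of the estimated EE genes reflects the idea that, for an EE gene, the test statistic is equally likely to be positive or negative; when most DE genes share one sign, that subset has a smaller proportion of nulls, which is what the separate calculation is meant to exploit.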

Copyright
2012