Fully Bayesian Analysis of RNA-seq Counts for the Detection of Gene Expression Heterosis

Thumbnail Image
Date
2018-11-13
Authors
Nettleton, Dan
Major Professor
Advisor
Committee Member
Journal Title
Journal ISSN
Volume Title
Publisher
Authors
Person
Nettleton, Dan
Department Chair and Distinguished Professor
Person
Niemi, Jarad
Professor
Research Projects
Organizational Units
Organizational Unit
Statistics
As leaders in statistical research, collaboration, and education, the Department of Statistics at Iowa State University offers students an education like no other. We are committed to our mission of developing and applying statistical methods, and proud of our award-winning students and faculty.
Journal Issue
Is Version Of
Versions
Series
Department
Statistics
Abstract

Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Heterosis is extensively used in agriculture, and the underlying mechanisms are unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Difficulty arises in the assessment of heterosis due to composite null hypotheses and nonuniform distributions for p-values under these null hypotheses. Thus, we develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show our method has well-calibrated posterior probabilities and credible intervals when the model assumed in analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology is still able to accurately rank genes. Finally, we show that hyperparameter posteriors are extremely narrow and an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence provides support for the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained. Supplementary materials for this article are available online.

Comments

This is a manuscript of an article published as Landau, Will, Jarad Niemi, and Dan Nettleton. "Fully Bayesian Analysis of RNA-seq Counts for the Detection of Gene Expression Heterosis." Journal of the American Statistical Association 114, no. 526 (2019): 610-621. doi: 10.1080/01621459.2018.1497496. Posted with permission.

Description
Keywords
Citation
DOI
Copyright
Mon Jan 01 00:00:00 UTC 2018
Collections