Exploring the Information in P-values for the Analysis and Planning of Multiple-Test Experiments

Date
2006-08-24
Authors
Ruppert, David
Nettleton, Dan
Hwang, J.T.
Authors
Person
Nettleton, Dan
Department Chair and Distinguished Professor
Organizational Units
Organizational Unit
Statistics
Department
Statistics
Abstract

A new methodology is proposed for estimating the proportion of true null hypotheses in a large collection of tests. Each test concerns a single parameter δ whose value is specified by the null hypothesis. We combine a parametric model for the conditional CDF of the p-value given δ with a nonparametric spline model for the density g(δ) of δ under the alternative hypothesis. The proportion of true null hypotheses and the coefficients in the spline model are estimated by penalized least-squares subject to constraints that guarantee that the spline is a density. The estimator is computed efficiently using quadratic programming. Our methodology produces an estimate ĝ of the density of δ when the null is false and can address such questions as “when the null is false, is the parameter usually close to the null or far away?” This leads us to define a “falsely interesting discovery rate” (FIDR), a generalization of the false discovery rate. We contrast the FIDR approach with Efron’s “empirical null hypothesis” technique. We discuss the use of ĝ in sample size calculations based on the expected discovery rate (EDR). Our recommended estimator of the proportion of true nulls has less bias than estimators based upon the marginal density of the p-values at 1. In a simulation study, we compare our estimators to the convex, decreasing estimator of Langaas, Ferkingstad, and Lindqvist. The most biased of our estimators is very similar in performance to the convex, decreasing estimator. As an illustration, we analyze differences in gene expression between resistant and susceptible strains of barley.
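
To make the bias comparison in the abstract concrete, the following is a minimal sketch of a simple threshold-type estimator of the proportion of true nulls, the kind of baseline that relies only on the p-values near 1. It is not the penalized least-squares estimator proposed in the paper. The one-sided z-test setup, the single shared alternative value delta, the threshold lam, and both function names are assumptions made for illustration.

```python
import random
from statistics import NormalDist


def simulate_p_values(m, pi0, delta, seed=0):
    """Simulate m one-sided z-test p-values (illustrative assumption).

    A fraction pi0 of the tests are true nulls (mean 0); the remaining
    tests all share the same alternative mean delta > 0.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    n_null = round(pi0 * m)
    p_values = []
    for i in range(m):
        mean = 0.0 if i < n_null else delta
        z = rng.gauss(mean, 1.0)
        # Upper-tail p-value for H0: mean = 0 against mean > 0.
        p_values.append(1.0 - std_normal.cdf(z))
    return p_values


def pi0_threshold_estimate(p_values, lam=0.5):
    """Share of p-values above lam, rescaled by (1 - lam), capped at 1.

    Null p-values are uniform, so if almost no alternative p-values
    exceed lam this is close to pi0; otherwise it overestimates pi0.
    """
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (len(p_values) * (1.0 - lam)))


if __name__ == "__main__":
    for delta in (3.0, 0.5):
        p = simulate_p_values(m=20000, pi0=0.7, delta=delta, seed=1)
        print(f"delta={delta}: estimated pi0 = {pi0_threshold_estimate(p):.3f}")
```

Under these assumptions, the estimate stays near the true value 0.7 when the alternatives are far from the null, and drifts upward when they are close to it, because many non-null p-values then land above the threshold. This is the situation in which modeling the density of δ, as the abstract describes, can be expected to help.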

Comments

This preprint was published as David Ruppert, Dan Nettleton, and J.T. Gen Hwang, "Exploring the Information in p-Values for the Analysis and Planning of Multiple-Test Experiments," Biometrics (2007): 483-495, doi: 10.1111/j.1541-0420.2006.00704.x.
