Preprint # 04-3
Microarray technology has become widespread as a means to investigate gene function and metabolic pathways in an organism. A common experiment involves probing, at each of several time points, the gene expression of experimental units subjected to different treatments. Due to the high cost of microarrays, such experiments may be performed without replication and therefore provide a gene expression measurement of only one experimental unit for each combination of treatment and time point. Though an experiment with replication would yield greater statistical power, it is still possible to identify differentially expressed genes and to estimate the number of false positives for a specified rejection region when the data are unreplicated. We present a method for identifying differentially expressed genes in this situation that uses polynomial regression models to approximate underlying expression patterns. In the first stage of a two-stage permutation approach, we choose a 'best' model for each gene after considering all possible regression models involving treatment effects, terms polynomial in time, and interactions between treatments and polynomial terms. In the second stage, we identify as differentially expressed those genes whose 'best' model differs significantly from the overall mean model. The expected number of false positives in the chosen rejection region and the overall proportion of differentially expressed genes are both estimated using a method presented by Storey and Tibshirani (2003, Proceedings of the National Academy of Sciences 100, 9440-9445). For illustration, the proposed method is applied to an Arabidopsis thaliana microarray data set.
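The ingredients named above can be illustrated with a minimal sketch in Python. It is not the procedure of this paper: the abstract does not state how the 'best' model is selected, so the sketch tests a single candidate model against the overall mean model with an F statistic and a permutation p-value. The false-positive estimate uses the simple fixed-tuning-parameter form of the estimator associated with Storey and Tibshirani, not their smoothed version. All function names, the two-treatment coding, the number of permutations, and the value `lam=0.5` are assumptions made for illustration.

```python
import numpy as np

def design_matrix(treatment, time, degree, with_treatment=True, with_interaction=True):
    """Hypothetical helper: intercept, time^1..time^degree, an optional 0/1
    treatment indicator, and optional treatment-by-time^k interaction columns."""
    time = np.asarray(time, dtype=float)
    treatment = np.asarray(treatment, dtype=float)
    powers = [time ** k for k in range(1, degree + 1)]
    cols = [np.ones_like(time)] + powers
    if with_treatment:
        cols.append(treatment)
    if with_interaction:
        cols.extend(treatment * p for p in powers)
    return np.column_stack(cols)

def f_statistic(y, X):
    """F statistic comparing the model with design X to the overall mean model."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss_full = np.sum((y - X @ beta) ** 2)
    rss_mean = np.sum((y - y.mean()) ** 2)
    return ((rss_mean - rss_full) / (p - 1)) / (rss_full / (n - p))

def permutation_p_value(y, X, n_perm=999, seed=0):
    """Permutation p-value: shuffle expression values over the design points
    and count how often the permuted F statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = f_statistic(y, X)
    exceed = sum(f_statistic(rng.permutation(y), X) >= observed
                 for _ in range(n_perm))
    return (1 + exceed) / (1 + n_perm)

def expected_false_positives(p_values, threshold, lam=0.5):
    """Estimate pi0, the proportion of non-differentially-expressed genes, from
    the p-values above lam, then the expected number of false positives in the
    rejection region p <= threshold (assumed fixed-lam form of the estimator)."""
    p_values = np.asarray(p_values, dtype=float)
    pi0 = min(1.0, np.mean(p_values > lam) / (1.0 - lam))
    return pi0, pi0 * p_values.size * threshold

# Simulated single gene: two treatments, six time points, no replication.
treatment = np.repeat([0, 1], 6)
time = np.tile(np.arange(6), 2)
rng = np.random.default_rng(1)
y = 2.0 + 1.5 * treatment * time + rng.normal(scale=0.1, size=12)

X = design_matrix(treatment, time, degree=1)
p_value = permutation_p_value(y, X)
```

In a full analysis, one p-value per gene would be collected and passed to `expected_false_positives`; the estimated proportion of differentially expressed genes is then one minus the returned `pi0`.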