Degree Type

Dissertation

Date of Award

2013

Degree Name

Doctor of Philosophy

Department

Statistics

First Advisor

Dan Nettleton

Second Advisor

Daniel J. Nordman

Abstract

Random forest (RF) is a widely used machine learning method that shows competitive prediction performance in various fields, including biological science, finance, chemical engineering, agroscience, and medical analysis. In this dissertation, we study some characteristics and modifications of RFs in order to improve their prediction performance.

In CHAPTER 1, we review the mechanics of classification and regression trees (CARTs), bootstrap aggregation (bagging) and RFs. The properties of RFs are discussed, along with several variations of this method.

In CHAPTER 2, we describe a counter-intuitive discovery using RFs: out-of-sample prediction error can be reduced by augmenting the regressor with a new, scientifically meaningless predictor variable that is independent of all variables in the dataset. We explain this phenomenon using a simulated example and discuss the importance of this result for interpreting predictor variable importance in RFs.

RF predictions can be biased. In CHAPTER 3, we apply an iterative debiasing approach based on bagging to RFs and test this bias correction method on real datasets. The debiasing approach can significantly improve RF predictions, and the number of debiasing iterations can be tuned using cross-validation.

Standard RF methodology generates a common RF from a given training sample, regardless of test cases. In CHAPTER 4, we propose a new way to grow an RF specifically for predicting a particular test case, which we call Case-Specific Random Forests (CSRF). We also suggest Case-Specific Variable Importance (CSVI), a new definition of predictor variable importance in terms of the prediction performance on a particular test case.

Prediction error estimation is generally useful in evaluating a prediction rule. All present methods estimate prediction error averaged over the distribution of a test set. In CHAPTER 5, we propose a method to estimate the expected prediction loss at a specific regressor point using RF methodology.

DOI

https://doi.org/10.31274/etd-180810-3436

Copyright Owner

Ruo Xu

Language

en

File Format

application/pdf

File Size

87 pages
