Degree Type

Dissertation

Date of Award

2017

Degree Name

Doctor of Philosophy

Department

Statistics

Major

Statistics

First Advisor

Dianne Cook

Second Advisor

Heike Hofmann

Abstract

Classification methods are widely used for problems where rules are needed to sort observations into groups. There are many different methods for fitting classification models, but no single method is universally best. This research develops a new classification method, along with visual tools for exploring the algorithms and results introduced in this work. The new classification method is a random forest built on trees that use linear combinations of variables, which improves predictive performance when the separation between classes lies in combinations of variables. It is called a projection pursuit random forest (PPF). The benefit of the method is demonstrated using a simulation study and a suite of benchmark data sets. It is implemented in the R package PPforest, with core functions written in Rcpp to improve computational speed. The process of bagging and combining results from multiple trees produces numerous diagnostics which, with interactive graphics, can provide considerable insight into the class structure in high dimensions. A web app is designed and developed for this purpose. In the process of developing the PPF, some deficiencies were observed in PPtree, the tree algorithm that forms its basic building block. This led to modifications to the algorithm, implemented in the R package PPtreeExt, and to a small web app that helps compare the effects of different model parameter choices.

Copyright Owner

Natalia Da Silva Cousillas

Language

en

File Format

application/pdf

File Size

114 pages
