Date of Award
Doctor of Philosophy
Classification methods are widely used for problems where rules are needed to sort observations into groups. There are many different methods for fitting classification models, but no single method is universally best. This research develops new classification methods, along with visual tools for exploring the algorithms and results introduced in this work. The new classification method is a random forest built on trees that split on linear combinations of variables, which improves predictive performance when the separation between classes lies in combinations of variables. It is called a projection pursuit random forest (PPF). The benefit of the method is demonstrated using a simulation study and a suite of benchmark data sets. It is implemented in the R package PPforest, with core functions written in Rcpp to improve computational speed. The process of bagging and combining results from multiple trees produces numerous diagnostics which, combined with interactive graphics, can provide considerable insight into the class structure in high dimensions. A web app is designed and developed for this purpose. In the process of developing the PPF, some deficiencies were observed in PPtree, the tree algorithm that forms its basic building block. This led to modifications to the algorithm, implemented in the R package PPtreeExt, and to a small web app that helps compare the effects of different model parameter choices.
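The idea of bagging classifiers that split on linear combinations of variables can be illustrated with a minimal sketch. This is not the PPforest implementation: it is written in Python with NumPy rather than R, each "tree" is reduced to a single split, a linear discriminant direction stands in for a projection pursuit index, and all function names (`fit_projection_stump`, `fit_forest`, `predict_forest`) and the simulated data are hypothetical.

```python
import numpy as np

def fit_projection_stump(X, y, rng, n_vars):
    """Fit one split on a linear combination of a random subset of variables.
    Assumption: an LDA direction stands in for a projection pursuit index."""
    vars_idx = rng.choice(X.shape[1], size=n_vars, replace=False)
    Xs = X[:, vars_idx]
    m0, m1 = Xs[y == 0].mean(axis=0), Xs[y == 1].mean(axis=0)
    # Sum of within-class covariances, plus a small ridge for stability.
    Sw = np.cov(Xs[y == 0], rowvar=False) + np.cov(Xs[y == 1], rowvar=False)
    Sw = np.atleast_2d(Sw) + 1e-6 * np.eye(n_vars)
    w = np.linalg.solve(Sw, m1 - m0)      # projection direction
    cut = 0.5 * (m0 + m1) @ w             # midpoint of projected class means
    return vars_idx, w, cut

def fit_forest(X, y, n_trees=50, n_vars=2, seed=0):
    """Bag the stumps: each is fitted to a class-stratified bootstrap sample."""
    rng = np.random.default_rng(seed)
    idx0, idx1 = np.flatnonzero(y == 0), np.flatnonzero(y == 1)
    forest = []
    for _ in range(n_trees):
        boot = np.concatenate([rng.choice(idx0, size=idx0.size),
                               rng.choice(idx1, size=idx1.size)])
        forest.append(fit_projection_stump(X[boot], y[boot], rng, n_vars))
    return forest

def predict_forest(forest, X):
    """Predict class 1 where more than half of the stumps vote for it."""
    votes = np.array([(X[:, v] @ w > cut).astype(int) for v, w, cut in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)

# Simulated data (hypothetical): two highly correlated variables, with the
# classes separated along x1 - x2 rather than along either axis alone.
rng = np.random.default_rng(1)
cov = [[1.0, 0.95], [0.95, 1.0]]
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=200),
               rng.multivariate_normal([1.0, -1.0], cov, size=200)])
y = np.repeat([0, 1], 200)

forest = fit_forest(X, y, n_trees=50, n_vars=2)
acc = np.mean(predict_forest(forest, X) == y)      # training accuracy
axis_acc = np.mean((X[:, 0] > 0.5).astype(int) == y)  # split on x1 alone
print(f"projection forest: {acc:.3f}, single-variable split: {axis_acc:.3f}")
```

In this toy setting the split on a single variable struggles because the class means overlap heavily along each axis, while the projected split uses the direction in which the classes actually separate. With more than two variables, setting `n_vars` below the number of columns would give the random variable sampling typical of random forests.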
Natalia Da Silva Cousillas
Da Silva Cousillas, Natalia, "Bagged projection methods for supervised classification in big data" (2017). Graduate Theses and Dissertations. 15506.