Degree Type

Dissertation

Date of Award

2003

Degree Name

Doctor of Philosophy

Department

Industrial and Manufacturing Systems Engineering

First Advisor

Sigurdur Olafsson

Abstract

Along with the development of a variety of data mining techniques, numerous feature selection methods have been introduced to reduce dimensionality, which may improve scalability and make learning models easier to interpret. This dissertation presents a new optimization-based feature selection method using the nested partitions (NP) approach, including both a basic analysis of the NP framework and numerical results on a variety of test problems. The numerical results show how the optimal structure of NP contributes to the feature selection process. The dissertation further addresses how the new intelligent partitioning method obtains very high-quality partitions efficiently. The feature selection method is implemented both as a filter and as a wrapper.

In addition, the scalability of the algorithm, which is the most significant issue in mining large databases, is addressed with respect to the instance dimension, the feature dimension, and the adaptation to new features. Since NP naturally handles the feature dimension effectively, however, the dissertation focuses mostly on scalability with respect to the instance dimension. Two systematic approaches to improving scalability in the instance dimension are presented, both of which use random sampling. A predicted best solution for the instance sample size is obtained using a two-stage version of NP that also incorporates statistical selection, and a heuristic solution is presented in a new adaptive version of the algorithm. Numerical results show that both approaches are effective at improving scalability and perform better than the generic NP method, which uses a static sampling approach.

To make the NP feature selection method flexible enough to handle mixed types of features, feature quality evaluators are introduced to determine the order of partitioning, and experimental results report which evaluator performs better in a given data domain.

Finally, as a case study, a recommender system that can be used effectively in B2B (business-to-business) e-business systems is developed using classification, association rules, and the new NP-based feature selection method.
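The abstract only summarizes how NP is applied to feature selection. As a rough illustration, here is a minimal Python sketch of a generic nested partitions search over feature subsets. It is not the dissertation's implementation, and several assumptions apply: the most promising region is defined by fixing keep/drop decisions for features in a given order; each step partitions on the next feature, samples uniformly from both subregions and from the surrounding region; and the search backtracks to the root when the surrounding region yields the best sample (one commonly described NP variant). The partitioning order comes from hypothetical per-feature quality scores (`quality`), and `toy_objective` is a made-up filter-style score. All function and variable names are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Subset = Tuple[int, ...]  # bit j = 1 means feature j is selected


def sample_completion(fixed: Sequence[int], order: Sequence[int], n: int,
                      rng: random.Random) -> Subset:
    """Uniformly sample a subset whose first len(fixed) features in `order` are fixed."""
    bits = [0] * n
    for depth, feat in enumerate(order):
        bits[feat] = fixed[depth] if depth < len(fixed) else rng.randint(0, 1)
    return tuple(bits)


def sample_outside(fixed: Sequence[int], order: Sequence[int], n: int,
                   rng: random.Random) -> Subset:
    """Uniformly sample the surrounding region (subsets NOT matching `fixed`)."""
    while True:  # rejection sampling; assumes len(fixed) >= 1
        cand = sample_completion([], order, n, rng)
        if any(cand[order[d]] != fixed[d] for d in range(len(fixed))):
            return cand


def nested_partitions_select(n: int, objective: Callable[[Subset], float],
                             order: Sequence[int], samples: int = 20,
                             seed: int = 0, max_iters: int = 1000
                             ) -> Tuple[Subset, float]:
    """Hypothetical NP search over feature subsets; returns the best subset seen."""
    rng = random.Random(seed)
    fixed: List[int] = []  # decisions defining the most promising region
    best: Optional[Subset] = None
    best_score = float("-inf")
    for _it in range(max_iters):
        if len(fixed) == n:  # singleton region reached: stop
            break
        # Partition: the next feature in `order` is either kept (1) or dropped (0).
        candidates = []  # (score, move, sample); move None = surrounding region
        for bit in (1, 0):
            for _ in range(samples):
                s = sample_completion(fixed + [bit], order, n, rng)
                candidates.append((objective(s), bit, s))
        if fixed:  # the root region has no surrounding region
            for _ in range(samples):
                s = sample_outside(fixed, order, n, rng)
                candidates.append((objective(s), None, s))
        score, move, s = max(candidates, key=lambda c: c[0])
        if score > best_score:
            best, best_score = s, score
        # Move deeper if a subregion wins; otherwise backtrack to the root.
        fixed = [] if move is None else fixed + [move]
    if len(fixed) == n:  # evaluate the singleton the search converged to
        final = sample_completion(fixed, order, n, rng)
        if objective(final) > best_score:
            best, best_score = final, objective(final)
    assert best is not None
    return best, best_score


# Hypothetical filter-style objective: reward two "relevant" features, penalize size.
RELEVANT = {0, 2}


def toy_objective(subset: Subset) -> float:
    return sum(subset[j] for j in RELEVANT) - 0.1 * sum(subset)


# Hypothetical feature-quality scores decide the partitioning order (best first).
quality = [0.9, 0.1, 0.8, 0.05, 0.2]
order = sorted(range(len(quality)), key=lambda j: -quality[j])
best, score = nested_partitions_select(len(quality), toy_objective, order,
                                       samples=30, seed=1)
print(best, round(score, 2))
```

Ordering the partitions by feature quality means the most informative keep/drop decisions are fixed first, which is the role the abstract assigns to feature quality evaluators. In a wrapper setting, `objective` would instead train and validate a classifier on the selected features. Evaluating each sample this way is what makes the instance dimension, and therefore instance sampling, costly.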

DOI

https://doi.org/10.31274/rtd-180813-9919

Publisher

Digital Repository @ Iowa State University, http://lib.dr.iastate.edu

Copyright Owner

Jaekyung Yang

Language

en

ProQuest ID

AAI3118270

File Format

application/pdf

File Size

137 pages
