Doctor of Philosophy
Industrial and Manufacturing Systems Engineering
Along with the development of a variety of data mining techniques, numerous feature selection methods have been introduced to reduce dimensionality, which may improve scalability and make learning models easier to interpret. This dissertation presents a new optimization-based feature selection method using the nested partition (NP) approach, including both a basic analysis of the NP framework and numerical results on various experimental problems. The numerical results show how the optimal structure of the NP contributes to the feature selection process. Further, the dissertation shows how the new intelligent partitioning method efficiently obtains very high-quality partitions. The feature selection method is implemented as both a filter and a wrapper.
In addition, the scalability of the algorithm, which is the most significant issue in mining large databases, is addressed with respect to the instance dimension, the feature dimension, and the adaptation to new features. However, since the NP naturally handles the feature dimension effectively, the dissertation focuses mostly on scalability with respect to the instance dimension. For this problem, two systematic approaches to improving scalability in the instance dimension are presented, both of which use random sampling. First, a predicted best size for the instance sample is obtained using a two-stage version of the NP that also incorporates statistical selection; second, a heuristic solution is presented in a new adaptive version of the algorithm. Numerical results show that both approaches are effective for improving scalability and perform better than the generic NP method, which uses a static sampling approach.
To make the NP feature selection method flexible enough to handle mixed types of features, feature quality evaluators are introduced to determine the order of partitioning, and experimental results report which evaluator performs better for a given data domain.
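The NP search over feature subsets summarized above can be illustrated with a minimal sketch. It is not the dissertation's implementation and assumes a generic NP scheme: the most promising region is the set of subsets that agree with in/out decisions already fixed for the first d features of a partitioning order; it is split into two subregions by fixing the next feature in or out; its complement forms the surrounding region; each region is scored by the best of a few uniform random samples; and the search backtracks to the parent region when the surrounding region wins. The names `np_feature_selection`, `order`, `samples`, `iterations`, and `toy_score` are hypothetical. `score` stands in for a filter measure or a wrapper's classifier accuracy (which could itself be computed on a random sample of instances), and `order` is where a ranking from a feature quality evaluator would set the partitioning order.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Subset = Tuple[bool, ...]  # mask[j] is True if feature j is selected


def in_region(mask: Subset, order: Sequence[int], fixed: Sequence[bool]) -> bool:
    """True if mask agrees with every in/out decision fixed so far."""
    return all(mask[order[i]] == fixed[i] for i in range(len(fixed)))


def sample_region(rng: random.Random, n: int, order: Sequence[int],
                  fixed: Sequence[bool]) -> Subset:
    """Uniform random subset consistent with the fixed decisions."""
    mask = [rng.random() < 0.5 for _ in range(n)]
    for i, decision in enumerate(fixed):
        mask[order[i]] = decision
    return tuple(mask)


def sample_surrounding(rng: random.Random, n: int, order: Sequence[int],
                       fixed: Sequence[bool]) -> Subset:
    """Uniform random subset outside the region (rejection sampling).
    Only called when at least one decision is fixed, so the complement is non-empty."""
    while True:
        mask = tuple(rng.random() < 0.5 for _ in range(n))
        if not in_region(mask, order, fixed):
            return mask


def np_feature_selection(score: Callable[[Subset], float], n: int,
                         order: Optional[Sequence[int]] = None,
                         samples: int = 10, iterations: int = 50,
                         seed: int = 0) -> Tuple[Subset, float]:
    """Hypothetical generic NP search; returns the best subset found and its score."""
    rng = random.Random(seed)
    order = list(order) if order is not None else list(range(n))
    fixed: List[bool] = []  # decisions defining the most promising region
    best_mask: Optional[Subset] = None
    best_score = float("-inf")
    for _ in range(iterations):
        # Partition: fix the next feature in the order to "in" or "out".
        # At maximum depth the region is a single subset and is kept as is.
        candidates = [fixed + [True], fixed + [False]] if len(fixed) < n else [fixed]
        region_best = []
        for sub in candidates:
            pts = [sample_region(rng, n, order, sub) for _ in range(samples)]
            region_best.append(max((score(p), p) for p in pts))
        surrounding = None
        if fixed:
            pts = [sample_surrounding(rng, n, order, fixed) for _ in range(samples)]
            surrounding = max((score(p), p) for p in pts)
        # Track the best subset sampled anywhere so far.
        for s, p in region_best + ([surrounding] if surrounding is not None else []):
            if s > best_score:
                best_score, best_mask = s, p
        idx = max(range(len(region_best)), key=lambda i: region_best[i][0])
        if surrounding is not None and surrounding[0] > region_best[idx][0]:
            fixed = fixed[:-1]       # backtrack to the parent region
        else:
            fixed = candidates[idx]  # move into the most promising subregion
    return best_mask, best_score


def toy_score(mask: Subset) -> float:
    """Hypothetical objective: features 0 and 2 are relevant, size is penalized."""
    return 2.0 * mask[0] + 2.0 * mask[2] - 0.5 * sum(mask)


if __name__ == "__main__":
    print(np_feature_selection(toy_score, 4))
```

The toy objective rewards features 0 and 2 and penalizes each selected feature, so its optimum is the subset `(True, False, True, False)`. Backtracking to the parent region is only one choice; NP variants may instead backtrack to the root region.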
Finally, as a case study, a recommender system that can be used effectively in B2B (business-to-business) e-business systems is developed using classification, association rules, and the new NP-based feature selection method.
Digital Repository @ Iowa State University, http://lib.dr.iastate.edu
Yang, Jaekyung, "Scalable optimization-based feature selection with application to recommender systems" (2003). Retrospective Theses and Dissertations. 756.