Scalable optimization-based feature selection with application to recommender systems

Date
2003-01-01
Authors
Yang, Jaekyung
Major Professor
Advisor
Sigurdur Olafsson
Organizational Unit
Industrial and Manufacturing Systems Engineering
The Department of Industrial and Manufacturing Systems Engineering teaches the design, analysis, and improvement of the systems and processes in manufacturing, consulting, and service industries by application of the principles of engineering. The Department of General Engineering was formed in 1929. In 1956 its name changed to Department of Industrial Engineering. In 1989 its name changed to the Department of Industrial and Manufacturing Systems Engineering.
Department
Industrial and Manufacturing Systems Engineering
Abstract

Alongside the development of a wide variety of data mining techniques, numerous feature selection methods have been introduced to reduce dimensionality, which can improve scalability and make learning models easier to interpret. This dissertation presents a new optimization-based feature selection method that uses the nested partition (NP) approach, including both a basic analysis of the NP framework and numerical results on various test problems. The numerical results show how the optimal structure of the NP contributes to the feature selection process. The dissertation also addresses how the new intelligent partitioning method efficiently obtains very high-quality partitions. The feature selection method is implemented as both a filter and a wrapper.

In addition, the scalability of the algorithm, the most significant issue in mining large databases, is examined with respect to the instance dimension, the feature dimension, and the adaptation to new features. Because the NP naturally handles the feature dimension effectively, the dissertation focuses mostly on scalability with respect to the instance dimension. Two systematic approaches to improving scalability in the instance dimension are presented, both of which use random sampling. The first predicts the best instance sample size using a two-stage version of the NP that also incorporates statistical selection; the second is a heuristic solution in a new adaptive version of the algorithm. Numerical results indicate that both approaches improve scalability and perform better than the generic NP method, which uses a static sampling approach.

To make the NP feature selection method flexible enough to handle mixed feature types, feature quality evaluators are introduced to determine the order of partitioning, with experimental results reporting which evaluator performs better for a given data domain.

Finally, as a case study, a recommender system that can be used effectively in B2B (business-to-business) e-business systems is built using classification, association rules, and the new NP-based feature selection method.
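
As a rough illustration of the partition, sample, and backtrack loop that an NP-style search over feature subsets follows, a minimal Python sketch is given below. It is not the dissertation's algorithm: the function names, the fixed feature order, the coin-flip sampling, the backtrack-to-parent rule, and the toy scoring function are all assumptions made for illustration, and the intelligent partitioning, statistical selection, and adaptive sampling described in the abstract are not reproduced.

```python
import random


def np_feature_selection(n_features, score, iterations=50,
                         samples_per_region=20, seed=0):
    """Illustrative nested-partition-style search over feature subsets.

    A region is a prefix of include/exclude decisions for features
    0..d-1 (hypothetical fixed order); undecided features stay free.
    """
    rng = random.Random(seed)
    decided = []  # decided[i] is True/False: feature i forced in/out
    best_subset = frozenset()
    best_score = score(best_subset)

    def to_subset(bits):
        return frozenset(i for i, b in enumerate(bits) if b)

    def sample_in(prefix):
        # Uniform random subset consistent with the fixed prefix.
        free = n_features - len(prefix)
        return to_subset(list(prefix) + [rng.random() < 0.5 for _ in range(free)])

    def sample_surrounding(prefix):
        # Rejection sampling: a subset violating at least one fixed decision.
        while True:
            bits = [rng.random() < 0.5 for _ in range(n_features)]
            if bits[:len(prefix)] != list(prefix):
                return to_subset(bits)

    for _iteration in range(iterations):
        # Partition the current region on the next undecided feature.
        if len(decided) < n_features:
            regions = [("child", decided + [True]), ("child", decided + [False])]
        else:
            regions = [("stay", list(decided))]  # region is a single subset
        if decided:
            # Everything outside the current region is one aggregated region.
            regions.append(("surrounding", list(decided)))

        top_kind, top_prefix, top_index = None, None, float("-inf")
        for kind, prefix in regions:
            sampler = sample_surrounding if kind == "surrounding" else sample_in
            for _sample in range(samples_per_region):
                subset = sampler(prefix)
                value = score(subset)
                if value > best_score:
                    best_subset, best_score = subset, value
                # Promising index of a region: its best sampled value.
                if value > top_index:
                    top_kind, top_prefix, top_index = kind, prefix, value

        if top_kind == "surrounding":
            decided = decided[:-1]  # backtrack to the parent region
        else:
            decided = top_prefix    # move into the most promising subregion

    return best_subset, best_score


def toy_score(subset):
    # Hypothetical filter-style score: features 0 and 2 are "relevant",
    # and every selected feature costs 0.1.
    return len(subset & {0, 2}) - 0.1 * len(subset)


best_subset, best_value = np_feature_selection(5, toy_score)
print(sorted(best_subset), best_value)
```

In a wrapper setting, `toy_score` would be replaced by the estimated accuracy of a classifier trained on the candidate subset; in a filter setting, by a measure computed directly from the data.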

Copyright
2003