Date of Award
Doctor of Philosophy
Industrial and Manufacturing Systems Engineering
John K. Jackman
The class imbalance problem in classification has been recognized as a significant research problem in recent years and a number of methods have been introduced to improve classification results. Rebalancing class distributions (such as over-sampling or under-sampling of learning datasets) has been popular due to its ease of implementation and relatively good performance. For the Support Vector Machine (SVM) classification algorithm, research efforts have focused on reducing the size of learning sets because of the algorithm's sensitivity to the size of the dataset. In this dissertation, we propose a metaheuristic approach (Genetic Algorithm) for under-sampling of an imbalanced dataset in the context of a SVM classifier. The goal of this approach is to find an optimal learning set from imbalanced datasets without empirical studies that are normally required to find an optimal class distribution. Experimental results using real datasets indicate that this metaheuristic under-sampling performed well in rebalancing class distributions. Furthermore, an iterative sampling methodology was used to produce smaller learning sets by removing redundant instances. It incorporates informative and the representative under-sampling mechanisms to speed up the learning procedure for imbalanced data learning with a SVM. When compared with existing rebalancing methods and the metaheuristic approach to under-sampling, this iterative methodology not only provides good performance but also enables a SVM classifier to learn using very small learning sets for imbalanced data learning. For large-scale imbalanced datasets, this methodology provides an efficient and effective solution for imbalanced data learning with an SVM.
Jong Myong Choi
Choi, Jong Myong, "A Selective Sampling Method for Imbalanced Data Learning on Support Vector Machines" (2010). Graduate Theses and Dissertations. 11529.