Degree Type

Dissertation

Date of Award

2014

Degree Name

Doctor of Philosophy

Department

Industrial and Manufacturing Systems Engineering

First Advisor

Sigurdur Olafsson

Abstract

Aspects of a classifier's training dataset can often make it difficult to build a useful, high-accuracy classifier. Instance selection addresses some of these issues by selecting a subset of the data such that learning from the reduced dataset leads to a better classifier. This work introduces an integer programming formulation of instance selection that relies on column generation techniques to obtain a good solution to the problem. Experimental results show that instance selection improves the usefulness of some classifiers by optimizing the training data so that the training dataset has easier-to-learn boundaries between class values. Also included are two case studies from the Surveillance, Epidemiology, and End Results (SEER) database that further confirm the benefit of instance selection. Overall, results indicate that performing instance selection for a classifier is a competitive classification approach. However, instance selection might overfit classifiers that have already achieved a good fit to the dataset.

DOI

https://doi.org/10.31274/etd-180810-2809

Copyright Owner

Walter Dean Bennette

Language

en

File Format

application/pdf

File Size

109 pages
