Degree Type

Thesis

Date of Award

2018

Degree Name

Master of Science

Department

Electrical and Computer Engineering

Major

Computer Engineering

First Advisor

Srikanta Tirthapura

Abstract

Supervised machine learning is an approach in which an algorithm estimates a mapping function from labeled data, i.e., from data attributes paired with target values. One of the major obstacles in supervised learning is the labeling step: obtaining labeled data is expensive because it typically requires human effort. A model trained on too little data tends to overfit, so a minimum number of labeled examples is needed to achieve reasonable prediction accuracy. This is also true for streaming machine learning models. The key concepts of streaming machine learning models are maintaining a model without rebuilding it and performing prediction without ever storing input samples. A successful and widely used streaming model is the Hoeffding tree, which has high labeling complexity. In this work, we present the Frugal Hoeffding tree, a variation of the Hoeffding tree that uses less labeled data while providing performance similar to that of the original Hoeffding tree. We conduct experiments on large real-world datasets, comparing the performance of traditional batch decision trees, the Hoeffding tree, and the Frugal Hoeffding tree. We show that the Frugal Hoeffding tree consumes less labeled data yet achieves classification performance similar to that of the Hoeffding tree.
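
For readers unfamiliar with the Hoeffding tree mentioned in the abstract, the following is a minimal sketch of the split test used by the standard Hoeffding tree, which relies on the Hoeffding bound to decide when enough labeled examples have been seen at a leaf. It is not the Frugal Hoeffding tree, whose details the abstract does not give. The function names, the parameter names (`value_range`, `delta`, `n`), and the sample numbers are illustrative assumptions.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R): range of the split metric (assumed 1.0 below).
    delta: allowed probability that the bound fails.
    n: number of labeled examples observed at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the test.
    for n in (100, 1_000, 10_000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4))
    print(can_split(0.30, 0.15, 1.0, 1e-7, 1_000))
```

The bound shrinks as `n` grows, so each split decision requires a certain number of labeled examples. This is one way to read the labeling cost the abstract refers to.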

DOI

https://doi.org/10.31274/etd-180810-6013

Copyright Owner

Yesdaulet Izenov

Language

en

File Format

application/pdf

File Size

30 pages
