Degree Name

Master of Science


Electrical and Computer Engineering


Computer Engineering

First Advisor

Srikanta Tirthapura


Supervised machine learning is an approach in which an algorithm estimates a mapping function from labeled data, i.e., from data attributes paired with target values. One of the major obstacles in supervised learning is the labeling step: obtaining labeled data is expensive because it typically requires human effort. A model trained on too little data tends to overfit, so a minimum number of labeled examples is needed to achieve reasonable prediction accuracy. This is also true for streaming machine learning models, whose key concepts are maintaining a model without rebuilding it and performing prediction without ever storing input samples. A successful and widely used streaming model is the Hoeffding tree, which has a large labeling complexity. In this work, we present the Frugal Hoeffding tree, a variation of the Hoeffding tree that uses less labeled data yet provides performance similar to that of the original Hoeffding tree. We conduct experiments on large real-world datasets in which we compare the performance of traditional batch decision trees, the Hoeffding tree, and the Frugal Hoeffding tree. We show that the Frugal Hoeffding tree consumes less labeled data yet achieves classification performance similar to that of the Hoeffding tree.
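
For readers unfamiliar with the Hoeffding tree, the following is a minimal sketch of the split test used by the standard algorithm, not of the Frugal variant, whose mechanism the abstract does not describe. The function names and the example values of `delta` and `n` are illustrative assumptions and do not come from the thesis. A leaf is split once the observed gap between the two best split scores exceeds the Hoeffding bound, which shrinks as more labeled examples arrive at the leaf. This is why the number of labels consumed matters.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range: range R of the split score (e.g. 1.0 for information
                 gain on a two-class problem)
    delta:       allowed probability that the bound does not hold
    n:           number of labeled examples seen at the leaf
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: the same observed gap of 0.15 is not yet
# convincing after 100 labeled examples but is after 1000.
few_labels = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
many_labels = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

Under these assumed values, `few_labels` is false and `many_labels` is true, since the bound falls from roughly 0.28 to roughly 0.09 as `n` grows from 100 to 1000.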


Copyright Owner

Yesdaulet Izenov



File Size

30 pages