Degree Type

Dissertation

Date of Award

2015

Degree Name

Doctor of Philosophy

Department

Electrical and Computer Engineering

Major

Computer Engineering

First Advisor

Srikanta Tirthapura

Abstract

With the advancement of technology, there has been exponential growth in the volume of data continuously generated by applications in domains such as finance, networking, and security. Examples of such continuously streaming data include internet traffic data, sensor readings, tweets, stock market data, and telecommunication records. As a result, processing and analyzing this data in real time to derive useful insights from it is becoming increasingly important.

The goal of my research is to propose techniques to effectively find aggregates and patterns in massive distributed data streams in real time. In many real-world applications, there may be specific user requirements for analyzing data. We consider three such requirements in our work: sliding windows, distributed data streams, and a union of historical and streaming data.

We address the following problems in our research. First, we present a detailed experimental evaluation of streaming algorithms for distinct counting over sliding windows, a fundamental aggregation problem widely applied in database query optimization and network monitoring. Next, we present the first communication-efficient distributed algorithm for tracking persistent items in a distributed data stream over both infinite and sliding windows. We present a theoretical analysis of its communication cost and accuracy, and provide experimental results to validate the guarantees. Finally, we present the design and evaluation of a low-cost algorithm that identifies quantiles from a union of historical and streaming data with improved accuracy.

Copyright Owner

Sneha Aman Singh

Language

en

File Format

application/pdf

File Size

113 pages
