Degree Type
Dissertation
Date of Award
2015
Degree Name
Doctor of Philosophy
Department
Electrical and Computer Engineering
Major
Computer Engineering
First Advisor
Srikanta Tirthapura
Abstract
With the advancement of technology, there has been exponential growth in the volume of data continuously generated by applications in domains such as finance, networking, and security. Examples of such continuously streaming data include internet traffic data, sensor readings, tweets, stock market data, and telecommunication records. As a result, processing and analyzing this data in real time to derive useful insights is becoming increasingly important.
The goal of my research is to propose techniques to effectively find aggregates and patterns in massive distributed data streams in real time. In many real-world applications, there may be specific user requirements for analyzing data. We consider three such requirements in our work: sliding windows, distributed data streams, and a union of historical and streaming data.
We address the following problems in our research. First, we present a detailed experimental evaluation of streaming algorithms for distinct counting over sliding windows, a fundamental aggregation problem widely applied in database query optimization and network monitoring. Next, we present the first communication-efficient distributed algorithm for tracking persistent items in a distributed data stream over both infinite and sliding windows. We present a theoretical analysis of its communication cost and accuracy, and provide experimental results to validate the guarantees. Finally, we present the design and evaluation of a low-cost algorithm that identifies quantiles from a union of historical and streaming data with improved accuracy.
DOI
https://doi.org/10.31274/etd-180810-4505
Copyright Owner
Sneha Aman Singh
Copyright Date
2015
Language
en
File Format
application/pdf
File Size
113 pages
Recommended Citation
Singh, Sneha Aman, "Techniques for online analysis of large distributed data" (2015). Graduate Theses and Dissertations. 14907.
https://lib.dr.iastate.edu/etd/14907