Degree Type

Thesis

Date of Award

2013

Degree Name

Master of Science

Department

Electrical and Computer Engineering

First Advisor

Srikanta Tirthapura

Abstract

We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator. At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far. We present the first algorithms for distinct random sampling on distributed streams. We also present a lower bound on the expected number of messages that must be transmitted by any distributed algorithm, showing that our algorithm is message optimal to within a factor of four. We present extensions to sliding windows, and detailed experimental results showing the performance of our algorithm on real-world data sets.

Copyright Owner

Yung-Yu Chung

Language

en

File Format

application/pdf

File Size

41 pages

Share

COinS