Distinct random sampling from a distributed stream

Thumbnail Image
Date
2013-01-01
Authors
Chung, Yung-Yu
Major Professor
Advisor
Srikanta Tirthapura
Committee Member
Journal Title
Journal ISSN
Volume Title
Publisher
Altmetrics
Authors
Research Projects
Organizational Units
Journal Issue
Is Version Of
Versions
Series
Department
Electrical and Computer Engineering
Abstract

We consider continuous maintenance of a random sample of distinct elements from a massive data stream, whose input elements are observed at multiple distributed sites that communicate via a central coordinator. At any point, when a query is received at the coordinator, it responds with a random sample from the set of all distinct elements observed at the different sites so far. We present the first algorithms for distinct random sampling on distributed streams. We also present a lower bound on the expected number of messages that must be transmitted by any distributed algorithm, showing that our algorithm is message optimal to within a factor of four. We present extensions to sliding windows, and detailed experimental results showing the performance of our algorithm on real-world data sets.

Comments
Description
Keywords
Citation
Source
Copyright
Tue Jan 01 00:00:00 UTC 2013