Campus Units

Electrical and Computer Engineering, Computer Science

Document Type

Conference Proceeding

Conference

25th International Symposium on Distributed Computing (DISC 2011)

Publication Version

Accepted Manuscript

Link to Published Version

https://doi.org/10.1007/978-3-642-24100-0_27

Publication Date

2011

Journal or Book Title

Lecture Notes in Computer Science

Volume

6950

First Page

283

Last Page

297

DOI

10.1007/978-3-642-24100-0_27

Conference Date

September 20-22, 2011

City

Rome, Italy

Abstract

We give an improved algorithm for drawing a random sample from a large data stream when the input elements are distributed across multiple sites which communicate via a central coordinator. At any point in time the set of elements held by the coordinator represent a uniform random sample from the set of all the elements observed so far. When compared with prior work, our algorithms asymptotically improve the total number of messages sent in the system as well as the computation required of the coordinator. We also present a matching lower bound, showing that our protocol sends the optimal number of messages up to a constant factor with large probability. As a byproduct, we obtain an improved algorithm for finding the heavy hitters across multiple distributed sites.

Comments

This is a manuscript of a proceeding published as Tirthapura, Srikanta, and David P. Woodruff. "Optimal random sampling from distributed streams revisited," 25th International Symposium on Distributed Computing (DISC 2011), Lecture Notes in Computer Science 6950 (2011): 283-297. DOI: 10.1007/978-3-642-24100-0_27. Posted with permission.

Copyright Owner

Springer-Verlag Berlin Heidelberg

Language

en

File Format

application/pdf

Published Version

Share

Article Location

 
COinS