Campus Units
Electrical and Computer Engineering
Document Type
Article
Publication Version
Accepted Manuscript
Publication Date
7-2016
Journal or Book Title
Data Mining and Knowledge Discovery
Volume
30
Issue
4
First Page
797
Last Page
818
DOI
10.1007/s10618-015-0438-6
Abstract
We consider online mining of correlated heavy-hitters (CHH) from a data stream. Given a stream of two-dimensional data, a correlated aggregate query first extracts a substream by applying a predicate along a primary dimension, and then computes an aggregate along a secondary dimension. Prior work on identifying heavy-hitters in streams has almost exclusively focused on identifying heavy-hitters on a single dimensional stream, and these yield little insight into the properties of heavy-hitters along other dimensions. In typical applications however, an analyst is interested not only in identifying heavy-hitters, but also in understanding further properties such as: what other items appear frequently along with a heavy-hitter, or what is the frequency distribution of items that appear along with the heavy-hitters. We consider queries of the following form: “In a stream S of (x, y) tuples, on the substream H of all x values that are heavy-hitters, maintain those y values that occur frequently with the x values in H”. We call this problem as CHH. We formulate an approximate formulation of CHH identification, and present an algorithm for tracking CHHs on a data stream. The algorithm is easy to implement and uses workspace much smaller than the stream itself. We present provable guarantees on the maximum error, as well as detailed experimental results that demonstrate the space-accuracy trade-off.
Copyright Owner
The Authors
Copyright Date
2016
Language
en
File Format
application/pdf
Recommended Citation
Lahiri, Bibudh; Mukherjee, Arko Provo; and Tirthapura, Srikanta, "Identifying correlated heavy-hitters in a two-dimensional data stream" (2016). Electrical and Computer Engineering Publications. 148.
https://lib.dr.iastate.edu/ece_pubs/148
Comments
This is an Accepted Manuscript of an article published by Taylor & Francis as Lahiri, Bibudh, Arko Provo Mukherjee, and Srikanta Tirthapura. "Identifying correlated heavy-hitters in a two-dimensional data stream." Data Mining and Knowledge Discovery 30, no. 4 (2016): 797-818. Available online: https://doi.org/10.1007/s10618-015-0438-6. Posted with permission.