Campus Units

Electrical and Computer Engineering

Document Type

Conference Proceeding

Conference

21st International Conference on Extending Database Technology (EDBT)

Publication Version

Published Version

Link to Published Version

https://dx.doi.org/10.5441/002/edbt.2018.27

Publication Date

2018

Journal or Book Title

Proceedings of the 21st International Conference on Extending Database Technology (EDBT)

First Page

301

Last Page

312

DOI

10.5441/002/edbt.2018.27

Conference Date

March 26-29, 2018

City

Vienna, Austria

Abstract

A core requirement of database engine testing is the ability to create synthetic versions of the customer’s data warehouse at the vendor site. A rich body of work exists on synthetic database regeneration, but it suffers from critical limitations with regard to: (a) maintaining statistical fidelity to the client’s query processing, and/or (b) scaling to large data volumes. In this paper, we present Hydra, a workload-dependent database regenerator that leverages a declarative approach to data regeneration to assure volumetric similarity, a crucial aspect of statistical fidelity, and materially improves on the prior art by adding scale, dynamism and functionality. First, Hydra uses an optimized linear programming (LP) formulation based on a novel region-partitioning approach. This spatial strategy drastically reduces the LP complexity, enabling it to handle query workloads on which contemporary techniques fail. Second, Hydra incorporates deterministic post-LP processing algorithms that provide high efficiency and improved accuracy. Third, Hydra introduces the concept of dynamic regeneration by constructing a minuscule database summary that can regenerate databases of arbitrary size on the fly during query execution, while obeying volumetric specifications derived from the query workload. A detailed experimental evaluation on standard OLAP benchmarks demonstrates that Hydra can efficiently and dynamically regenerate large warehouses that accurately mimic the desired statistical characteristics.
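
To make the idea of dynamic regeneration from a small summary more concrete, the following is a minimal Python sketch. It is not Hydra's algorithm: the LP formulation and region partitioning are omitted, and the summary is simply assumed to be given. The names `summary`, `regenerate`, and `cardinality`, the two-column row layout, and the `scale` parameter are all hypothetical and chosen only for illustration. The sketch shows how a handful of (row, count) entries can be expanded lazily into a table whose filter outputs have prescribed row counts.

```python
from typing import Callable, Dict, Iterator, Tuple

# Hypothetical row layout: (age, region). Not taken from the paper.
Row = Tuple[int, str]

# Assumed summary: one representative row per region of the data space,
# together with the number of rows that should fall in that region.
summary: Dict[Row, int] = {
    (25, "EU"): 3,
    (40, "EU"): 2,
    (40, "US"): 5,
}


def regenerate(summary: Dict[Row, int], scale: int = 1) -> Iterator[Row]:
    """Lazily yield rows; each entry expands to count * scale copies.

    Nothing is materialized, so the output size is bounded only by how
    many rows the consumer pulls from the generator.
    """
    for row, count in summary.items():
        for _ in range(count * scale):
            yield row


def cardinality(rows: Iterator[Row], predicate: Callable[[Row], bool]) -> int:
    """Count the rows that satisfy a filter predicate."""
    return sum(1 for row in rows if predicate(row))


def is_eu(row: Row) -> bool:
    return row[1] == "EU"


def age_over_30(row: Row) -> bool:
    return row[0] > 30


total_rows = cardinality(regenerate(summary), lambda row: True)
eu_rows = cardinality(regenerate(summary), is_eu)
older_rows = cardinality(regenerate(summary), age_over_30)
scaled_eu_rows = cardinality(regenerate(summary, scale=1000), is_eu)
```

In this toy setting the filter cardinalities are determined entirely by the counts stored in the summary, so multiplying every count by the same factor scales each filter output proportionally. Whether and how Hydra scales its summaries is not stated in the abstract; the `scale` parameter here is an assumption of the sketch.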

Comments

This article is published as Sanghi, Anupam, Raghav Sood, Jayant Haritsa, and Srikanta Tirthapura. "Scalable and Dynamic Regeneration of Big Data Volumes," in Proceedings of the 21st International Conference on Extending Database Technology (EDBT), Vienna, Austria, March 26-29, 2018. DOI: 10.5441/002/edbt.2018.27. Posted with permission.

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Copyright Owner

The Authors

Language

en

File Format

application/pdf
