Scalable and Dynamic Regeneration of Big Data Volumes

Date
2018-01-01
Authors
Sanghi, Anupam
Sood, Raghav
Haritsa, Jayant
Tirthapura, Srikanta
Organizational Unit
Electrical and Computer Engineering

The Department of Electrical and Computer Engineering (ECpE) has two focus areas. The Electrical Engineering focus covers control systems, electromagnetics and non-destructive evaluation, microelectronics, and electric power and energy systems, among other fields. The Computer Engineering focus covers software systems, embedded systems, networking, information security, and computer architecture, among other fields.

History
The Department of Electrical Engineering was formed in 1909 from the division of the Department of Physics and Electrical Engineering. In 1985 it was renamed the Department of Electrical Engineering and Computer Engineering. In 1995 it became the Department of Electrical and Computer Engineering.

Dates of Existence
1909-present

Historical Names

  • Department of Electrical Engineering (1909-1985)
  • Department of Electrical Engineering and Computer Engineering (1985-1995)

Department
Electrical and Computer Engineering
Abstract

A core requirement of database engine testing is the ability to create synthetic versions of the customer’s data warehouse at the vendor site. A rich body of work exists on synthetic database regeneration, but it suffers from critical limitations with regard to: (a) maintaining statistical fidelity to the client’s query processing, and/or (b) scaling to large data volumes. In this paper, we present Hydra, a workload-dependent database regenerator that leverages a declarative approach to data regeneration to assure volumetric similarity, a crucial aspect of statistical fidelity, and materially improves on the prior art by adding scale, dynamism and functionality. First, Hydra uses an optimized linear programming (LP) formulation based on a novel region-partitioning approach. This spatial strategy drastically reduces the LP complexity, enabling it to handle query workloads on which contemporary techniques fail. Second, Hydra incorporates deterministic post-LP processing algorithms that provide high efficiency and improved accuracy. Third, Hydra introduces the concept of dynamic regeneration by constructing a minuscule database summary that can regenerate databases of arbitrary size on the fly during query execution, while obeying volumetric specifications derived from the query workload. A detailed experimental evaluation on standard OLAP benchmarks demonstrates that Hydra can efficiently and dynamically regenerate large warehouses that accurately mimic the desired statistical characteristics.
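
The two ideas named in the abstract, region partitioning and summary-based dynamic regeneration, can be illustrated with a toy single-attribute sketch. This is not Hydra's actual algorithm or code. The function and class names (`partition_regions`, `constraint_rows`, `Summary`), the half-open interval predicates, and the hand-picked row counts are all assumptions made for illustration. In a real pipeline an LP solver would choose the per-region counts; here they are simply supplied.

```python
from bisect import bisect_right
from itertools import accumulate


def partition_regions(lo, hi, predicates):
    """Split [lo, hi) at every predicate boundary, so that each region lies
    entirely inside or entirely outside every predicate."""
    cuts = {lo, hi}
    for p_lo, p_hi in predicates:
        cuts.update(c for c in (p_lo, p_hi) if lo < c < hi)
    edges = sorted(cuts)
    return list(zip(edges[:-1], edges[1:]))


def constraint_rows(regions, predicates):
    """Build one 0/1 row per predicate: 1 where the region falls inside it.
    Each row is the left-hand side of one volumetric (row count) constraint."""
    return [[1 if p_lo <= r_lo and r_hi <= p_hi else 0
             for r_lo, r_hi in regions]
            for p_lo, p_hi in predicates]


class Summary:
    """Hypothetical database summary: one (region, row count) pair per region.
    Rows are produced on demand instead of being stored."""

    def __init__(self, regions, counts):
        self.regions = regions
        self.offsets = list(accumulate(counts))  # running row totals

    def __len__(self):
        return self.offsets[-1] if self.offsets else 0

    def row(self, i):
        """Return the attribute value of the i-th row without materializing
        the table: locate the owning region through the running totals."""
        if not 0 <= i < len(self):
            raise IndexError(i)
        r_lo, _ = self.regions[bisect_right(self.offsets, i)]
        return r_lo  # every value in a region satisfies the same predicates


# Assumed workload: two range predicates over an attribute with domain [0, 100).
predicates = [(20, 60), (40, 80)]
regions = partition_regions(0, 100, predicates)
rows = constraint_rows(regions, predicates)

# Hand-picked counts (an assumption) that meet the targets 700 and 500
# for the two predicates, with 1000 rows in total.
summary = Summary(regions, [100, 400, 300, 200, 0])
matches = [sum(c * x for c, x in zip(row, [100, 400, 300, 200, 0]))
           for row in rows]
```

In this sketch the number of unknowns equals the number of regions (five), regardless of how many distinct attribute values the domain holds, and the summary stores five counts rather than a thousand rows.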

Comments

This article is published as Sanghi, Anupam, Raghav Sood, Jayant Haritsa, and Srikanta Tirthapura. "Scalable and Dynamic Regeneration of Big Data Volumes," in Proceedings of the 21st International Conference on Extending Database Technology (EDBT), Vienna, Austria, March 26-29, 2018. DOI: 10.5441/002/edbt.2018.27. Posted with permission.

Copyright
2018