Federated Computing for the Masses--Aggregating Resources to Tackle Large-Scale Engineering Problems

Javier Diaz-Montes, Rutgers University
Yu Xie, Iowa State University
Ivan Rodero, Iowa State University
Jaroslaw Zola, Rutgers University
Baskar Ganapathysubramanian, Iowa State University
Manish Parashar, Rutgers University

This is a manuscript of an article published as Diaz-Montes, Javier, Yu Xie, Ivan Rodero, Jaroslaw Zola, Baskar Ganapathysubramanian, and Manish Parashar. "Federated Computing for the Masses--Aggregating Resources to Tackle Large-Scale Engineering Problems." Computing in Science & Engineering 16, no. 4 (2014): 62-72. DOI:10.1109/MCSE.2013.134. Posted with permission.

Abstract

The complexity of many problems in science and engineering requires computational capacity exceeding what the average user can expect from a single computational center. While many of these problems can be viewed as a set of independent tasks, their collective complexity easily requires millions of core-hours on any high-performance computing (HPC) resource, as well as a throughput that can't be sustained by a single multiuser queuing system. An exploration of the use of aggregated HPC resources to solve large-scale engineering problems shows that it's possible to build a computational federation that's easy for end users to implement and is elastic, resilient, and scalable. Here, the authors argue that the fusion of federated computing and real-life engineering problems can be brought to the average user if relevant middleware is provided. They report on the use of a federation of 10 distributed, heterogeneous HPC resources to perform a large-scale interrogation of the parameter space in a microscale fluid flow problem.