Campus Units

Electrical and Computer Engineering, Mathematics

Document Type

Article

Publication Version

Submitted Manuscript

Publication Date

2020

Journal or Book Title

arXiv

Abstract

Distributed matrix computations over large clusters can suffer from slow or failed worker nodes (called stragglers), which can dominate the overall job execution time. Coded computation uses ideas from erasure coding to mitigate the effect of stragglers by running 'coded' copies of the tasks that make up a job; stragglers are typically treated as erasures. While this is useful, applying, e.g., MDS codes in a straightforward manner raises several issues. First, many practical matrix computation scenarios involve sparse matrices, and MDS codes typically require dense linear combinations of submatrices of the original matrices, which destroys their inherent sparsity and results in significantly higher worker computation times. Second, treating slow nodes as erasures ignores the potentially useful partial computations performed by them. Third, some MDS techniques suffer from significant numerical stability issues. In this work we present schemes that leverage partial computation by stragglers while imposing constraints on the level of coding required to generate the encoded submatrices. This significantly reduces worker computation time compared to previous approaches and improves numerical stability in the decoding process. Exhaustive numerical experiments on Amazon Web Services (AWS) clusters support our findings.
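
The sparsity concern raised in the abstract can be illustrated with a small sketch. This is not the authors' scheme: the block sizes, the 2% density, the helper names (`random_sparse_block`, `combine`, `density`), and the choice of combining 2 versus all 10 submatrices are assumptions made purely for illustration. It compares the fraction of nonzero entries in an uncoded sparse submatrix, in a low-weight coded submatrix (a combination of 2 submatrices), and in a dense coded submatrix (a combination of all 10), since a worker's computation time grows with the number of nonzeros it must process.

```python
import random

def random_sparse_block(rows, cols, density, rng):
    """Hypothetical helper: a rows x cols block stored as {(i, j): value},
    where each entry is nonzero with probability `density`."""
    block = {}
    for i in range(rows):
        for j in range(cols):
            if rng.random() < density:
                block[(i, j)] = rng.uniform(1.0, 2.0)  # positive values only
    return block

def combine(blocks, coeffs):
    """Linear combination sum_k coeffs[k] * blocks[k]; zero coefficients are skipped."""
    out = {}
    for block, c in zip(blocks, coeffs):
        if c == 0:
            continue
        for pos, val in block.items():
            out[pos] = out.get(pos, 0.0) + c * val
    return out

def density(block, rows, cols):
    """Fraction of entries of the block that are nonzero."""
    return len(block) / (rows * cols)

rng = random.Random(0)          # fixed seed so the run is repeatable
rows, cols, k = 200, 50, 10     # assumed sizes: 10 submatrices of 200 x 50
blocks = [random_sparse_block(rows, cols, 0.02, rng) for _ in range(k)]

# Positive coefficients and positive values, so no entries cancel.
dense_coeffs = [rng.uniform(1.0, 2.0) for _ in range(k)]   # combine all k blocks
low_weight_coeffs = dense_coeffs[:2] + [0.0] * (k - 2)     # combine only 2 blocks

uncoded_density = density(blocks[0], rows, cols)
low_weight_density = density(combine(blocks, low_weight_coeffs), rows, cols)
dense_density = density(combine(blocks, dense_coeffs), rows, cols)

print("uncoded:", uncoded_density)
print("low-weight coded:", low_weight_density)
print("dense coded:", dense_density)
```

Because the nonzero positions of the submatrices rarely coincide, the support of a coded submatrix is close to the union of the supports of the submatrices it combines. Limiting how many submatrices enter each combination therefore keeps the coded submatrices closer to the original sparsity.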

Comments

This is a pre-print of the article Das, Anindya Bijoy, and Aditya Ramamoorthy. "Coded sparse matrix computation schemes that leverage partial stragglers." arXiv preprint arXiv:2012.06065 (2020). Posted with permission.

Creative Commons License

Creative Commons Attribution 4.0 International License

Copyright Owner

The Author(s)

Language

en

File Format

application/pdf
