Campus Units

Electrical and Computer Engineering

Document Type

Conference Proceeding

Conference

2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Publication Version

Accepted Manuscript

Link to Published Version

https://doi.org/10.1109/ASAP49362.2020.00016

Publication Date

2020

Journal or Book Title

2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)

First Page

37

Last Page

44

DOI

10.1109/ASAP49362.2020.00016

Conference Title

2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP)

Conference Date

July 6-8, 2020

City

Manchester, United Kingdom

Abstract

As the use of large Deep Neural Networks (DNNs) has grown over the years, specialized hardware accelerators such as the Tensor Processing Unit and Eyeriss have been developed to accelerate the forward pass of the network. The essential component of these devices is an array processor composed of multiple individual compute units that efficiently execute Multiplication and Accumulation (MAC) operations. Because the array size limits how much of a single DNN layer can be processed at once, the computation is performed serially in several batches, leading to extra compute cycles along both axes. In practice, due to the mismatch between matrix and array sizes, the computation does not map onto the array exactly. In this work, we address the issue of minimizing processing cycles on the array by adjusting the DNN model parameters using a structured, hardware-array-dependent optimization. We introduce two techniques in this paper: Array Aware Training (AAT) for efficient training and Array Aware Pruning (AAP) for efficient inference. Weight pruning removes redundant parameters from the network to decrease its size. The key idea behind pruning in this paper is to adjust the model parameters (the weight matrix) so that the array is fully utilized in each computation batch. Our goal is to compress the model to match the size of the array and thereby reduce the number of computation cycles. We observe that both proposed techniques achieve accuracy similar to the original network while saving a significant number of processing cycles (75%).
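The abstract's cycle-count argument lends itself to a short illustration. Below is a minimal Python/NumPy sketch (not from the paper) of the idea: mapping a rows x cols weight matrix onto an R x C MAC array takes ceil(rows/R) * ceil(cols/C) serial batches, and pruning the matrix down to array-size multiples eliminates the partially filled batches. The 128x128 array size and the L1-norm row/column saliency criterion are assumptions made for illustration; the paper's actual AAT/AAP procedures may differ.

```python
import math

import numpy as np


def mac_batches(rows: int, cols: int, array_rows: int, array_cols: int) -> int:
    # Serial batches needed to map a rows x cols weight matrix onto an
    # array_rows x array_cols MAC array: ceiling division along each axis.
    return math.ceil(rows / array_rows) * math.ceil(cols / array_cols)


def array_aware_prune(weights: np.ndarray, array_rows: int, array_cols: int) -> np.ndarray:
    # Trim whole rows/columns so the remaining dimensions are exact
    # multiples of the array size, letting every batch fill the array.
    r, c = weights.shape
    keep_r = (r // array_rows) * array_rows  # largest multiple of array_rows <= r
    keep_c = (c // array_cols) * array_cols
    # Rank rows/columns by L1 norm and keep the strongest ones (an assumed,
    # illustrative saliency criterion; the paper's criterion may differ).
    rows_kept = np.sort(np.argsort(-np.abs(weights).sum(axis=1))[:keep_r])
    cols_kept = np.sort(np.argsort(-np.abs(weights).sum(axis=0))[:keep_c])
    return weights[rows_kept][:, cols_kept]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((130, 260))  # hypothetical layer weights
    R = C = 128                          # assumed MAC array dimensions
    print(mac_batches(*W.shape, R, C))                  # 2 * 3 = 6 batches
    Wp = array_aware_prune(W, R, C)
    print(mac_batches(Wp.shape[0], Wp.shape[1], R, C))  # 1 * 2 = 2 batches
```

In this toy case, trimming a 130 x 260 matrix to 128 x 256 cuts the batch count from 6 to 2, since the two extra rows and four extra columns each forced an almost-empty pass over the array.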

Comments

This is a manuscript of a proceeding published as Chitty-Venkata, Krishna Teja, and Arun K. Somani. "Array Aware Training/Pruning: Methods for Efficient Forward Propagation on Array-based Neural Network Accelerators." In 2020 IEEE 31st International Conference on Application-specific Systems, Architectures and Processors (ASAP) (2020): 37-44. DOI: 10.1109/ASAP49362.2020.00016. Posted with permission.

Rights

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Copyright Owner

IEEE

Language

en

File Format

application/pdf
