Workload-aware Scheduling Techniques for General Purpose Applications on Graphics Processing Units

Awatramani, Mihir

Date

2017-01-01

Advisors

Diane Rover

Joseph Zambreno

Department

Electrical and Computer Engineering

Abstract

In the last decade, Graphics Processing Units (GPUs) have been widely adopted as co-processors for accelerating data-parallel, general-purpose applications. A primary driver of this adoption is that GPUs offer orders of magnitude higher floating point throughput and memory bandwidth than their CPU counterparts. Because GPUs are designed as throughput processors, they adopt a manycore architecture with tens to hundreds of cores, each containing multiple vector processing pipelines. A significant share of the die area is dedicated to floating point units, at the expense of the hardware structures that conventional CPU architectures use to hide memory latency, such as large caches and out-of-order execution. Instead, GPUs tolerate memory latency by exploiting the data-level parallelism in the workload: they interleave the execution of many SIMD threads, so that the latency of threads waiting on data from memory is overlapped with computation from other threads.
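The latency-hiding mechanism described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the dissertation's cycle-level simulator: it assumes a single-issue core, warps (SIMD threads) that each issue a fixed number of instructions ending in a load with a fixed memory latency, and a round-robin choice among ready warps. The names `Warp` and `simulate_utilization` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    compute_left: int  # instructions left in the current phase (the last one is a load)
    ready_at: int = 0  # first cycle at which the warp may issue again


def simulate_utilization(num_warps: int, compute_cycles: int,
                         mem_latency: int, total_cycles: int = 10_000) -> float:
    """Return the fraction of cycles in which the pipeline issues an instruction.

    Toy model (assumption): each warp issues `compute_cycles` instructions,
    the last of which is a load that keeps the warp stalled for
    `mem_latency` cycles. Each cycle, the scheduler issues from the first
    ready warp in round-robin order.
    """
    warps = [Warp(compute_left=compute_cycles) for _ in range(num_warps)]
    next_idx = 0  # round-robin starting point for the next search
    busy = 0
    for cycle in range(total_cycles):
        for offset in range(num_warps):
            i = (next_idx + offset) % num_warps
            w = warps[i]
            if w.ready_at <= cycle:
                busy += 1
                w.compute_left -= 1
                if w.compute_left == 0:
                    # The load has issued: stall this warp for mem_latency cycles.
                    w.ready_at = cycle + 1 + mem_latency
                    w.compute_left = compute_cycles
                next_idx = (i + 1) % num_warps
                break
        # If no warp was ready, this cycle is a pipeline bubble.
    return busy / total_cycles
```

With `compute_cycles=4` and `mem_latency=16`, a single warp keeps the pipeline busy only 20% of the time (`simulate_utilization(1, 4, 16)` returns `0.2`), while 32 warps keep it fully busy (`1.0`): more active warps give the scheduler independent work to issue while other warps wait on memory.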

With each architecture generation, GPUs provide more floating point throughput and memory bandwidth, and support a growing number of simultaneously active threads. We envision that continued advances in GPU computing will require workload-aware scheduling techniques. In the GPU computing workflow, scheduling is performed at three levels: the system or chip level, the core level, and the thread level. This research designs novel workload-aware scheduling techniques for each of these three levels. We show that GPU computing workloads have significantly varying characteristics, and we design techniques that monitor the hardware state to guide scheduling decisions at each level. Each technique is implemented in a cycle-level GPU architecture simulator, and its effect on performance is evaluated against state-of-the-art scheduling techniques used in GPU architectures.
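At the thread level, the choice of which ready warp to issue from also matters. As an illustration only, the sketch below compares two policies commonly used as baselines in GPU simulation studies, loose round-robin (LRR) and greedy-then-oldest (GTO), in a toy model that assumes a single-issue core, a fixed compute phase ending in a fixed-latency load, and warp age equal to warp index. It is not one of the workload-aware techniques proposed in this work, and the function `simulate` and its parameters are hypothetical.

```python
def simulate(policy: str, num_warps: int, compute_cycles: int,
             mem_latency: int, total_cycles: int = 10_000) -> float:
    """Pipeline utilization of a toy single-issue core under a warp-scheduling policy.

    policy: "lrr" (loose round-robin) or "gto" (greedy-then-oldest).
    Assumption: warp age equals its index, so warp 0 is the oldest.
    """
    left = [compute_cycles] * num_warps  # instructions left before the warp's load completes its phase
    ready_at = [0] * num_warps           # first cycle at which each warp may issue again
    last = -1                            # warp that issued most recently (-1: none yet)
    busy = 0
    for cycle in range(total_cycles):
        ready = [w for w in range(num_warps) if ready_at[w] <= cycle]
        if not ready:
            continue  # every warp is waiting on memory: a pipeline bubble
        if policy == "lrr":
            # Next ready warp after the one issued last, wrapping around.
            w = min(ready, key=lambda r: (r - last - 1) % num_warps)
        elif policy == "gto":
            # Keep issuing from the same warp while it is ready, else the oldest ready warp.
            w = last if last in ready else min(ready)
        else:
            raise ValueError(f"unknown policy: {policy}")
        busy += 1
        left[w] -= 1
        if left[w] == 0:
            # The phase's final instruction (a load) issued: stall for mem_latency cycles.
            ready_at[w] = cycle + 1 + mem_latency
            left[w] = compute_cycles
        last = w
    return busy / total_cycles
```

With five warps, 4 compute cycles, and a 16-cycle load latency, `simulate("lrr", 5, 4, 16)` returns about 0.63 because round-robin advances all warps in lockstep, so they reach their loads at nearly the same time; `simulate("gto", 5, 4, 16)` returns 1.0 because GTO staggers the warps' memory phases. Even in this simple model, the same hardware and workload yield very different utilization depending on the scheduling policy.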

Copyright

2017

Collections

Theses and Dissertations