A hybrid partitioning and scheduling technique for branch decoupling

Date
2003-01-01
Authors
Ramarao, Pramod
Department
Electrical and Computer Engineering
Abstract

Control hazards caused by conditional branches are one of the biggest obstacles to achieving high performance in out-of-order superscalar processors. Branch prediction techniques help alleviate the penalties associated with branch instructions, but they still exhibit nonzero misprediction rates because prediction is inherently speculative. Branch decoupled architectures have been proposed as an alternative paradigm for reducing branch stalls. This paradigm, supported by an accompanying compiler, uses a processor with two execution units: a branch processor and a program processor. A program is decoupled at compile time into two instruction streams, which are then executed on the branch decoupled processor. The objective of the decoupling process is to have the branch processor resolve branch conditions and precompute branch target addresses ahead of the program processor. This thesis presents three contributions. First, it presents an algorithm based on graph bi-partitioning and scheduling that the compiler uses to decouple the program's instruction stream into two streams. This technique attempts to achieve maximal decoupling while reducing interaction between the two streams. Maximal decoupling allows both processors to run as independently as possible, thereby extracting the maximum benefit from the branch decoupled architecture paradigm. Applying the decoupling algorithm results in 48.6% and 38.1% of the instructions, on average, being executed on the branch and program processors, respectively. Simulations show an average performance improvement of 7.7% for integer benchmarks and 5.5% for floating-point benchmarks. Second, it presents a toolchain consisting of a compiler, binary utilities (assembler, linker, loader), and associated libraries that has been retargeted to the branch decoupled architecture platform. Finally, it gives an overview of an out-of-order, execution-driven superscalar processor simulator developed for simulating the branch decoupled architecture.
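
To make the idea of decoupling concrete, the following is a minimal sketch of one simple way to split a straight-line instruction sequence into a branch stream and a program stream. It is not the graph bi-partitioning and scheduling algorithm of the thesis. It uses a plain backward-slice heuristic instead: every branch, plus everything its condition transitively reads, goes to the branch stream, and the rest goes to the program stream. The `Instr` record, the register names, and the example program are all hypothetical, and only read-after-write dependences are tracked.

```python
from collections import namedtuple

# Hypothetical instruction record (an assumption for illustration only):
# name, registers read, register written (or None), and a branch flag.
Instr = namedtuple("Instr", "name reads writes is_branch")


def decouple(instrs):
    """Backward-slice split of straight-line code.

    Returns (branch_stream, program_stream, cross_edges), where cross_edges
    lists (producer_index, consumer_index) pairs that span the two streams.
    """
    # Build read-after-write dependences: consumer index -> producer indices.
    last_writer = {}
    deps = {}
    for i, ins in enumerate(instrs):
        deps[i] = {last_writer[r] for r in ins.reads if r in last_writer}
        if ins.writes is not None:
            last_writer[ins.writes] = i

    # Backward slice: each branch plus everything it transitively depends on.
    branch_side = set()
    work = [i for i, ins in enumerate(instrs) if ins.is_branch]
    while work:
        i = work.pop()
        if i not in branch_side:
            branch_side.add(i)
            work.extend(deps[i])

    # A cross edge is a dependence whose two ends land in different streams;
    # each one is a point where the two processors would have to interact.
    cross_edges = [
        (p, c)
        for c in sorted(deps)
        for p in sorted(deps[c])
        if (p in branch_side) != (c in branch_side)
    ]

    branch_stream = [instrs[i].name for i in sorted(branch_side)]
    program_stream = [
        instrs[i].name for i in range(len(instrs)) if i not in branch_side
    ]
    return branch_stream, program_stream, cross_edges


# Hypothetical six-instruction example.
PROGRAM = [
    Instr("load_a", (), "r1", False),
    Instr("add1", ("r1",), "r2", False),
    Instr("load_b", (), "r3", False),
    Instr("square", ("r3",), "r4", False),
    Instr("sum", ("r4", "r2"), "r5", False),
    Instr("br", ("r2",), None, True),
]

branch_stream, program_stream, cross = decouple(PROGRAM)
print("branch stream: ", branch_stream)
print("program stream:", program_stream)
print("cross edges:   ", cross)
```

In this example the single cross edge is the value produced by `add1` on the branch side and consumed by `sum` on the program side. A partitioner that also aims to reduce interaction between the streams, as the abstract describes, would weigh such edges when deciding where each instruction goes; this sketch only reports them.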

Copyright
2003-01-01