Degree Type
Thesis
Date of Award
1-1-2003
Degree Name
Master of Science
Department
Electrical and Computer Engineering
Major
Computer Engineering
Abstract
Control hazards caused by conditional branches are one of the biggest obstacles to achieving performance in out-of-order superscalar processors. Branch prediction techniques help alleviate the penalties associated with branch instructions, but they still mispredict some branches because of the way they operate. A new paradigm, branch decoupled architectures, has been proposed as an alternative for reducing branch stalls. This paradigm, supported by an accompanying compiler, uses a processor with two execution units: a branch processor and a program processor. A program is decoupled at compile time into two instruction streams, which are then executed on the branch decoupled processor. The objective of the decoupling process is to have the branch processor resolve branch conditions and precompute branch target addresses in advance for the program processor. This thesis presents three contributions. First, it presents an algorithm based on graph bi-partitioning and scheduling that the compiler uses to decouple a program's instruction stream into two streams. This technique attempts to achieve maximal decoupling while reducing interaction between the two streams. Maximal decoupling allows both processors to run as independently as possible, thereby extracting the maximum benefit from the branch decoupled architecture paradigm. Applying the decoupling algorithm has been shown to result in 48.6% and 38.1% of the instructions, on average, being executed on the branch and program processors, respectively. Simulations show an average performance improvement of 7.7% for integer benchmarks and 5.5% for floating-point benchmarks. Second, it presents a toolchain consisting of a compiler, binary utilities (assembler, linker, loader), and associated libraries that has been retargeted to the branch decoupled architecture platform. Finally, it presents an overview of an out-of-order, execution-driven superscalar processor simulator that has been developed for simulating the branch decoupled architecture.
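The idea of decoupling a program into a branch stream and a program stream can be illustrated with a minimal sketch. This is not the thesis's algorithm, which is based on graph bi-partitioning and scheduling and whose details are not given in this record. As a simpler stand-in, the sketch below assigns to the branch stream every instruction in the backward dependence slice of a conditional branch, assigns everything else to the program stream, and counts the register values that cross between the streams as a rough measure of interaction. The `Instr` record, the `decouple` function, and the `demo` instruction list are all hypothetical names introduced for illustration.

```python
from collections import namedtuple

# Hypothetical instruction record (an assumption, not from the thesis):
# `dest` is the register written (or None), `srcs` are the registers read,
# `is_branch` marks a conditional branch.
Instr = namedtuple("Instr", "name dest srcs is_branch")


def decouple(instrs):
    """Split a straight-line instruction list into two streams.

    Returns (branch_stream, program_stream, cross_edges), where the streams
    are lists of instruction names and cross_edges counts dependences whose
    producer and consumer ended up in different streams.
    """
    # producers[i] = indices of the instructions that instruction i depends on.
    producers = []
    last_writer = {}
    for i, ins in enumerate(instrs):
        producers.append([last_writer[r] for r in ins.srcs if r in last_writer])
        if ins.dest is not None:
            last_writer[ins.dest] = i

    # Backward slice: start at the branches and follow dependences upward.
    in_branch = set()
    worklist = [i for i, ins in enumerate(instrs) if ins.is_branch]
    while worklist:
        i = worklist.pop()
        if i in in_branch:
            continue
        in_branch.add(i)
        worklist.extend(producers[i])

    branch_stream = [ins.name for i, ins in enumerate(instrs) if i in in_branch]
    program_stream = [ins.name for i, ins in enumerate(instrs) if i not in in_branch]

    # Dependences that cross streams approximate inter-stream interaction.
    cross_edges = sum(
        1
        for i, deps in enumerate(producers)
        for d in deps
        if (d in in_branch) != (i in in_branch)
    )
    return branch_stream, program_stream, cross_edges


# Hypothetical six-instruction example.
demo = [
    Instr("i0", "r1", (), False),           # r1 = load
    Instr("i1", "r2", ("r1",), False),      # r2 = r1 + 1
    Instr("i2", None, ("r2",), True),       # branch on r2
    Instr("i3", "r3", (), False),           # r3 = load
    Instr("i4", "r4", ("r3", "r1"), False), # r4 = r3 * r1
    Instr("i5", None, ("r4",), False),      # store r4
]
```

Calling `decouple(demo)` places `i0`, `i1`, and `i2` in the branch stream and the remaining instructions in the program stream, with one cross-stream dependence (`i4` reads `r1`, which is produced by `i0`). A partitioner of the kind the abstract describes would additionally weigh such cross-stream dependences against the balance of the two streams rather than slicing greedily.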
DOI
https://doi.org/10.31274/rtd-20200803-327
Copyright Owner
Pramod Bhanu Ramarao
Copyright Date
2003
Language
en
OCLC Number
54938077
File Format
application/pdf
File Size
75 pages
Recommended Citation
Ramarao, Pramod Bhanu, "A hybrid partitioning and scheduling technique for branch decoupling" (2003). Retrospective Theses and Dissertations. 20004.
https://lib.dr.iastate.edu/rtd/20004