Degree Type


Date of Award


Degree Name

Master of Science


Electrical and Computer Engineering


Computer Engineering


Control hazards caused by conditional branches are one of the biggest obstacles to achieving performance in out-of-order superscalar processors. Branch prediction techniques help alleviate the penalties associated with branch instructions, but still exhibit mis-prediction rates due to their functioning principle. A new paradigm, Branch decoupled architectures, has been proposed as an alternative to reduce branch stalls. This paradigm supported by an accompanying compiler, has a two-execution-unit processor-a branch processor and a program processor. A program is decoupled during compile time into two instruction streams and executed on the branch decoupled processor. The objective of the decoupling process is to have the branch processor solve branch conditions and precompute branch target addresses in advance for the program processor. This thesis presents three contributions. An algorithm based on graph bi-partitioning and scheduling, used by the compiler for decoupling the program's instruction stream into two streams is presented. This technique attempts to achieve maximal decoupling and at the same time attempts to reduce interaction between the two streams. Maximal decoupling allows both processors to run as independently as possible thereby extracting maximum benefit from the branch decoupled architecture paradigm. Application of the decoupling algorithm has been shown to result in 48.6% and 38.1% of the instructions on the average being executed on the branch and program processors. Simulations show a performance improvement of 7.7% and 5.5% on the average for integer and floating point benchmarks respectively. It then presents a toolchain consisting of a compiler, binary utilities (assembler, linker, loader) and associated libraries that has been retargeted to the branch decoupled architecture platform. Finally an overview of an out-of-order execution-driven superscalar processor simulator that has been developed for simulating the branch decoupled architecture is presented.


Copyright Owner

Pramod Bhanu Ramarao



OCLC Number


File Format


File Size

75 pages