Date of Award
Doctor of Philosophy
Electrical and Computer Engineering
Advances in the fabrication technology have been a major driving force in the unprecedented increase in computing capabilities over the last several decades. Despite huge reductions in the switching energy of the transistors, two major issues have emerged with decreasing fabrication technology scales. They are: 1) increased impact of process, voltage, and temperature (PVT) variation on transistor performance, and 2) increased susceptibility of transistors to soft errors induced by high energy particles. In presence of PVT variation, as transistor sizes continue to decrease, the design margins used to guarantee correct operation in the presence of worst-case scenarios have been increasing.
Systems run at a clock frequency, which is determined by accounting the worst-case timing paths, operating conditions, and process variations. Timing speculation based reliable and aggressive clocking advocates going beyond worst-case limits to achieve best performance while not avoiding, but detecting and correcting a modest number of timing errors. Such design methodology exploits the fact that timing critical paths are rarely exercised in a design, and typical execution happens much faster than the timing requirements dictated by worst-case scenarios. Better-than-worst-case design methodology is advocated by several recent research pursuits, which propose to exploit in-built fault tolerance mechanisms to enhance computer system performance. Recent works have also shown that the performance lose due to over provisioning base on worst-case design margins is upward of 20\% in terms operating frequency and upward of 50\% in terms of power efficiency.
The threat of soft error induced system failure in computing systems has become more prominent as we adopt ultra-deep submicron process technologies. With respect to soft error susceptibility, decreasing transistor geometries lower the energy threshold needed by high-energy particles to induce errors. As this trend continues, the need for fault tolerance mechanisms to counteract this effect has moved from a nice to have, to be a requirement in current and future systems. In this dissertation, RAKSHA (meaning to protect and save in Sanskrit), we take a multidimensional look at the challenges of system design built with scaled-technologies using high integrity techniques.
In RAKSHA, to mitigate soft errors, we propose lightweight high-integrity mechanisms as basic system building blocks which allow system to offer performance levels comparable to a non-fault tolerant system. In addition, we also propose to effectively exploit and use the availability of fault tolerant mechanisms to allow and tolerate data-dependent failures, thus setting systems to operate at typical case circuit delays and enhance system performance. We also propose the use of novel high-integrity cells for increasing system energy efficiency and also potentially increasing system security by combating power-analysis-based side channel attacks. Such an approach allows balancing of performance, power, and security with no further overhead over the resources needed to incorporate fault tolerance. Using our framework, instead of designing circuits to meet worst-case requirements, circuits can be designed to meet typical-case requirements.
In RAKSHA, we propose two efficient soft error mitigation schemes, namely Soft Error Mitigation (SEM) and Soft and Timing Error Mitigation (STEM), using the approach of multiple clocking of data for protecting combinational logic blocks from soft errors. Our first technique, SEM, based on distributed and temporal voting of three registers, unloads the soft error detection overhead from the critical path of the systems. SEM is also capable of ignoring false errors and recovers from soft errors using in-situ fast recovery avoiding recomputation. Our second technique, STEM, while tolerating soft errors, adds timing error detection capability to guarantee reliable execution in aggressively clocked designs that enhance system performance by operating beyond worst-case clock frequency. We also present a specialized low overhead clock phase management scheme that ably supports our proposed techniques. Timing annotated gate level simulations, using 45nm libraries, of a pipelined adder-multiplier and DLX processor show that both our techniques achieve near 100% fault coverage. For DLX processor, even under severe fault injection campaigns, SEM achieves an average performance improvement of 26.58% over a conventional triple modular redundancy voter based soft error mitigation scheme, while STEM outperforms SEM by 27.42%. We refer to systems built with SEM and STEM cells as reliable and aggressive systems.
Energy consumption minimization in computing systems has attracted a great deal of attention and has also become critical due to battery life considerations and environmental concerns. To address this problem, many task scheduling algorithms are developed using dynamic voltage and frequency scaling (DVFS). Majority of these algorithms involve two passes: schedule generation and slack reclamation. Using this approach, linear combination of frequencies has been proposed to achieve near optimal energy for systems operating with discrete and traditional voltage frequency pairs. In RAKSHA, we propose a new slack reclamation algorithm, aggressive dynamic and voltage scaling (ADVFS), using reliable and aggressive systems. ADVFS exploits the enhanced voltage frequency spectrum offered by reliable and aggressive designs for improving energy efficiency. Formal proofs are provided to show that optimal energy for reliable and aggressive designs is either achieved by using single frequency or by linear combination of frequencies. ADVFS has been evaluated using random task graphs and our results show 18% reduction in energy when compared with continuous DVFS and over more than 33% when compared with scheme using linear combination of traditional voltage frequency pairs.
Recent events have indicated that attackers are banking on side-channel attacks, such as differential power analysis (DPA) and correlation power analysis (CPA), to exploit information leaks from physical devices. Random dynamic voltage frequency scaling (RDVFS) has been proposed to prevent such attacks and has very little area, power, and performance overheads. But due to the one-to-one mapping present between voltage and frequency of DVFS voltage-frequency pairs, RDVFS cannot prevent power attacks. In RAKSHA, we propose a novel countermeasure that uses reliable and aggressive designs to break this one-to-one mapping. Our experiments show that our technique significantly reduces the correlation for the actual key and also reduces the risk of power attacks by increasing the probability for incorrect keys to exhibit maximum correlation. Moreover, our scheme also enables systems to operate beyond the worst-case estimates to offer improved power and performance benefits. For the experiments conducted on AES S-box implemented using 45nm CMOS technology, our approach has increased performance by 22% over the worst-case estimates. Also, it has decreased the correlation for the correct key by an order and has increased the probability by almost 3.5X times for wrong keys when compared with the original key to exhibit maximum correlation. Overall, RAKSHA offers a new way to balance the intricate interplay between various design constraints for the systems designed using small scaled-technologies.
Naga Durga Prasad Avirneni
Avirneni, Naga Durga Prasad, "RAKSHA:Reliable and Aggressive frameworK for System design using High-integrity Approaches" (2012). Graduate Theses and Dissertations. 12831.