Degree Type

Dissertation

Date of Award

2014

Degree Name

Doctor of Philosophy

Department

Electrical and Computer Engineering

First Advisor

Zhao Zhang

Abstract

Reliability of memory systems is increasingly a concern as memory density increases, the cell dimension shrinks and new memory technologies move close to commercial use. Meanwhile, memory power efficiency has become another first-order consideration in memory system design. Conventional reliability scheme uses ECC (Error Correcting Code) and EDC (Error Detecting Code) to support error correction and detection in memory systems, putting a rigid constraint on memory organizations and incurring a significant overhead regarding the power efficiency and area cost.

This dissertation studies energy-efficient and cost-effective reliability design on both cache and main memory systems. It first explores the generic approach called embedded ECC in main memory systems to provide a low-cost and efficient reliability design. A scheme called E3CC (Enhanced Embedded ECC) is proposed for sub-ranked low-power memories to alleviate the concern on reliability. In the design, it proposes a novel BCRM (Biased Chinese Remainder Mapping) to resolve the address mapping issue in page-interleaving scheme. The proposed BCRM scheme provides an opportunity for building flexible reliability system, which favors the consumer-level computers to save power consumption.

Within the proposed E3CC scheme, we further explore address mapping schemes at DRAM device level to provide SEP (Selective Error Protection). We explore a group of address mapping schemes at DRAM device level to map memory requests to their designated regions. All the proposed address mapping schemes are based on modulo operation. They will be proven, in this thesis, to be efficient, flexible and promising to various scenarios to favor system requirements.

Additionally, we propose Free ECC reliability design for compressed cache schemes. It utilizes the unused fragments in compressed cache to store ECC. Such a design not only reduces the chip overhead but also improves cache utilization and power efficiency. In the design, we propose an efficient convergent cache allocation scheme to organize the compressed data blocks more effectively than existing schemes. This new design makes compressed cache an increasingly viable choice for processors with requirements of high reliability.

Furthermore, we propose a novel, system-level scheme of memory error detection based on memory integrity check, called MemGuard, to detect memory errors. It uses memory log hashes to ensure, by strong probability, that memory read log and write log match with each other. It is much stronger than conventional protection in error detection and incurs little hardware cost, no storage overhead and little power overhead. It puts no constraints on memory organization and no major complication to processor design and operating system design. In the thesis, we prove that the MemGuard reliability design is simple, robust and efficient.

DOI

https://doi.org/10.31274/etd-180810-1056

Copyright Owner

Long Chen

Language

en

File Format

application/pdf

File Size

143 pages

Share

COinS