Campus Units
Electrical and Computer Engineering
Document Type
Article
Publication Version
Published Version
Publication Date
2-9-2021
Journal or Book Title
PeerJ Computer Science
Volume
7
First Page
e359
DOI
10.7717/peerj-cs.359
Abstract
As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. Such schemes may not detect multiple symbol errors arising from faults in multiple devices, the data bus, or the address bus. In this article, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error-handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme uses a hash in combination with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs).
We develop a novel scheme that deploys a 32-bit CRC along with a Reed-Solomon code to implement SSCMSD for a ×4-based DDR4 system. Simulation-based experiments show that our scheme effectively guards against device, data-bus, and address-bus errors, limited only by the aliasing probability of the hash. Our design achieves this without introducing additional READ latency. It requires 19 chips per rank, 76 data bus lines, and additional hash logic at the memory controller.
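As a rough illustration of the idea summarized above (not the article's actual design), the following Python sketch checks the bus-width arithmetic and shows how a 32-bit CRC used as a hash can flag corrupted data or a wrong address on a read. The function names, the 64-byte line size, the 8-byte address encoding, and the choice to hash the address together with the data are all assumptions made for this example; the Reed-Solomon correction step is omitted entirely.

```python
import zlib

CHIPS_PER_RANK = 19  # from the abstract
DEVICE_WIDTH = 4     # x4 DRAM devices, from the abstract


def data_bus_lines(chips: int = CHIPS_PER_RANK, width: int = DEVICE_WIDTH) -> int:
    """Number of data bus lines: one line per device data pin."""
    return chips * width


def line_hash(address: int, data: bytes) -> int:
    """CRC-32 over the address followed by the data.

    Hypothetical layout: the address is encoded as 8 little-endian bytes.
    """
    return zlib.crc32(address.to_bytes(8, "little") + data)


def check_read(requested_address: int, data: bytes, stored_hash: int) -> bool:
    """Return True if the data read back matches the hash stored on write."""
    return line_hash(requested_address, data) == stored_hash


if __name__ == "__main__":
    print(data_bus_lines())  # 19 chips x 4 lines each

    line = bytes(range(64))           # assumed 64-byte cache line
    stored = line_hash(0x1000, line)  # computed by the controller on write

    # Clean read from the intended address.
    print(check_read(0x1000, line, stored))

    # Address-bus error: the data came from a different location than requested.
    print(check_read(0x1040, line, stored))

    # Multi-symbol data error: two adjacent bytes corrupted.
    bad = bytearray(line)
    bad[3] ^= 0xFF
    bad[4] ^= 0x0F
    print(check_read(0x1000, bytes(bad), stored))
```

Because the hash is only 32 bits wide, an arbitrary corruption can still alias to the same CRC value with small probability, which is the limit the abstract refers to. The error patterns in this demo are short bursts, which CRC-32 always detects.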
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.
Copyright Owner
Yeleswarapu and Somani
Copyright Date
2021
Language
en
File Format
application/pdf
Recommended Citation
Yeleswarapu, Ravikiran and Somani, Arun K., "Addressing multiple bit/symbol errors in DRAM subsystem" (2021). Electrical and Computer Engineering Publications. 224.
https://lib.dr.iastate.edu/ece_pubs/224
Included in
Computer and Systems Architecture Commons, Systems and Communications Commons, Theory and Algorithms Commons
Comments
This article is published as Yeleswarapu, Ravikiran, and Arun K. Somani. "Addressing multiple bit/symbol errors in DRAM subsystem." PeerJ Computer Science 7 (2021): e359. Posted with permission.