Technical Report Number
Data, Hardware, Computing Methodologies
This paper discusses the importance of memory access optimizations which are shown to be highly effective on the MasPar architecture. The study is based on two MasPar machines, a 16K-processor MP-1 and a 4K-processor MP-2. A software pipelining technique overlaps memory accesses with computation and/or communication. Another optimization, called the register window technique reduces the number of loads in a loop. These techniques are evaluated using three parallel matrix multiplication algorithms on both the MasPar machines. The matrix multiplication study shows that for a highly computation intensive problem, reducing the interprocessor communication can become a secondary issue compared to memory access optimization. Also, it is shown that memory access optimizations can play a more important role than the choice of a superior parallel algorithm. Keywords: load/store architecture, memory accesses, matrix multiplication, parallel programming.