Campus Units

Chemistry, Ames Laboratory

Document Type

Article

Publication Version

Published Version

Publication Date

2-2014

Journal or Book Title

Journal of Chemical Theory and Computation

Volume

10

Issue

3

First Page

908

Last Page

912

DOI

10.1021/ct4010596

Abstract

Increasingly, modern computer systems comprise a multicore general-purpose processor augmented with a number of special-purpose devices, or accelerators, connected via an external interface such as a PCI Express (PCIe) bus. The NVIDIA Kepler Graphical Processing Unit (GPU) and the Intel Phi are two examples of such accelerators. Accelerators offer peak performance that can be well above that of the host processor. How to exploit this heterogeneous environment for legacy application codes is not, however, straightforward. This paper considers how matrix operations in typical quantum chemical calculations can be migrated to the GPU and Phi systems. Double-precision general matrix multiply (DGEMM) operations are endemic in electronic structure calculations, especially in methods that include electron correlation, such as density functional theory, second-order perturbation theory, and coupled cluster theory. The paper explores approaches that automatically decide, based on problem size, whether to use the host or an accelerator, with computations running on the accelerator, the host, or both. For data transfers over PCIe, the GPU provides the best overall performance for data sizes up to 4096 MB, with consistent upload and download rates of 5–5.6 GB/s and 5.4–6.3 GB/s, respectively. The GPU outperforms the Phi for both square and nonsquare matrix multiplications.
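The size-based choice between host and accelerator can be illustrated with a simple cost model. The sketch below is hypothetical and is not the paper's dispatch method. It compares the estimated host DGEMM time with the accelerator's compute time plus the PCIe transfer time, assuming C = A·B with beta = 0, so C is never uploaded. The transfer rates are the lower ends of the ranges quoted above, taking 1 GB/s as 1e9 bytes/s. The sustained GFLOP/s figures for host and accelerator are invented placeholders, and the function names are illustrative only.

```python
def dgemm_flops(m, n, k):
    """Floating-point operations for C = A @ B with A (m x k) and B (k x n)."""
    return 2.0 * m * n * k


def choose_device(m, n, k,
                  host_gflops=100.0,    # hypothetical sustained host DGEMM rate
                  accel_gflops=1000.0,  # hypothetical sustained accelerator rate
                  upload_gbs=5.0,       # lower end of the quoted upload range
                  download_gbs=5.4):    # lower end of the quoted download range
    """Return (device, host_time_s, accel_time_s) under a naive cost model.

    Assumes beta = 0, so only A and B are uploaded and only C is downloaded.
    """
    bytes_per_double = 8
    upload_bytes = bytes_per_double * (m * k + k * n)  # A and B sent to device
    download_bytes = bytes_per_double * (m * n)        # C returned to host

    flops = dgemm_flops(m, n, k)
    host_time = flops / (host_gflops * 1e9)
    accel_time = (flops / (accel_gflops * 1e9)
                  + upload_bytes / (upload_gbs * 1e9)
                  + download_bytes / (download_gbs * 1e9))

    device = "accelerator" if accel_time < host_time else "host"
    return device, host_time, accel_time


if __name__ == "__main__":
    for size in (64, 256, 1024, 4096):
        device, t_host, t_acc = choose_device(size, size, size)
        print(f"n={size:5d}  host={t_host:.2e}s  accel={t_acc:.2e}s  -> {device}")
```

With these placeholder rates, small matrices remain on the host because moving A, B, and C over PCIe costs more than the computation. Large matrices go to the accelerator, where the O(n³) work outweighs the O(n²) transfer. A production dispatcher would replace the constants with measured rates for the specific hardware.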

Comments

Reprinted (adapted) with permission from Journal of Chemical Theory and Computation 10 (2014): 908, doi:10.1021/ct4010596. Copyright 2014 American Chemical Society.

Copyright Owner

American Chemical Society

Language

en

File Format

application/pdf

