Degree Type

Thesis

Date of Award

2020

Degree Name

Doctor of Philosophy

Department

Computer Science

Major

Computer Science

First Advisor

Glenn R Luecke

Abstract

This dissertation comprises published or accepted papers encompassing two areas in High Performance Computing research: optimization and parallelization of bioinformatics applications, and fault tolerance in parallel Fortran applications.

The bioinformatics application optimization papers examine two computationally expensive problems: epistasis detection in quantitative-trait genome-wide association studies (GWAS), and sequence alignment sorting.

First, epiSNP, an application for identifying pairwise epistasis (genetic marker interactions), is subjected to performance analysis followed by algorithmic and data-structure optimizations, resulting in a ~12X speedup over the original (serial) application. Combined with distributed- and shared-memory techniques for dynamically load balancing pairwise operations across processes, these optimizations achieve a 38.43X speedup over the original parallel implementation (EPISNPmpi) on 126 nodes (each with 2 Intel Xeon Phi coprocessors) of the TACC Stampede supercomputer.

For sequence-alignment sorting, optimizations to the popular open-source application SAMtools are described. These include more efficient data structures to reduce memory-management overhead, an improved external sorting implementation that reduces I/O, and the use of OpenMP tasks to better load-balance compression, decompression, and sorting. The optimizations yielded a 5.9X speedup for the benchmarked in-memory sort and a 1.98X speedup for an external sort.

In the area of fault tolerance for parallel Fortran applications, the first paper surveys HPC technologies and techniques for developing resilient Fortran applications that are parallelized using the Message Passing Interface (MPI). MPI fault tolerance extensions are categorized and analyzed for Fortran compatibility, and issues pertaining to the use of Fortran I/O and MPI I/O for checkpoint/restart are discussed.

The final paper both proposes changes to the Fortran standard to make its recent facilities for handling failed images (processes) more useful to and usable by application programmers, and introduces a prototype implementation that demonstrates the proposed semantics.

DOI

https://doi.org/10.31274/etd-20210114-160

Copyright Owner

Nathan Weeks

Language

en

File Format

application/pdf

File Size

151 pages

Available for download on Saturday, January 07, 2023
