Optimizing parallel sequence alignment sorting and epistasis detection, and parallel Fortran application resilience

Date
2020-01-01
Authors
Weeks, Nathan
Major Professor
Advisor
Glenn R Luecke
Department
Computer Science
Abstract

This dissertation comprises published or accepted papers encompassing two areas in High Performance Computing research: optimization and parallelization of bioinformatics applications, and fault tolerance in parallel Fortran applications.

The bioinformatics application optimization papers examine two computationally expensive problems: epistasis detection in quantitative-trait genome-wide association studies (GWAS) and sequence alignment sorting.

First, epiSNP, an application for identifying pairwise epistasis (genetic marker interactions), is subjected to performance analysis and subsequent algorithmic and data structure optimizations, resulting in a ~12X speedup over the original (serial) application. Combined with distributed- and shared-memory techniques for dynamically load balancing pairwise operations across processes, a 38.43X speedup over the original parallel implementation (EPISNPmpi) is achieved on 126 nodes (each with 2 Intel Xeon Phi coprocessors) of the TACC Stampede supercomputer.
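
As a rough illustration of dynamically load balancing pairwise operations, the following Python sketch splits all marker pairs into small blocks and lets idle workers pick up the next block as they finish. It is not the dissertation's implementation: `score_pair`, `pair_blocks`, and `score_all_pairs` are hypothetical names, the scoring function is a placeholder rather than an epistasis test, and Python threads are used only to show the scheduling pattern, not to obtain a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def score_pair(a, b):
    # Hypothetical stand-in for a pairwise test: counts the positions
    # at which two genotype vectors agree.
    return sum(x == y for x, y in zip(a, b))


def pair_blocks(n, block_size):
    """Yield lists of (i, j) index pairs with i < j, block_size pairs at a time."""
    block = []
    for pair in combinations(range(n), 2):
        block.append(pair)
        if len(block) == block_size:
            yield block
            block = []
    if block:
        yield block


def score_all_pairs(markers, block_size=4, workers=4):
    """Score every marker pair, distributing blocks of pairs to workers."""

    def run_block(block):
        return [(i, j, score_pair(markers[i], markers[j])) for i, j in block]

    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Blocks are queued up front; each worker takes the next queued
        # block when it becomes idle, so faster workers process more blocks.
        for block_result in pool.map(run_block, pair_blocks(len(markers), block_size)):
            results.extend(block_result)
    return results
```

Smaller blocks balance the load more evenly at the cost of more scheduling overhead, which is the usual trade-off in this kind of scheme.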

For sequence-alignment sorting, optimizations to the popular open-source application SAMtools are described. These include more efficient data structures to reduce memory-management overhead, an improved external sorting implementation that reduces I/O, and the use of OpenMP tasks to better load balance compression, decompression, and sorting. The optimizations resulted in a 5.9X speedup for the benchmarked in-memory sort, and a 1.98X speedup for an external sort.
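
For readers unfamiliar with external sorting, the general technique can be sketched as follows: sort memory-sized chunks, write each as a sorted run to disk, then stream a k-way merge over the runs. This is a minimal Python sketch of the textbook approach, not the SAMtools implementation or the optimizations described above; `external_sort` and `chunk_size` are hypothetical names, and records are assumed to be strings that contain no newline characters.

```python
import heapq
import os
import tempfile


def external_sort(records, chunk_size):
    """Yield records in sorted order while holding at most chunk_size in memory."""
    run_paths = []
    chunk = []

    def flush():
        # Sort the in-memory chunk and write it to disk as one sorted run.
        chunk.sort()
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(r + "\n" for r in chunk)
        run_paths.append(path)
        chunk.clear()

    for rec in records:
        chunk.append(rec)
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    files = [open(p) for p in run_paths]
    try:
        # heapq.merge performs a streaming k-way merge over the sorted runs;
        # the key strips the newline so comparison matches the in-memory sort.
        for line in heapq.merge(*files, key=lambda s: s.rstrip("\n")):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()
        for p in run_paths:
            os.remove(p)
```

For example, `list(external_sort(["d", "a", "c", "b", "e"], 2))` writes three runs and merges them. Fewer, larger runs mean fewer files to read back during the merge, which is one reason the run-generation strategy affects I/O.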

In the domain of High Performance Computing fault tolerance for parallel Fortran applications, the first paper surveys the landscape of HPC technologies and techniques for developing resilient Fortran applications that are parallelized using the Message Passing Interface (MPI). MPI fault tolerance extensions are categorized and analyzed for Fortran compatibility, and issues pertaining to the use of Fortran I/O and MPI I/O for checkpoint/restart are discussed.

The final paper both proposes changes to the Fortran standard to make its recent facilities for handling failed images (processes) more useful to and usable by application programmers, and introduces a prototype implementation that demonstrates the proposed semantics.

Copyright
2020-12-01