Degree Type

Dissertation

Date of Award

2013

Degree Name

Doctor of Philosophy

Department

Statistics

Major

Bioinformatics and Computational Biology

First Advisor

Heike Hofmann

Second Advisor

Dianne H. Cook

Abstract

When identifying best practices for multistep processes involving data analysis, it is frequently the case that the data scientist is asked to wear many hats simultaneously: developer, programmer, statistician, graphic designer, writer, administrator. Although many scientists address these roles with great success, it is often at the expense of reproducibility, scalability, and organizational knowledge. Formalizing each step of the process creates an opportunity to apply lessons learned and proven tools from multiple disciplines to optimize each step of the transformation from raw data to usable output. This modular approach allows organizations to mix off-the-shelf technical solutions with custom ones, swap out components for flexibility, and minimize rework.

The primary focus of this dissertation is to extend the conceptualization of the pipeline to include methods drawn from human-computer interaction, exploratory data analysis, interactive graphics, and reproducible research. We describe its application to three distinct user groups: (1) a general audience of readers, (2) biologists involved in metabolomics analysis, and (3) analysts working in a public sector regulatory environment. The resulting technical tools are implemented in the R packages ggparallel, chromatoplotsGUI, dataFormats, and CVBreports.

Our analysis shows that these tools have a positive, transformative effect on the quality of communication between stakeholders. Specifically, we see that the common angles plot presented in ggparallel reduces the lie factor; chromatoplotsGUI enables display of metabolomic data rapidly and with a level of detail that facilitates development of the underlying analysis engine; and the methods of dataFormats and CVBreports enable significantly reduced turnaround times for preliminary data assessment.

Copyright Owner

Marie C. Vendettuoli

Language

en

File Format

application/pdf

File Size

130 pages
