Degree Type


Date of Award


Degree Name

Doctor of Philosophy




Bioinformatics and Computational Biology

First Advisor

Heike Hofmann

Second Advisor

Dianne H. Cook


When identifying best practices for multistep processes involving data analysis, the data scientist is frequently asked to wear many hats simultaneously: developer, programmer, statistician, graphic designer, writer, administrator. Although many scientists address these roles with great success, it is often at the expense of reproducibility, scalability, and organizational knowledge. Formalizing each step of the process creates an opportunity to apply lessons learned and proven tools from multiple disciplines to optimize each step of the transformation from raw data to usable output. This modular approach allows organizations to mix off-the-shelf technical solutions with custom ones, swap out components for flexibility, and minimize rework.

The primary focus of this dissertation is to extend the conceptualization of the pipeline to include methods drawn from human-computer interaction, exploratory data analysis, interactive graphics, and reproducible research. We describe applications to three distinct user groups: (1) a general audience of readers, (2) biologists involved in metabolomics analysis, and (3) analysts working in a public sector regulatory environment. The resulting technical tools are implemented in the R packages ggparallel, chromatoplotsGUI, dataFormats, and CVBreports.

Our analysis shows that these tools improve the quality of communication between stakeholders. Specifically, we see that the common angles plot presented in ggparallel reduces the lie factor; chromatoplotsGUI enables display of metabolomic data rapidly and with a level of detail that facilitates development of the underlying analysis engine; and the methods of dataFormats and CVBreports enable significantly reduced turnaround times for preliminary data assessment.


Copyright Owner

Marie C. Vendettuoli



File Format


File Size

130 pages