Date of Award
Doctor of Philosophy
Bioinformatics and Computational Biology
Dianne H. Cook
When identifying best practices for multistep processes involving data analysis, it is frequently the case that the data scientist is asked to wear many hats simultaneously: developer, programmer, statistician, graphic designer, writer, administrator. Although many scientists address these roles with great success, it is often at the expense of reproducibility, scalability, and organizational knowledge. Formalizing each step of the process creates an opportunity to apply lessons learned and proven tools from multiple disciplines to optimize each step of the transformation from raw data to usable output. This modular approach allows organizations to mix off-the-shelf technical solutions with custom ones, swap out components for flexibility, and minimize rework.
The primary focus of this dissertation is to extend the conceptualization of a pipeline to include methods drawn from human-computer interaction, exploratory data analysis, interactive graphics, and reproducible research. We describe applications to three distinct user groups: (1) a general audience of readers, (2) biologists involved in metabolomics analysis, and (3) analysts working in a public sector regulatory environment. The resulting technical tools are implemented in the R packages ggparallel, chromatoplotsGUI, dataFormats, and CVBreports.
Our analysis shows that these tools have a positive, transformative effect on the quality of communication between stakeholders. Specifically, we see that the common angles plot presented in ggparallel reduces the lie factor; chromatoplotsGUI enables display of metabolomic data rapidly and with a level of detail that facilitates development of the underlying analysis engine; and the methods of dataFormats and CVBreports enable significantly reduced turnaround times for preliminary data assessment.
Marie C. Vendettuoli
Vendettuoli, Marie C., "Workflow tools for biological applications" (2013). Graduate Theses and Dissertations. 13302.