Workflow tools for biological applications

Date
2013-01-01
Authors
Vendettuoli, Marie
Advisor
Heike Hofmann
Dianne H. Cook
Department
Statistics
Abstract

When identifying best practices for multistep processes involving data analysis, it is frequently the case that the data scientist is asked to wear many hats simultaneously: developer, programmer, statistician, graphic designer, writer, administrator. Although many scientists address these roles with great success, it is often at the expense of reproducibility, scalability, and organizational knowledge. Formalizing each step of the process creates an opportunity to apply lessons learned and proven tools from multiple disciplines to optimize each step of the transformation from raw data to usable output. This modular approach allows organizations to mix off-the-shelf technical solutions with custom ones, swap out components for flexibility, and minimize rework.

The primary focus of this dissertation is to extend the conceptualization of the pipeline to include methods drawn from human-computer interaction, exploratory data analysis, interactive graphics, and reproducible research. We describe applications to three distinct user groups: (1) a general audience of readers, (2) biologists involved in metabolomics analysis, and (3) analysts working in a public sector regulatory environment. The resulting technical tools are implemented in the R packages ggparallel, chromatoplotsGUI, dataFormats, and CVBreports.

Our analysis shows that these tools have a positive, transformative effect on the quality of communication between stakeholders. Specifically, we see that the common angles plot presented in ggparallel reduces the lie factor; chromatoplotsGUI enables display of metabolomic data rapidly and with a level of detail that facilitates development of the underlying analysis engine; and the methods of dataFormats and CVBreports enable significantly reduced turnaround times for preliminary data assessment.

Copyright
Tue Jan 01 00:00:00 UTC 2013