Degree Type

Dissertation

Date of Award

2018

Degree Name

Doctor of Philosophy

Department

Statistics

Major

Bioinformatics and Computational Biology

First Advisor

Dianne Cook

Second Advisor

Amy L. Toth

Abstract

As is the case in many fields, biological disciplines are now facing the challenges of increasingly large and complex data. Biologists must process and meaningfully interpret a deluge of data, and one necessary approach toward accomplishing this goal is the use of visualization. Ultimately, the objective of developing visualization tools for biological data is to provide biologists with enhanced insight into the processes within organelles, cells, organs, and even whole organisms. R is a free, interpreted programming language for statistical computing and graphics. It is widely used by statisticians to develop statistical software and data analysis tools, and has become even more popular in recent years among researchers across a wide range of disciplines.

In this dissertation, we focus primarily on developing effective visualization tools for genealogical and RNA-sequencing datasets within the R framework. This work addresses the lack of modern and interactive visualization techniques in the fields of genealogy and RNA-sequencing through the following specific aims: (i) develop improved visualization techniques for genealogical datasets; (ii) generate comprehensive collections of examples underlining the importance of visualizing RNA-sequencing datasets; (iii) develop improved visualization methods for RNA-sequencing datasets; and (iv) perform an RNA-sequencing experiment that examines virus inoculation and nutrition in honey bees while applying the visualization tools we previously validated and developed.

First, we present our software package ggenealogy, which includes new visualization tools for genealogical datasets. In particular, we introduce a new method that provides unequivocal information about lineages in situations where intergenerational breeding occurs, as is often the case in agronomic applications. This was not previously possible with standard pedigree charts. Second, we create a compilation of reproducible examples, using numerous public RNA-sequencing datasets, that demonstrates how uncommon visualization techniques can detect normalization issues, differential expression designation problems, and common analysis errors. We also show that these visualization tools can identify genes of interest that would go undetected with models alone. Third, we introduce our software package bigPint, which comprises visualization tools for RNA-sequencing datasets, many of which we previously showed to be beneficial through extensive testing. Fourth, we conduct the first RNA-sequencing study that examines the combined effects of monofloral diets and Israeli Acute Paralysis Virus (IAPV) inoculation on gene expression patterns in honey bees. These factors have been implicated as environmental stressors that pose heightened dangers to honey bee health, the decline of which has major implications for agricultural sustainability. Importantly, our RNA-sequencing study uses an extensive data visualization approach that incorporates the methods we developed earlier, and we recommend this approach to researchers working with noisy RNA-sequencing data.

Copyright Owner

Lindsay Rutter

Language

en

File Format

application/pdf

File Size

211 pages
