Scalable and reproducible genome analysis in the age of next-generation genome sequencing

Standage, Daniel

File

Standage_iastate_0097E_15762.pdf (1.74 MB)

Supplemental Files

0-pdom_supplement.pdf (1.06 MB)

1-SB16iloci_supplement.pdf (451.45 KB)

Date

2016-01-01

Advisors

Volker P. Brendel

Amy L. Toth

Organizational Unit

Genetics, Development and Cell Biology

Abstract

Recent advances in DNA sequencing technology and a proliferation of new algorithms for assembling, annotating, and analyzing genomes have made genome-scale sequencing more accessible than ever. As a result, the last several years have seen a dramatic increase in the number of published draft genomes. Many important research problems revolve around the interpretation of these draft genomes: What are the contents of a genome? How many genes are there? Are there any conspicuous losses of genes of interest? Is the genome compact, with genes clustered very tightly, or are genes separated by large intergenic spaces? Are intergenic spaces distributed evenly throughout the genome? Which characteristics of genome composition and organization are well conserved, and which appear to be unique, warranting further investigation?

In this dissertation, I investigate these questions in two contexts. First, I present a draft genome of the paper wasp Polistes dominula, a model species for the study of the evolution of social behavior. The genome of Polistes is similar to those of other social insects in many respects, but it has an extremely biased nucleotide composition and shows some evidence of a reduction in genome size. Analysis of transcriptome and methylome data from queen and worker wasps reveals evidence of caste-related differences in gene expression, as well as a drastic reduction in DNA methylation, which was previously thought to be an important factor in caste differentiation.

Second, I investigate questions of genome composition and organization more generally. Given a new genome assembly and annotation, what can we quickly determine about the genome’s contents? What can be said about the distribution of genes and the overall “compactness” of the genome? How should these results be compared with previously published results for related species? I present a framework (and related tools) that provides precise answers to these questions, and I discuss insights gained by applying these tools to the genomes of various model organisms.

Copyright

2016

Collections

Theses and Dissertations