Scalable and reproducible genome analysis in the age of next-generation genome sequencing

Date
2016-01-01
Authors
Standage, Daniel
Major Professor
Advisor
Volker P. Brendel
Amy L. Toth
Department
Genetics, Development and Cell Biology
Abstract

Recent advances in DNA sequencing technology and a proliferation of new algorithms for assembling, annotating, and analyzing genomes have made genome-scale sequencing more accessible than ever. As a result, the last several years have seen a dramatic increase in the number of published draft genomes. Many important research problems revolve around interpretation of these draft genomes: What are the contents of a genome? How many genes are there? Are there any conspicuous losses of genes of interest? Is the genome compact, with genes clustered very tightly, or are genes separated by large intergenic spaces? Are intergenic spaces distributed evenly throughout the genome? Which characteristics of genome composition and organization are well conserved, and which appear to be unique, warranting further investigation?

In this dissertation, I investigate these questions in multiple contexts. First, I present a draft genome of the paper wasp Polistes dominula, a model species for studying the evolution of social behavior. The Polistes genome is similar to those of other social insects in many respects, but it has an extremely biased nucleotide composition and shows some evidence of a reduction in genome size. Analysis of transcriptome and methylome data from queen and worker wasps reveals evidence of caste-related differences in gene expression, as well as a tremendous reduction in DNA methylation, a mechanism previously thought to be an important factor in caste differentiation.

Second, I investigate questions of genome composition and organization more generally. Given a new genome assembly and annotation, what can we determine quickly about the genome’s contents? What can be said about the distribution of genes and the overall “compactness” of the genome? How should these findings be compared with previously published results for related species? I present a framework (and related tools) that provides precise answers to these questions, and I discuss insights gained by applying these tools to various model organism genomes.

Copyright
2016-01-01