Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Genetics, Development and Cell Biology


Bioinformatics and Computational Biology

First Advisor

Carolyn J. Lawrence-Dill

Second Advisor

Erik W. Vollbrecht


Maize is an important crop species, the most-produced cereal crop in the world, and a model species for genetics and genomics research. As a result, researchers have been very successful in translating understanding of basic biological processes into improved crops for over 100 years. Maize researchers have a long history of using genetic techniques to dissect the function of genes that control biological processes. Characterizing and cloning mutants precisely defines gene function but is a slow process that can take years to accomplish. Alternatively, computational methods provide a faster way to assign predicted function to genes by leveraging the vast knowledge base of gene function gathered by experimental and curatorial efforts in multiple species. Computational methods can be used to predict functions for genes at a genome-wide scale. Ideally, improved computational predictions would narrow and target the experiments used to test gene function, thus speeding the process of experimental characterization. We have created methods to improve discrete steps in both experimental characterization and computational prediction of gene function in maize. For the experimental work, we have developed molecular methods that leverage the decreasing cost of high-throughput sequencing, and bioinformatics analysis pipelines that capitalize on the availability of multiple maize genome assemblies, to improve positional cloning of maize mutants. We have also focused on methods to improve identification of T-DNA integration locations genome-wide for maize. Genes responsible for mutant phenotypes are often studied using transgenic techniques to manipulate function at a molecular level. These techniques typically integrate a transfer DNA (T-DNA) fragment into the host genome, where genome integration context may have crucial effects on transgene expression.
Current methods to identify T-DNA integration locations are either cumbersome or imprecise for repeat-rich genomes like maize. We developed a molecular protocol that utilizes long-read sequencing to enrich genomic T-DNA flanks, thus revealing T-DNA placement more precisely. Working to identify and characterize genetic variants responsible for specific phenotypes gives insight into how critical the quality of predicted gene function annotations can be to inform and guide experimental investigation. Functional annotation data are used to interpret results from large-scale studies such as transcriptomics and proteomics. These data are also used to inform and prioritize candidate genes potentially responsible for a phenotype in positional cloning, genetic association, and other studies. To improve the quality of predicted gene functions available to all researchers working in maize, we generated a high-coverage, high-confidence, and reproducible functional annotation dataset for maize genes using the Gene Ontology (GO). The methods we used to generate GO annotations for maize are generic and applicable to other plants. To enable application to other species, we formalized the method used to annotate maize as a containerized pipeline called GOMAP. GOMAP has been optimized for use in high-performance computing environments and has been tested on additional maize lines and other plant species.

Copyright Owner

Wimalanathan Kokulapalan



File Format


File Size

191 pages