Date of Award
Doctor of Philosophy
Genetics, Development and Cell Biology
Bioinformatics and Computational Biology
Drena L. Dobbs
The era of “big data” has led to the generation of more biological data than any human could hope to process. This flood of data has necessitated the development of computational methods to assist in analysis, and has made it possible to begin to model complex biological systems. Machine learning methods represent one avenue for modeling, and allow for the identification of intricate and often cryptic sequence signals underlying many biological processes.
In this dissertation, I present two machine learning models, RPIDisorder and MEDJED, which were developed to predict RNA-protein interaction partners (RPIPs) and DNA double-strand break (DSB) repair by the microhomology-mediated end joining (MMEJ) pathway, respectively. I also present the Gene Sculpt Suite, a set of freely available web-based software tools for precision gene editing.
RPIDisorder uses signals from protein and RNA sequences, some of which have been used in previously published RNA-protein partner prediction methods. It additionally exploits signal from disordered protein regions to predict interactions with greater specificity than was previously possible. RPIDisorder allows for the prediction of biologically relevant RNA-protein interaction networks, which in turn can assist in the development of clinical interventions for the numerous cancers and neurological and metabolic disorders associated with disruptions in RNA-protein interactions. RPIDisorder is freely available at www.rpidisorder.org.
MEDJED (Microhomology-Evoked Deletion Judication EluciDation) uses signal within and surrounding short stretches of homologous DNA sequence (microhomologies) on either side of an introduced DSB to predict the extent to which a targeted genomic site will be repaired using the MMEJ pathway. MEDJED is freely available at www.genesculpt.org/medjed/.
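To make the notion of a microhomology concrete, the following is a minimal Python sketch that lists short sequences occurring on both sides of a cut site and shows the deletion product expected if one such pair were used for MMEJ repair. It is an illustration only and does not reproduce MEDJED's features or model. The function names, the example sequence, the cut position, and the length limits are all assumptions made for this example.

```python
def find_microhomologies(seq, cut, min_len=3, max_len=10):
    """List (motif, left_start, right_start) for every motif found on both
    sides of the cut. `cut` is the index of the first base after the break."""
    left, right = seq[:cut], seq[cut:]
    hits = []
    for length in range(min_len, max_len + 1):
        for i in range(len(left) - length + 1):
            motif = left[i:i + length]
            # Record every occurrence of the motif downstream of the break.
            j = right.find(motif)
            while j != -1:
                hits.append((motif, i, cut + j))
                j = right.find(motif, j + 1)
    return hits


def mmej_product(seq, hit):
    """Sequence left after annealing at the microhomology: one copy of the
    motif and everything between the two copies is deleted."""
    motif, left_start, right_start = hit
    return seq[:left_start + len(motif)] + seq[right_start + len(motif):]


# Hypothetical 20 bp target with a break between positions 9 and 10.
target = "TTGACCGTAAGGACCGTTCA"
hits = find_microhomologies(target, cut=10)
longest = max(hits, key=lambda h: len(h[0]))
print(longest)                        # motif and its two start positions
print(mmej_product(target, longest))  # predicted deletion product
```

A predictor of MMEJ usage would need more than this enumeration, for example features describing each candidate pair and its sequence context. Which features MEDJED actually uses is described in the dissertation itself, not here.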
The advent of gene editing nucleases, including CRISPR/Cas systems, TALENs, and zinc finger nucleases, has made it possible to insert, delete, and precisely edit DNA. A great deal of recent research has focused on improving the efficiency and precision of these nucleases by leveraging endogenous DSB repair pathways, including non-homologous end joining (NHEJ) and homologous recombination (HR). However, homology-mediated end joining (HMEJ) pathways, including MMEJ and single-strand annealing (SSA), provide many advantages over NHEJ and HR. The Gene Sculpt Suite is a set of three web-based tools (GTagHD, MEDJED, and MENTHU) that leverage HMEJ pathways to enhance exogenous DNA knock-in (GTagHD) and to produce more efficient and precise gene knock-outs (MEDJED and MENTHU). The Gene Sculpt Suite is freely available at www.genesculpt.org.
Taken together, the results of these studies demonstrate that machine learning models can be valuable for identifying sequence signals that regulate macromolecular recognition, with numerous potential applications in both basic and applied research.
Carla M. Mann
Mann, Carla M., "Applications of machine learning to solve biological puzzles" (2019). Graduate Theses and Dissertations. 17508.