Degree Type


Date of Award


Degree Name

Doctor of Philosophy


Electrical and Computer Engineering


Bioinformatics and Computational Biology

First Advisor

Julie A. Dickerson


The biochemical and physiological functions of a large proportion of the approximately 27,000 protein-encoding genes in the Arabidopsis genome remain experimentally undetermined and cannot be established using sequence homology techniques alone. This thesis presents a set of bioinformatics resources, including a software platform for data visualization and data analysis, that address the key issues in incorporating metabolomics data into functional genomics studies.

Multiple mass spectrometry-based metabolomics platforms are combined to obtain better coverage of the metabolome. Different strategies for integrating the metabolomics abundance data from multiple platforms are compared to find the most suitable method for biomarker discovery. A new method for putatively identifying unknown metabolites using first-order partial correlation networks is proposed; it uses the existing data to incorporate structurally unknown metabolites. A comprehensive study of 70 single-gene knock-out mutants versus wild-type samples is performed using the Random Forest machine learning algorithm, and a biomarker database is built for each of the 70 mutations, containing the key metabolites, including the putative identifications of unknown metabolites.
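As an illustration of the first-order partial correlation idea mentioned above, the following is a minimal sketch and not the procedure implemented in the thesis. It assumes the common textbook formulation in which an edge between two metabolites is kept only if their Pearson correlation and every partial correlation conditioned on a single third metabolite stay above a cutoff. The function names, the `threshold` value, and the toy abundance profiles are all hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z."""
    denom = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denom == 0:
        # Undefined when z is perfectly correlated with x or y;
        # treated here (an assumption) as no evidence of a direct link.
        return 0.0
    return (r_xy - r_xz * r_yz) / denom


def partial_correlation_edges(profiles, threshold=0.5):
    """Return metabolite pairs whose association survives conditioning
    on each other metabolite in turn (hypothetical cutoff)."""
    names = sorted(profiles)
    r = {(a, b): pearson(profiles[a], profiles[b])
         for a in names for b in names if a != b}
    edges = []
    for a, b in combinations(names, 2):
        scores = [abs(r[a, b])]
        for z in names:
            if z not in (a, b):
                scores.append(abs(first_order_partial(r[a, b], r[a, z], r[b, z])))
        if min(scores) >= threshold:
            edges.append((a, b))
    return edges


# Toy data: x and y are both driven by z, so their correlation is indirect.
toy_profiles = {
    "x": [3, -1, 1, -3],
    "y": [3, -3, 1, -1],
    "z": [1, -1, 1, -1],
}
toy_edges = partial_correlation_edges(toy_profiles, threshold=0.5)
```

In the toy data, x and y correlate strongly only because both follow z, so conditioning on z removes the x-y edge while the x-z and y-z edges remain. In a network of this kind, a structurally unknown metabolite that keeps a direct edge to a known metabolite becomes a candidate for putative annotation.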

A proof-of-concept analysis of the oxoprolinase (oxp1) and gamma-glutamyl transpeptidase (ggt1 and ggt2) single-gene knock-out mutants in the glutathione (GSH) degradation pathway of Arabidopsis confirms the known biology that OXP1 is responsible for the conversion of 5-oxoproline (5-OP) to glutamic acid. In addition, the ggt1/ggt2 analysis supports the hypothesis that the GGT genes may not be major contributors to 5-OP production. The lack of biochemical changes in the ggt2 mutant is also consistent with previous studies reporting its low-level expression in leaf tissues.

The metabolomics database, the biomarker database, and the data mining tools are implemented in a web-based software suite at


Copyright Owner

Preeti Bais



Date Available


File Format


File Size

131 pages