Degree Type


Date of Award


Degree Name

Master of Science


Genetics, Development and Cell Biology


Bioinformatics and Computational Biology

First Advisor

Erik W. Vollbrecht

Second Advisor

Volker Brendel


Transposons can integrate into new positions in the genome and thereby disrupt gene function, and for this reason they have been used as tools for genome mutagenesis. Improving the efficiency of such applications requires elucidating the patterns and preferences of insertion site selection. Here we focus on understanding target site selection of the Ac/Ds transposon, one of the best-characterized transposon systems in plants, by exploring various DNA features and predicting insertion sites.

A package named DnaFVP (DNA Feature Calculation, Visualization and Vector Preparation) was first developed for the calculation, visualization and analysis of various DNA features, including nucleotide sequence features and a broad list of structural and physical properties. The package also supports data preparation prior to feature calculation and the preparation of feature vectors for machine learning. It is intended as the basis of a semi-automatic pipeline for exploring DNA features of any collection of genomic DNA sequences of interest and for preparing feature vectors for subsequent machine learning.
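As an illustration of what a nucleotide sequence feature vector can look like, the following minimal Python sketch computes GC content and k-mer frequencies for a DNA sequence. It is not taken from DnaFVP: the function names (gc_content, kmer_frequencies, feature_vector) and the choice of features are assumptions made for this example, and the structural and physical properties mentioned above are not covered.

```python
from itertools import product


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_frequencies(seq, k=2):
    """Relative frequency of every k-mer over the alphabet ACGT.

    Windows containing other symbols (for example N) are skipped.
    The output order is the lexicographic order of the k-mers.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def feature_vector(seq, k=2):
    """Hypothetical feature vector: GC content followed by k-mer frequencies."""
    return [gc_content(seq)] + kmer_frequencies(seq, k)


if __name__ == "__main__":
    # Example: a short sequence flanking a hypothetical insertion site.
    print(feature_vector("ACGTTGCA"))
```

Vectors of this kind, computed for windows around known insertion sites and around control coordinates, could serve as rows of a training matrix for a classifier.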

Using combined nucleotide and structural features computed with the DnaFVP package, we prepared various feature vectors for machine learning and predicted Ds insertion sites. Training datasets comprised well-evidenced Ds insertion events (1605 events in maize and 2078 events in Arabidopsis) as positive datasets and 2000 randomly sampled genomic coordinates in genic regions from maize and Arabidopsis as negative datasets. ROC (Receiver Operating Characteristic) values of 0.77 in maize, 0.85 in Arabidopsis, and 0.82 in a combined maize and Arabidopsis dataset were achieved. One initially tested dataset in maize shows interesting results. Our predictions may provide further insight into the Ac/Ds transposition mechanism and facilitate targeted mutagenesis and gene delivery mediated by transposons.
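The ROC values above are most naturally read as areas under the ROC curve, although the abstract does not say so explicitly. Under that assumption, the following sketch shows how such a value can be computed from classifier scores for positive (insertion site) and negative (random coordinate) examples. The function name roc_auc and the example scores are hypothetical and are not results from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative example,
    with ties counted as one half.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels (1 = insertion site, 0 = control) and classifier scores.
    labels = [1, 1, 0, 0]
    scores = [0.9, 0.4, 0.5, 0.1]
    print(roc_auc(labels, scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation of positives from negatives. The pairwise loop is quadratic in dataset size, which is acceptable for a few thousand examples per class.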


Copyright Owner

Xianyan Kuang



Date Available


File Format


File Size

86 pages