Degree Name

Doctor of Philosophy


Ecology, Evolution, and Organismal Biology


Bioinformatics and Computational Biology

First Advisor

Jonathan F. Wendel

Second Advisor

Dennis V. Lavrov


Polyploidy is an important process in plant evolution, most often involving the merger of two divergent nuclear genomes (allopolyploidy). Allopolyploidy is typically followed by genomic, genetic, and epigenetic responses. Because usually only the maternal set of plastid and mitochondrial genomes is inherited in allopolyploid species, the question arises as to how plants deal with the resulting alterations to cytonuclear stoichiometry. A second understudied dimension of allopolyploidy concerns small RNAs and the pace and pattern of their divergence among diploid species and following polyploidy. In this thesis I present analyses addressing these fundamental questions about allopolyploid evolution, using as models both cotton and other exemplar allopolyploid lineages.

In the work presented here I investigated cytonuclear coordination of the key chloroplast protein rubisco (ribulose 1,5-bisphosphate carboxylase/oxygenase), which is composed of nuclear-encoded small subunits (SSUs, encoded by rbcS genes) and plastid-encoded large subunits (LSUs, encoded by the rbcL gene). Our initial analyses used Gossypium (cotton) as the model. The composition of rbcS gene orthologs and homoeologs in representative parental diploids and polyploids was characterized. Sequence alignment and comparison among rbcS homoeologs revealed cytonuclear coevolution at the genomic level, mediated by nonreciprocal homoeologous recombination (NRHR) between homoeologs in all natural polyploid species. These inter-genomic gene conversion events consistently converted all paternal D-genome homoeologous SSUs into A-genome-like subunits at regions where the SSU interacts with the LSU. Homoeologous rbcS gene expression in leaves of diploid hybrid and polyploid cotton revealed evidence of cytonuclear accommodation at the transcriptional level, namely, preferential expression of maternal rbcS homoeologs. Motivated by the findings in Gossypium, I extended this model to additional polyploid lineages, i.e., Arabidopsis, Arachis, Brassica, and Nicotiana. Phylogenetic analyses demonstrated concerted evolution of rbcS genes in all allopolyploids. By comparing rbcS homoeolog sequences in allopolyploids with their corresponding orthologs in representative parental diploids, we demonstrated a consistent pattern of post-polyploidy gene conversion among rbcS homoeologs, similar to the findings in Gossypium. In addition, biased expression of paternal homoeologs carrying maternal conversions was confirmed in most polyploid species.
These results demonstrate that inter-genomic gene conversion at the genomic level, and preferential expression of maternal or maternal-like nuclear genes at the transcriptional level, may be common cytonuclear adjustments to genome merger employed by allopolyploids.

To investigate the role of small RNAs, specifically microRNAs (miRNAs), and their participation in the regulation of gene expression, I examined two closely related diploid cotton species, G. arboreum (A2) and G. raimondii (D5). Analysis using a custom miRNA gene prediction pipeline revealed 33 conserved candidate miRNA gene families shared between the two species. The identified miRNA families had similar copy numbers and average evolutionary rates across the two diploid species. Comparing the presence or absence of these miRNA gene families in other land plant species revealed lineage-specific losses and gains. A striking interspecific asymmetry in expression, potentially connected to the relative proximity of neighboring transposable elements, was detected between species. The complex correlation pattern between miRNAs and their target genes suggests potential functional divergence of conserved miRNA families even within the same plant genus.

Novel miRNA genes that were not identified in other land plant species were characterized in both G. arboreum (A2) and G. raimondii (D5). Many of the miRNA families were shared between the two diploids, although the genome of A2 contained 3.5-fold more species-specific miRNA families than that of D5. This observation is potentially explained by the higher rate of inverted duplication among protein-coding genes in A2, as well as recent duplication and divergence of pre-existing miRNA genes. Together with previous findings, these data demonstrate a relatively conserved evolutionary pattern for certain families of anciently derived miRNAs; however, the rapid and divergent genesis of novel miRNA genes accompanying speciation in Gossypium demonstrates that the evolutionary dynamics of miRNA gene families can be highly variable within the same genus.

In G. raimondii, analysis of siRNA populations revealed that the sub-telomeric region at the 3' end of cotton chromosome 1 is enriched for siRNA localization. Furthermore, this genomic region contained a preponderance of relatively new transposable element (TE) insertions. The recent origin of these TEs was indicated by (1) their lower sequence divergence and (2) a negative correlation between the abundance of uniquely mapped siRNAs and that of siRNAs that map to multiple regions of the genome. Active transcription of the constituent TEs and its positive correlation with expressed siRNAs indicate that sufficient but well-controlled transcription of young TEs may be necessary to maintain the silencing of such TEs.

Copyright Owner

Lei Gong



File Size

229 pages