Degree Name

Doctor of Philosophy


Computer Science


Bioinformatics and Computational Biology

First Advisor

Guang Song

Second Advisor

Robert Jernigan


Proteins play essential structural and functional roles in cellular processes. To understand their functional behavior, structural models have been derived from experimental data, and these models have played a significant role in explaining the functional mechanisms of proteins. The paradigm "structure drives function" prevailed for many years, until recent evidence suggested that the complex functions of proteins cannot be fully explained by a single structure and that dynamics play a very important role in deciphering their functions. To incorporate dynamics into structural representations, ensembles of conformations, rather than a single structure, are now frequently used in the literature and have been successful in explaining the functions of many proteins. The work described in this thesis focuses on methods for constructing such ensemble representations of proteins and carefully investigates the issues and challenges in obtaining them.

In the first part of the thesis, we focus on representing the native state of a given protein with a weighted ensemble, in which relative populations (or Boltzmann weights) are assigned to the individual members of the ensemble. This representation describes the dynamics using only a few conformational states, which minimizes the potential for over-fitting while still capturing the dynamics that a single average structure misses. Using ubiquitin as an example, we show that determining such a weighted ensemble is feasible when residual dipolar couplings (RDCs) are used as constraints. Moreover, the conformational states of the weighted ensemble are biologically relevant to the functional behavior of the protein. We then compare the quality of the weighted ensemble with that of other representations available for ubiquitin and show that it reproduces a series of experimental data (RDCs, residual chemical shift anisotropies, amide exchange reactivities, and solution scattering profiles) as well as or better than the other representations, and without over-fitting. We then extend this work and determine a weighted ensemble representation for hen egg white lysozyme (HEWL). To establish the quality of this ensemble, we perform a series of rigorous cross-validations against the extensive experimental data available for HEWL. Lastly, we perform a series of NMR structure refinements under synthetic, controlled conditions to evaluate the structural quality of the solutions obtained by various refinement protocols. Our results indicate that ensemble refinement protocols that do not use weights and good initial conformations may not yield better descriptions of protein native states, even though they appear to fit the experimental data better and even pass cross-validation tests.


Copyright Owner

Vijay Vammi



File Size

172 pages