Degree Type


Date of Award


Degree Name

Master of Science


Computer Science

First Advisor

David M. Weiss


Most software systems undergo continuous change in different phases of their lifecycle such as development or maintenance. Ideally, such changes should correspond to a system's modular design. However, some changes span across more than one component thereby resulting in discrepancies between design and implementation. In such cases, making a change to one component requires changes to other components leading to an increase in time and effort to make changes to a software system as it evolves.

This thesis investigates: 1) an approach to observe how components change together by identifying tightly coupled changes known as chunks, 2) whether there are any trends in how chunks evolve over time, and 3) whether chunks can help identify design issues in a software system.

In this work, a family of algorithms is proposed to identify independently changing chunks from change data obtained from mining version history repositories of three large software systems - Moodle, Eclipse, and Company-X. A comprehensive analysis of certain characteristics of the resulting chunks is conducted. In addition, evolution of chunks with respect to size in terms of number of files within a chunk, and percentage of changes crossing a chunk are studied. Lastly, a pragmatic interpretation of the results to identify necessary code refactoring or system redesign is presented.

The findings of this work show that the percentage correlation of a chunk decreases with an increase in the number of inter-component or subsystem couplings. We also observed that there is no association between chunk size and percentage correlation. Identifying chunks that merge helps in a better understanding of the inconsistencies between how a system is designed for change and how it is actually changed, and to identify areas of a system that require refactoring or redesign. Additionally, identifying stable chunks can provide insights into how size and percentage correlation of the corresponding empirical components change over time.

Copyright Owner




File Format


File Size

106 pages