A study of repetitiveness of code changes in software evolution

Date
2013-01-01
Authors
Nguyen, Hoan
Nguyen, Anh Tuan
Nguyen, Tung Thanh
Nguyen, Tien
Rajan, Hridesh
Author Profile
Rajan, Hridesh, Professor and Department Chair of Computer Science
Department
Computer Science
Abstract

In this paper, we present a large-scale study of the repetitiveness of code changes in software evolution. We collected a large data set of 2,841 Java projects, with 1.7 billion source lines of code (SLOC) at their latest revisions, 1.8 million code change revisions (0.4 million of them fixes), 6.2 million changed files, and 2.5 billion changed SLOCs. A change is considered repeated within a project or across projects if it matches another change that has occurred in the history of the same project or of another project, respectively. We report the following important findings. First, the repetitiveness of changes could be as high as 70-100% for small changes and decreases exponentially as change size increases. Second, repetitiveness is higher and more stable in the cross-project setting than in the within-project one. Third, fixing changes repeat similarly to general changes. Importantly, learning code changes and recommending them during software evolution is beneficial, with a top-1 recommendation accuracy of over 30% and a top-3 accuracy of nearly 35%. Repeated fixing changes could also be useful for automatic program repair.
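The within-project and cross-project definitions of a repeated change can be illustrated with a small sketch. This is not the authors' implementation. It assumes that every change has already been reduced to a hashable normalized `signature` (how the study actually matches changes is not described in this abstract), that a within-project repeat needs a match at a strictly earlier revision of the same project, and that a cross-project repeat needs a match anywhere in another project's history. The `Change` type and the `repeated_flags` and `repetitiveness` functions are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Change:
    project: str    # project whose history contains the change
    revision: int   # position of the change in that project's history
    signature: str  # hypothetical normalized form used to decide whether two changes match


def repeated_flags(changes: List[Change]) -> Tuple[List[bool], List[bool]]:
    """Flag each change as repeated within its project and/or across projects."""
    # signature -> {project: earliest revision at which that signature appears}
    first_seen: Dict[str, Dict[str, int]] = defaultdict(dict)
    for c in changes:
        seen = first_seen[c.signature]
        seen[c.project] = min(seen.get(c.project, c.revision), c.revision)

    within, cross = [], []
    for c in changes:
        seen = first_seen[c.signature]
        # Within-project (assumption): a matching change exists at an earlier revision of the same project.
        within.append(seen[c.project] < c.revision)
        # Cross-project (assumption): a matching change exists somewhere in another project's history.
        cross.append(any(p != c.project for p in seen))
    return within, cross


def repetitiveness(flags: List[bool]) -> float:
    """Fraction of changes flagged as repeated; 0.0 for an empty list."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    history = [
        Change("A", 1, "x += 1"),
        Change("A", 2, "x += 1"),          # repeats A@1 within project A
        Change("B", 1, "x += 1"),          # matches project A's changes: a cross-project repeat
        Change("B", 2, "foo() -> bar()"),  # occurs nowhere else
    ]
    within, cross = repeated_flags(history)
    print(repetitiveness(within))  # 0.25
    print(repetitiveness(cross))   # 0.75
```

Indexing the earliest revision per signature and project avoids comparing every pair of changes. Grouping the flags by change size before calling `repetitiveness` would give a per-size breakdown like the one the abstract describes.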

Comments

This article is published as Nguyen, Hoan Anh, Anh Tuan Nguyen, Tung Thanh Nguyen, Tien N. Nguyen, and Hridesh Rajan. "A study of repetitiveness of code changes in software evolution." In Proceedings of the 28th IEEE/ACM International Conference on Automated Software Engineering, pp. 180-190. IEEE Press, 2013. DOI: 10.1109/ASE.2013.6693078. Posted with permission.

Copyright
2013