Degree Type
Thesis
Date of Award
2016
Degree Name
Master of Science
Department
Computer Science
Major
Computer Science
First Advisor
Kathryn T. Stolee
Abstract
Though regular expressions provide a powerful search technique that is baked into every major language, is incorporated into a myriad of essential tools, and has been a fundamental aspect of Computer Science since the 1960s, no one has ever formally studied how they are used in practice, or how to apply refactoring principles to improve understandability and conformance to community standards. This thesis presents the original work of studying a sample of regexes taken from Python projects mined from GitHub, determining what features are used most often, defining some categories that illuminate common use cases, and identifying areas of significance for language and tool designers. Furthermore, this thesis defines an equivalence class model used to explore comprehension of regexes, identifying the most common and most understandable representations of semantically identical regexes, and suggesting several refactorings and preferred representations. Opportunities for future work include the novel and rich field of regex refactoring, semantic search of regexes, and further fundamental research into regex usage and understandability.
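As a rough illustration of what "semantically identical regexes" means, the sketch below compares two differently written patterns on every short string over a small alphabet. The regex pairs, the helper names (sample_strings, agree), the alphabet, and the length bound are assumptions chosen for this example and are not taken from the thesis. Agreement on a bounded sample only suggests equivalence; it does not prove it.

```python
import re
from itertools import product


def sample_strings(alphabet="abcd", max_len=4):
    """Yield every string over `alphabet` with length 0..max_len (hypothetical helper)."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def agree(pattern_a, pattern_b, samples):
    """Return True if both patterns accept exactly the same sample strings.

    Uses fullmatch so that a pattern must describe the whole string.
    This is a bounded check, not a proof of equivalence.
    """
    regex_a = re.compile(pattern_a)
    regex_b = re.compile(pattern_b)
    return all(
        (regex_a.fullmatch(s) is None) == (regex_b.fullmatch(s) is None)
        for s in samples
    )


SAMPLES = list(sample_strings())

# Illustrative pairs (assumed for this sketch): two spellings of the same language.
print(agree(r"a{2,}", r"aaa*", SAMPLES))    # two or more 'a' characters
print(agree(r"a|b|c", r"[abc]", SAMPLES))   # alternation vs. character class

# A pair that differs: 'a*' accepts the empty string, 'a+' does not.
print(agree(r"a+", r"a*", SAMPLES))
```

A refactoring in this sense would replace one member of such a pair with the other without changing which strings match; which spelling is preferable is the comprehension question the abstract describes.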
DOI
https://doi.org/10.31274/etd-180810-4743
Copyright Owner
Carl Allen Chapman
Copyright Date
2016
Language
en
File Format
application/pdf
File Size
157 pages
Recommended Citation
Chapman, Carl Allen, "Usage and refactoring studies of python regular expressions" (2016). Graduate Theses and Dissertations. 15139.
https://lib.dr.iastate.edu/etd/15139