Usage and refactoring studies of python regular expressions

Chapman, Carl

Usage and refactoring studies of python regular expressions

File

Chapman_iastate_0097M_15725.pdf (1.37 MB)

Date

2016-01-01

Authors

Chapman, Carl

Advisor

Kathryn T. Stolee

Altmetrics

Organizational Units

Organizational Unit

Computer Science

Computer Science—the theory, representation, processing, communication and use of information—is fundamentally transforming every aspect of human endeavor. The Department of Computer Science at Iowa State University advances computational and information sciences through; 1. educational and research programs within and beyond the university; 2. active engagement to help define national and international research, and 3. educational agendas, and sustained commitment to graduating leaders for academia, industry and government.

History
The Computer Science Department was officially established in 1969, with Robert Stewart serving as the founding Department Chair. Faculty were composed of joint appointments with Mathematics, Statistics, and Electrical Engineering. In 1969, the building which now houses the Computer Science department, then simply called the Computer Science building, was completed. Later it was named Atanasoff Hall. Throughout the 1980s to present, the department expanded and developed its teaching and research agendas to cover many areas of computing.

Dates of Existence
1969-present

Related Units

College of Liberal Arts and Sciences (parent college)

Department

Computer Science

Abstract

Though regular expressions provide a powerful search technique that is baked into every major language, is incorporated into a myriad of essential tools, and has been a fundamental aspect of Computer Science since the 1960's, no one has ever formally studied how they are used in practice, or how to apply refactoring principals to improve understandability and conformance to community standards. This thesis presents the original work of studying a sample of regexes taken from Python projects mined from GitHub, determining what features are used most often, defining some categories that illuminate common use cases, and identifying areas of significance for language and tool designers. Furthermore, this thesis defines an equivalence class model used to explore comprehension of regexes, identifying the most common and most understandable representations of semantically identical regexes, suggesting several refactorings and preferred representations. Opportunities for future work include the novel and rich field of regex refactoring, semantic search of regexes, and further fundamental research into regex usage and understandability.

Copyright

Fri Jan 01 00:00:00 UTC 2016

Collections

Theses and Dissertations

Full item page