Doctor of Philosophy
The analysis of data can be conceptualized as a sequence of steps or actions applied to data to produce a quantitative result. An important aspect of this process is ensuring that it is reproducible. Reproducibility in statistics research involves both statistical reproducibility and computational reproducibility. Achieving it is not trivial, particularly when the problem is complex or involves data from non-standard sources. Automated bullet evidence comparison, as proposed by Hare et al. (2017), involves both a complex data analysis and a non-standard form of data. Here, it serves as a large-scale motivating example for studying the impact of decision-making on the statistical and computational reproducibility of a quantitative result. We first present a method for data pre-processing and assess its impact on bullet land engraved area (LEA) matching accuracy. This is followed by a large user variability study of the high-resolution bullet LEA scanning process and the development of an extended Gauge Repeatability and Reproducibility framework. Finally, we propose a framework for adaptive computational reproducibility in a changing landscape of R packages and present software tools to facilitate the study and management of computational reproducibility in R.
Rice, Kiegan, "A framework for statistical and computational reproducibility in large-scale data analysis projects with a focus on automated forensic bullet evidence comparison" (2020). Graduate Theses and Dissertations. 18207.