Degree Name

Doctor of Philosophy


Civil, Construction, and Environmental Engineering


Civil Engineering

First Advisor

In-Ho Cho


Over the past few decades, data-driven research has become a promising next-generation research paradigm in most science and engineering fields, owing to marked advances in computing power and the accumulation of valuable databases. Despite this progress, the use of these databases is still in its infancy. To address this issue, this dissertation presents five studies that apply advanced statistical methods.

The first study aims to develop a computational framework for collecting and transforming data from heterogeneous Federal Aviation Administration (FAA) databases, and to build a flexible predictive model, a generalized additive model (GAM), that predicts runway incursions (RIs) over a 15-year period at 36 major US airports. Results show that GAM is a powerful method for RI prediction, with high prediction accuracy. A direct search for the best predictor variables appears to be superior to a variable selection approach based on principal component analysis (PCA). The predictive power of GAM is comparable to that of an artificial neural network (ANN).
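For reference, a GAM in its standard form links the expected response to a sum of smooth functions of the individual predictors. This is what makes it more flexible than a linear model while keeping each predictor's effect separately interpretable. The notation below is the conventional one and is not taken from the dissertation: g is a link function, \beta_0 an intercept, and each f_j an unspecified smooth function estimated from the data. The particular link, smoothers, and predictors used for the RI model are not stated in this abstract.

```latex
g\bigl(\mathbb{E}[\,Y \mid x_1, \dots, x_p\,]\bigr) = \beta_0 + \sum_{j=1}^{p} f_j(x_j)
```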

The second study builds an accurate predictive model from earthquake engineering databases. As in the first study, GAM is adopted as the predictive model. The results show that GAM has promising predictive power when applied to existing databases of reinforced concrete shear walls.

The primary objective of the third study is to propose an efficient predictor variable selection method and to quantify the relative importance of the predictor variables, using field-survey pavement data and simulated airport pavement data. Results show that the direct search method always finds the best predictor model, but it can take a long time, depending on the size of the data and the number of candidate variables. The results also show that not all variables are necessary for the best prediction, and they identify the relative importance of the variables selected for the GAM.
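The direct search described above can be read as an exhaustive best-subset search. Every non-empty subset of the p candidate predictors is scored, so the best subset is guaranteed to be found. The cost is that 2^p - 1 models must be fitted, which explains the growing run time. The sketch below assumes a user-supplied `score` function (lower is better) that stands in for, e.g., the validation error of a fitted GAM. The function names, variable names, and the toy score are hypothetical and are not taken from the dissertation.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple

Subset = Tuple[str, ...]


def direct_search(
    candidates: Sequence[str],
    score: Callable[[Subset], float],
) -> Tuple[Subset, float, int]:
    """Exhaustively score every non-empty subset of candidate predictors.

    Returns the lowest-scoring subset, its score, and how many subsets
    were evaluated (2**p - 1 for p candidates).
    """
    best_subset: Subset = ()
    best_score = float("inf")
    n_evaluated = 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            n_evaluated += 1
            value = score(subset)
            if value < best_score:  # keep the first subset that attains the minimum
                best_subset, best_score = subset, value
    return best_subset, best_score, n_evaluated


def toy_score(subset: Subset) -> float:
    """Hypothetical stand-in for a model's validation error: a large penalty
    for each missing 'useful' variable plus 1 for every variable used."""
    useful = {"x1", "x2"}
    return 10 * len(useful - set(subset)) + len(subset)


if __name__ == "__main__":
    best, value, n = direct_search(["x1", "x2", "x3", "x4"], toy_score)
    print(best, value, n)  # ('x1', 'x2') 2 15
```

Because the number of subsets doubles with each added candidate (20 candidates already give 1,048,575 subsets), cheaper alternatives such as PCA-based selection are attractive even though they do not guarantee the best subset.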

The fourth study examines the impact of fractional hot-deck imputation (FHDI) on statistical and machine learning prediction using practical engineering databases. To characterize the behavior of FHDI and its impact on prediction models, multiple response rates and FHDI's internal parameters (i.e., the number of categories and the number of donors) are investigated. GAM, ANN, support vector machine (SVM), and extremely randomized trees are adopted as predictive models. Results show that FHDI has a positive impact on prediction for engineering databases. Optimal internal parameters are also suggested to achieve better prediction accuracy.
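The following single-variable sketch shows the roles of the two internal parameters in the fractional hot-deck idea. A fully observed covariate x is discretized into a chosen number of categories (imputation cells). Each missing y is then replaced by up to a chosen number of donor values, drawn from respondents in the same cell, and each donor carries a fractional weight. The following are all assumptions for illustration: equal-width binning, equal weights of 1/M, the fallback to all respondents when a cell has none, and every function name. This is not the FHDI algorithm as used in the dissertation or as implemented in any FHDI software, which handles multivariate missingness and is considerably more involved.

```python
import random
from typing import Dict, List, Optional, Tuple

Donors = List[Tuple[float, float]]  # (donor value, fractional weight)


def categorize(values: List[float], n_categories: int) -> List[int]:
    """Assign each value to one of n_categories equal-width bins (imputation cells)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_categories
    if width == 0:
        return [0] * len(values)
    return [min(int((v - lo) / width), n_categories - 1) for v in values]


def fractional_hot_deck(
    x: List[float],
    y: List[Optional[float]],
    n_categories: int = 3,
    n_donors: int = 2,
    seed: int = 0,
) -> Dict[int, Donors]:
    """Impute each missing y[i] with up to n_donors values from respondents in
    the same x-cell, each weighted 1 / (number of donors actually used)."""
    rng = random.Random(seed)
    cells = categorize(x, n_categories)
    respondents = [i for i, v in enumerate(y) if v is not None]
    if not respondents:
        raise ValueError("at least one observed y is required as a donor")
    imputed: Dict[int, Donors] = {}
    for i, v in enumerate(y):
        if v is not None:
            continue
        pool = [j for j in respondents if cells[j] == cells[i]]
        if not pool:  # assumption: fall back to all respondents if the cell has none
            pool = respondents
        chosen = rng.sample(pool, min(n_donors, len(pool)))
        weight = 1.0 / len(chosen)
        # y[j] is never None here because every j is a respondent
        imputed[i] = [(y[j], weight) for j in chosen]
    return imputed


def fractional_mean(y: List[Optional[float]], imputed: Dict[int, Donors]) -> float:
    """Mean of y in which each missing entry contributes the weighted sum of its donors."""
    total = sum(v for v in y if v is not None)
    total += sum(w * value for donors in imputed.values() for value, w in donors)
    return total / len(y)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    y = [1.0, None, 3.0, 10.0, None, 12.0]
    imp = fractional_hot_deck(x, y, n_categories=2, n_donors=2)
    print(imp)
    print(fractional_mean(y, imp))  # 6.5
```

Raising `n_categories` makes cells more homogeneous but leaves fewer donors per cell. Raising `n_donors` spreads each imputation over more values. This is the kind of trade-off the study's parameter investigation addresses.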

The last study offers a systematic computational framework, including data collection, transformation, and squashing, for developing a model that predicts the structural behavior of the target bridge. Missing values in the bridge data are cured with FHDI to avoid inaccurate data analysis caused by bias and sparseness in the data. Results show that applying FHDI improves prediction performance.
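Data squashing compresses a large data set into a smaller set of weighted pseudo-points that approximately preserves its statistical properties. The sketch below is a deliberately crude, binning-based version. Rows are grouped by a binned key column, and each group is replaced by its column means, weighted by the group size. This preserves the weighted column means exactly, up to floating-point rounding. More elaborate squashing methods also try to match higher-order moments. The binning rule, the key column, the sample rows, and the function names are assumptions for illustration, not the framework used in the dissertation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[float, ...]
PseudoPoint = Tuple[Row, int]  # (column means of a group, number of rows it replaces)


def squash(rows: Sequence[Row], bin_width: float, key_col: int = 0) -> List[PseudoPoint]:
    """Group rows by floor(row[key_col] / bin_width) and replace each group
    by one pseudo-point: its column means, weighted by the group size."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    groups: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        groups[int(row[key_col] // bin_width)].append(row)
    pseudo: List[PseudoPoint] = []
    for key in sorted(groups):
        members = groups[key]
        means = tuple(sum(col) / len(members) for col in zip(*members))
        pseudo.append((means, len(members)))
    return pseudo


def weighted_mean(pseudo: List[PseudoPoint], col: int) -> float:
    """Mean of one column recovered from the weighted pseudo-points."""
    total_weight = sum(w for _, w in pseudo)
    return sum(point[col] * w for point, w in pseudo) / total_weight


if __name__ == "__main__":
    # Hypothetical (sensor reading, response) pairs; not bridge data from the study.
    rows = [(0.1, 1.0), (0.4, 3.0), (1.2, 5.0), (1.8, 7.0), (1.9, 9.0)]
    pseudo = squash(rows, bin_width=1.0)
    print(len(pseudo), [w for _, w in pseudo])  # 2 [2, 3]
    print(weighted_mean(pseudo, 1))  # 5.0
```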

This dissertation is expected to provide a notable computational framework for data processing, a seamless data-curing method, and advanced statistical predictive models developed across multiple projects. This novel research approach should help researchers understand their databases better and build accurate statistical models informed by their knowledge of the data.

Copyright Owner

Ikkyun Song



File Size

166 pages