Degree Type

Thesis

Date of Award

2019

Degree Name

Master of Science

Department

Computer Science

Major

Computer Science

First Advisor

Ali Janessari

Abstract

The mining industry plays an essential role in the US economy, yet mining is known to be one of the most dangerous occupations. Despite efforts to create a safer work environment for miners, a significant number of accidents still occur at mining sites. Mine operators are required to report all accidents, injuries, or illnesses that occur at a mine to the Mine Safety and Health Administration (MSHA). These reports contain several fixed-field entries as well as a narrative of the accident. In this study, we use machine learning models such as Decision Tree (DT), Random Forest (RF), and Deep Neural Network (DNN) to predict the outcome of an accident and the number of days the worker will be away from work (DAFW) using the MSHA dataset. These predictive models would help safety experts in their efforts to create a safer work environment, and predicting days away from work would help supervisors plan for a temporary replacement. We compare the performance of all the models with that of a traditional logistic regression model. We divide the study into two parts. In the first part, we use the structured data (fixed fields) and the unstructured data (injury narratives) separately to predict the injury outcome. We use the injury narratives because they provide more information about the accident than the fixed-field entries. We also investigate a synthetic data augmentation technique based on word embeddings to tackle the data imbalance problem when predicting the injury outcome from the narratives. Our experimental results show that Random Forest with narratives as the input provides the best F1 score of 0.94. DNN has the lowest root mean squared error (0.62) when predicting DAFW using injury narratives as the input. The F1 score of all the underrepresented classes except one improved after applying the data augmentation technique.
We use the DNN model to find the features that are most important in determining injury outcome and DAFW. We find that Nature of injury is the most important predictor of injury outcome.
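
The abstract does not say how the word-embedding augmentation is carried out. As a rough illustration only, one common approach is to create synthetic narratives for underrepresented classes by swapping words for their nearest neighbors in embedding space. The sketch below assumes that approach; the toy vectors, the function names, and the `replace_prob` parameter are all hypothetical and are not taken from the thesis.

```python
import math
import random

# Hypothetical toy embeddings for illustration only; a real experiment
# would load pretrained word vectors instead.
TOY_EMBEDDINGS = {
    "fell": (0.9, 0.1, 0.0),
    "slipped": (0.85, 0.15, 0.05),
    "hand": (0.1, 0.9, 0.1),
    "finger": (0.15, 0.85, 0.1),
    "rock": (0.0, 0.1, 0.9),
    "boulder": (0.05, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def nearest_neighbor(word, embeddings):
    """Return the most similar other word, or None if the word is unknown."""
    if word not in embeddings:
        return None
    target = embeddings[word]
    candidates = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != word
    ]
    return max(candidates)[1]


def augment(narrative, embeddings, replace_prob=0.5, seed=0):
    """Build a synthetic narrative by replacing known words with neighbors.

    Each word that has an embedding is replaced with probability
    replace_prob; words without an embedding are kept unchanged.
    """
    rng = random.Random(seed)
    out = []
    for token in narrative.lower().split():
        neighbor = nearest_neighbor(token, embeddings)
        if neighbor is not None and rng.random() < replace_prob:
            out.append(neighbor)
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    print(augment("Employee fell and rock struck hand", TOY_EMBEDDINGS, replace_prob=1.0))
```

In a setup like this, the synthetic narratives would be added only to the training split of the minority classes, so that the reported F1 scores are still measured on unmodified reports.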

Copyright Owner

Anurag Desai Yedla

Language

en

File Format

application/pdf

File Size

40 pages

Available for download on Friday, January 10, 2020
