Degree Name

Doctor of Philosophy


Civil, Construction, and Environmental Engineering

First Advisor

Amr Kandil


Construction is a major contributor to the nation's economy. The world construction market is estimated to have reached US $5.5 trillion at the end of 2007 (Harmon 2003). In the U.S., the construction industry employs 7.5 million full- and part-time employees and contributes nearly $1.2 trillion to the economy, making it the largest single production sector (El-adaway 2008). At that magnitude, it is considered not only the backbone of the nation's economy but also a significant indicator of its advancement, efficiency, and success. However, because of the dynamic nature of the construction industry and the increasing sophistication and complexity of construction projects, its contribution is negatively affected by a growing number of disputes. The rate and frequency of conflicts have risen with the growing complexity of projects: modern construction projects require increasingly sophisticated construction methods and extensive interaction among diverse parties, which increases the likelihood of conflicts and disputes.

Construction disputes are ultimately resolved in court unless a private construction contract calls for other resolution mechanisms. In fact, some in the construction industry prefer litigation; however, that preference comes at great cost. Despite the advantages of litigation, which include being the most formal and binding process, it has two main shortcomings that make the process undesirable and unsupportive of the growth and development of the construction industry. First, depending on the jurisdiction, complex construction disputes may take anywhere from two to six years to reach trial. Second, the prolonged, detailed, factual discovery process makes litigation very expensive because it requires specialized personnel with both extensive legal knowledge and construction experience, a combined skill set that is not widely available in the industry. To overcome these drawbacks, which impede the construction industry's advancement and its contribution to the nation's economy, legal decision support systems are needed to mitigate them effectively and efficiently and, in turn, to allow for better control and management of construction projects.

In construction disputes, the initiation of the conflict can be attributed to a number of causes, including change orders, escalation, and differing site conditions. Each cause calls for a separate method of addressing and handling the dispute and, accordingly, can be considered a different dispute type. Among these types, one of the most important and frequently occurring is the Differing Site Conditions (DSC) dispute, which results from contractors encountering conditions materially different from those expected or described by the owner. This type of dispute warrants special attention because of its potential to push construction projects off their planned time and cost.

A number of researchers in Artificial Intelligence (AI) fields have developed tools and methodologies for modeling judicial reasoning and predicting the outcomes of construction litigation cases in an attempt to provide the above-mentioned decision support capabilities. Although these systems contributed significantly to the advancement of legal decision support capabilities in construction, their success was limited because they were not based on a detailed analysis of the legal concepts that govern litigation outcomes.

Consequently, the objective of this dissertation is to provide a coherent and integrated methodology for construction legal decision support for Differing Site Conditions (DSC) disputes through statistical modeling and machine learning. To attain this goal, the current study designed and implemented a four-step methodology targeting the following goals: (1) to extract a comprehensive set of legal factors that govern DSC litigation outcomes in the construction industry; (2) to devise a litigation prediction model for DSC disputes in the construction industry based on the extracted set of legal factors; (3) to create a methodology for automated extraction of the significant legal factors that govern DSC litigation outcomes from case documents; and (4) to develop an automated retrieval model for identifying DSC precedent cases from a large corpus based on their similarity to newly introduced ones. The four steps of this methodology were implemented incrementally, and each step relied on the outcome of its predecessor.

First, a comprehensive set of significant legal factors that govern the verdicts of DSC litigation cases was extracted through statistical modeling. Binary Probit and Logit Choice Models were developed (a) to identify the effect of each extracted factor on the prediction of the winning party; (b) to identify the combination of factors with the highest significance in the prediction model; and (c) to perform a sensitivity analysis to prioritize the most significant legal factors. Among the main findings of this step are the following: (1) in general, when the Federal Government is a party to the dispute, judgments favor the government (owner) over the contractor; (2) "the presence of evident facts that the encountered conditions caused a change in the nature and cost of the contract" had the highest impact among the variables that decrease the prediction of a judgment in favor of the owner, increasing the prediction in favor of the contractor by 17.77%; (3) "the presence of evident facts that the specifications included a warning against the presence of DSC from those conveyed in the contract documents" caused the highest increase in the prediction of a judgment in favor of the owner, amounting to 56.56%; and (4) the Binary Probit and Logit Choice Models yielded a joint set of 13 statistically significant legal factors related to DSC disputes in the construction industry. This set provided the grounds for the other three steps of the research methodology.
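
To make the sensitivity analysis described above more concrete, the following minimal sketch shows how a binary logit or probit model converts the presence of a legal factor into a change in the predicted probability of a judgment for the owner. The coefficient values, the intercept, and the factor names (`warning_in_specs`, `change_in_nature_and_cost`) are hypothetical placeholders chosen for illustration; they are not the estimates reported in this study.

```python
import math

def logit_prob(z):
    """Logistic link: probability of the outcome given linear index z."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_prob(z):
    """Probit link: standard normal CDF evaluated at linear index z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_index(coefs, intercept, factors):
    """Intercept plus the sum of coefficient * factor value (0 or 1)."""
    return intercept + sum(coefs[name] * factors.get(name, 0) for name in coefs)

def discrete_change(link, coefs, intercept, factors, name):
    """Change in predicted probability when factor `name` goes from 0 to 1,
    holding every other factor at its value in `factors`."""
    present = dict(factors, **{name: 1})
    absent = dict(factors, **{name: 0})
    return (link(linear_index(coefs, intercept, present))
            - link(linear_index(coefs, intercept, absent)))

# Hypothetical coefficients (assumption): a positive sign raises the
# probability of a judgment for the owner, a negative sign lowers it.
COEFS = {"warning_in_specs": 1.5, "change_in_nature_and_cost": -0.8}
INTERCEPT = 0.0
base_case = {"warning_in_specs": 0, "change_in_nature_and_cost": 0}

delta_logit = discrete_change(logit_prob, COEFS, INTERCEPT, base_case,
                              "warning_in_specs")
delta_probit = discrete_change(probit_prob, COEFS, INTERCEPT, base_case,
                               "warning_in_specs")
print(f"logit change: {delta_logit:+.4f}, probit change: {delta_probit:+.4f}")
```

Ranking the factors by the size of this discrete change is one common way to prioritize them; whether the study computed its percentage changes in exactly this manner is not stated in the abstract.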

Second, an automated litigation prediction model for DSC disputes in the construction industry was developed through machine learning, based on the factors identified in the first step. The framework under this step incorporates an analysis of different machine learning methodologies, including Support Vector Machines (SVM), Naive Bayes (NB), and rule induction classifiers such as Decision Trees (DT), Boosted Decision Trees (AD Tree), and PART. Ten machine learning models were developed using these methodologies to determine the best one for predicting litigation outcomes. The analysis of all developed models showed that the SVM model with a third-degree polynomial kernel performed best, attaining an overall prediction accuracy of 98%.
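
For readers unfamiliar with the best-performing classifier, the sketch below illustrates the decision function of an SVM with a third-degree polynomial kernel. The support vectors, their weights, the bias, the kernel parameters `gamma` and `coef0`, and the label coding (+1 for owner, -1 for contractor) are all assumptions made for illustration; a trained model would learn the weights and support vectors from the case data.

```python
def poly_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

def decision_value(x, support_vectors, labels, alphas, bias=0.0):
    """Kernel SVM decision function: sum_i alpha_i * y_i * K(x_i, x) + bias."""
    return sum(a * y * poly_kernel(sv, x)
               for sv, y, a in zip(support_vectors, labels, alphas)) + bias

def predict(x, support_vectors, labels, alphas, bias=0.0):
    """Return +1 (owner) or -1 (contractor) from the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, labels, alphas, bias) >= 0 else -1

# Hypothetical support vectors and weights (assumption, not learned from data).
SUPPORT_VECTORS = [[1.0, 1.0], [-1.0, -1.0]]
LABELS = [1, -1]
ALPHAS = [1.0, 1.0]

print(decision_value([2.0, 2.0], SUPPORT_VECTORS, LABELS, ALPHAS))  # 152.0
print(predict([-2.0, -2.0], SUPPORT_VECTORS, LABELS, ALPHAS))       # -1
```

The cubic kernel lets the classifier separate cases using interactions among up to three factors at a time without constructing those interaction terms explicitly.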

Third, an automated model for extracting significant legal factors in DSC disputes in the construction industry was developed through machine learning. The framework under this step (1) developed 24 machine learning models in which four weighting schemes, namely Term Frequency (tf), Logarithmic Term Frequency (ltf), Augmented Term Frequency (atf), and Term Frequency Inverse Document Frequency (tf.idf), were implemented for each type of classifier; and (2) developed two C++ algorithms for preparing the corpus and implementing the required weighting mechanisms. The highest prediction rate, 84%, was attained by the NB classifier with tf.idf weighting. The model was further validated on previously unseen cases, for which it attained a prediction precision of 81.8%.
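
The four weighting schemes named above can be sketched as follows. The study implemented its weighting in C++; this Python version is only an illustration and uses common textbook definitions (ltf as 1 + ln(count), atf as 0.5 + 0.5 * count / max count, idf as ln(N / document frequency)), which are assumptions and may differ from the exact variants used in the dissertation. The toy corpus is likewise hypothetical.

```python
import math
from collections import Counter

def tf(tokens):
    """Raw term frequency: number of occurrences of each term."""
    return dict(Counter(tokens))

def ltf(tokens):
    """Logarithmic term frequency: 1 + ln(count)."""
    return {t: 1.0 + math.log(c) for t, c in Counter(tokens).items()}

def atf(tokens):
    """Augmented term frequency: 0.5 + 0.5 * count / max count in the document."""
    counts = Counter(tokens)
    max_count = max(counts.values())
    return {t: 0.5 + 0.5 * c / max_count for t, c in counts.items()}

def tfidf(corpus):
    """tf.idf per document: count * ln(N / number of documents containing the term)."""
    n_docs = len(corpus)
    doc_freq = Counter(t for doc in corpus for t in set(doc))
    return [{t: c * math.log(n_docs / doc_freq[t])
             for t, c in Counter(doc).items()} for doc in corpus]

# Hypothetical tokenized case documents (assumption).
corpus = [
    ["soil", "rock", "soil"],
    ["rock", "contract"],
    ["soil", "boring", "log"],
]

print(tf(corpus[0]))
print(atf(corpus[0]))
print(tfidf(corpus)[1])
```

Terms that appear in every document receive a tf.idf weight of zero under this definition, which is why the scheme tends to emphasize terms that distinguish one case from another.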

Finally, the fourth step of the methodology developed an automated machine learning model for retrieving supporting DSC precedent cases from large corpora. This step (1) implemented the Latent Semantic Analysis (LSA) algorithm; and (2) developed nine reduced feature spaces, with feature sizes of 5, 10, 15, 20, 100, 200, 300, 400, and 500, for analysis and validation of the implemented algorithm. Among the findings of this step are that (1) low-dimension reduced feature spaces are more representative of documents closely related to the domain problem; (2) high-dimension reduced feature spaces are more representative of domain problems that model dispersed and unrelated document collections; and (3) the LSA reduced feature space of 10 features is the best one to adopt for automating the extraction of similar DSC cases from a large corpus.
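
The retrieval idea behind this step can be illustrated with a small, dependency-free sketch of LSA. It is not the study's implementation: it obtains the reduced space from the eigenvectors of the document-by-document Gram matrix (equivalent to the document side of a truncated singular value decomposition) using power iteration, treats the query case as one of the corpus documents, and uses a hypothetical four-document corpus with `k = 2` features rather than the 5 to 500 features examined in the dissertation. A practical system would rely on a linear algebra library instead.

```python
import math

def gram_matrix(docs):
    """Document-by-document inner products of the term-count vectors."""
    return [[sum(a * b for a, b in zip(di, dj)) for dj in docs] for di in docs]

def top_eigenpairs(matrix, k, iters=300):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        start = [float(i + 1) for i in range(n)]  # fixed, non-uniform start
        scale = math.sqrt(sum(x * x for x in start))
        v = [x / scale for x in start]
        for _step in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(n)) for i in range(n))
        pairs.append((lam, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs

def lsa_document_vectors(docs, k):
    """Coordinates of each document in the k-dimensional reduced space."""
    pairs = top_eigenpairs(gram_matrix(docs), k)
    return [[math.sqrt(max(lam, 0.0)) * vec[d] for lam, vec in pairs]
            for d in range(len(docs))]

def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def rank_similar(vectors, query_index):
    """Other documents sorted by descending cosine similarity to the query."""
    scores = [(cosine(vectors[query_index], v), i)
              for i, v in enumerate(vectors) if i != query_index]
    return sorted(scores, reverse=True)

# Hypothetical term-count vectors: documents 0-1 share terms, as do 2-3.
docs = [[2, 1, 0, 0], [1, 2, 0, 0], [0, 0, 3, 1], [0, 0, 1, 3]]
vectors = lsa_document_vectors(docs, k=2)
ranking = rank_similar(vectors, query_index=0)
print(ranking[0])  # most similar document to document 0
```

In this toy corpus, document 1 is retrieved as the closest precedent for document 0 because the two share vocabulary, while documents 2 and 3 fall on a different reduced dimension.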

The main developments of this research advance the current state of the art in construction legal decision support and Knowledge Management (KM) in the construction legal domain by providing much-needed systems for (1) litigation outcome prediction; (2) automated legal factor extraction; and (3) automated precedent case retrieval. These developments hold promise to (1) decrease the cost of legal experts in the construction industry by reducing the time spent on non-value-adding tasks such as document analysis and by offering initial estimates of the legal situation of a disputing party; (2) decrease the time consumed in litigation processes; (3) facilitate access to the legal knowledge needed by practitioners in the construction industry; (4) provide a better understanding of the legal consequences of decision making in the construction industry; and (5) provide solid supporting documents and probabilistic measures of the strength of a disputing party's legal situation for better decision making about resolution mechanisms. All of these expected outcomes have the potential to decrease the negative impact of disputes on the construction industry and thereby to create significant opportunities for the growth of this important sector of the US economy.

Copyright Owner

Tarek Said Mahfouz



File Size

351 pages