Degree Type


Date of Award


Degree Name

Master of Science


Computer Science


Computer Science

First Advisor

Baskar Ganapathysubramanian

Second Advisor

Soumik Sarkar


Machine learning has become a popular technology that has not only turbo-charged the existing problems in the AI but it has also emerged as the powerful toolkit to solve some of the interesting problems across the various interdisciplinary domains.

The availability of food is the biggest problem of the 21st century and many experts have raised their concerns as we continue to see a rise in the global human population. There have been many efforts in this direction which include but not limited to improvement in the seeds quality, good management practices, prior knowledge about the expected yield, etc.

In this work, we propose a data-driven approach that is ‘gray box’ i.e. that seamlessly utilizes expert knowledge in constructing a statistical network model for corn yield forecasting. Our multivariate gray box model is developed on Bayesian network analysis to build a Directed

Acyclic Graph (DAG) between predictors and yield. Starting from a complete graph connecting various carefully chosen variables and yield, expert knowledge is used to prune or strengthen edges connecting variables. Subsequently, the structure (connectivity and edge weights) of the DAG that maximizes the likelihood of observing the training data is identified via optimization. We curated an extensive set of historical data (1948 − 2012) for each of the 99 counties in Iowa as data to train the model. We discuss preliminary results, and specifically focus on (a) the structure of the learned network and how it corroborates with known trends, and (b) how partial information still produces reasonable predictions (predictions with gappy data), and show that incorporating the missing information improves predictions.

Copyright Owner

Vikas Chawla



File Format


File Size

33 pages