Date of Award
Doctor of Philosophy
Research suggests that in North America, soybean Relative Maturity (RM) is controlled by a minimum of eight genetic loci, labeled E loci. The amount of variation explained by these genes suggests that accurate predictions for RM could be obtained using prediction models that include only allele effects for markers located near the major E genes. The ability to accurately predict the RM of a segregating breeding line from genetic information has the potential to improve both the rate of genetic gain and the cost per unit of genetic gain within a breeding program by enabling: 1) prediction of RM in segregating progeny from crosses between parents with large differences in RM; 2) selection of segregating lines with appropriate RMs in non-adapted off-season nurseries; 3) increased selection intensities of segregating lines assigned to field trials; and 4) cost reduction of replicated field trials. The objectives of this research were to: 1) compare the accuracy of RM prediction using genome-wide markers versus prediction models containing only molecular markers significantly associated with RM; 2) validate that prediction accuracies were maintained when predictions were made for segregating lines that were not only distantly related to those in the original training dataset, but also developed and grown in years outside those of the original training dataset; and 3) evaluate whether the prediction accuracies and associated genotyping costs support wide-scale RM prediction within a soybean cultivar development program.
In an effort to determine whether the RM of a segregating soybean breeding line could be predicted using genetic information, we developed a training dataset of 1,244 F4-derived, advanced-stage segregating soybean lines with known RMs ranging from 1.3 to 8.0 that were genotyped with 1,817 genome-wide single nucleotide polymorphism (SNP) markers. The segregating lines were selected from multiple families that resulted from hundreds of breeding crosses made over multiple years in a soybean cultivar development program. The data were used to estimate allele effects for four prediction models: two models that represented traditional Marker Assisted Selection (MAS) approaches, using only markers associated with known E genes or located within regions of the genome thought to influence RM (specific E-gene and expanded E-gene), and two Genomic Prediction (GP) models with distinct marker densities (full GP model and reduced GP model). The GP and expanded E-gene prediction models produced average prediction accuracies across RM of 0.93 to 0.94, while the prediction accuracy of the specific E-gene model was 0.81. The results indicated that the E genes identified in the literature were highly predictive of RM; however, the greatest prediction accuracies were obtained with whole-genome marker panels. While the results from the initial research were promising, additional research was required to determine whether the prediction accuracies could be maintained when predictions were made on segregating lines from years outside those represented in the original training dataset.
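As an illustration of what estimating allele effects from a training dataset and applying them to new lines can look like, the following is a minimal sketch that assumes a ridge-regression model with additive 0/1/2 genotype coding. The abstract does not state which statistical method was used, so the model choice, the function names, the penalty value, and the simulated data are all assumptions made for illustration only.

```python
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_marker_effects(genotypes, phenotypes, ridge_penalty=1.0):
    """Estimate an intercept and one additive effect per marker (ridge regression).

    ASSUMPTION: ridge regression stands in for the unspecified method.
    """
    n_lines, n_markers = len(genotypes), len(genotypes[0])
    marker_means = [sum(row[j] for row in genotypes) / n_lines
                    for j in range(n_markers)]
    intercept = sum(phenotypes) / n_lines
    centered = [[row[j] - marker_means[j] for j in range(n_markers)]
                for row in genotypes]
    residuals = [y - intercept for y in phenotypes]
    # Normal equations with a penalty on the diagonal: (X'X + lambda*I) b = X'y
    lhs = [[sum(r[j] * r[k] for r in centered) + (ridge_penalty if j == k else 0.0)
            for k in range(n_markers)] for j in range(n_markers)]
    rhs = [sum(r[j] * e for r, e in zip(centered, residuals))
           for j in range(n_markers)]
    return intercept, marker_means, solve_linear_system(lhs, rhs)


def predict_rm(genotype, intercept, marker_means, effects):
    """Predict RM for one line as intercept plus the sum of its marker effects."""
    return intercept + sum((g - m) * b
                           for g, m, b in zip(genotype, marker_means, effects))


# Simulated data only: six hypothetical markers, three with a true effect on RM.
rng = random.Random(0)
true_effects = [0.6, -0.4, 0.3, 0.0, 0.0, 0.0]
genotypes = [[rng.choice((0, 1, 2)) for _ in true_effects] for _ in range(120)]
rm_values = [3.0 + sum(g * b for g, b in zip(row, true_effects)) + rng.gauss(0, 0.1)
             for row in genotypes]

# Fit on the first 80 lines, predict the remaining 40.
model = fit_marker_effects(genotypes[:80], rm_values[:80])
test_rm = rm_values[80:]
predicted = [predict_rm(row, *model) for row in genotypes[80:]]
mean_abs_error = sum(abs(p - v) for p, v in zip(predicted, test_rm)) / len(test_rm)
```

In this sketch, a model restricted to markers near the E genes and a whole-genome model differ only in which marker columns are passed to `fit_marker_effects`.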
In an attempt to strengthen the prediction accuracies obtained for the early and late maturities, the original training dataset was expanded to a total of 2,194 segregating lines, selected from replicated field trials in 2009-2013, with validated RM phenotypes ranging from RM 0.0 to 8.0. All 2,194 segregating lines in the updated training dataset had previously been genotyped using 1,118 genome-wide SNP markers. Because the preliminary research showed that prediction accuracies were highest when whole-genome marker panels were used, only a full GP model using allele effect estimates for all 1,118 SNP markers was evaluated in this study. The 1,118 SNP marker GP model successfully predicted the RMs of 1,854 segregating lines in 2014 and 1,465 segregating lines in 2015. The estimated correlation between predicted RM (RMp) and validated RM (RMv) for all segregating lines was 0.95, with an average difference between RMp and RMv of 4 days. Prediction accuracies were again lowest for segregating lines with RMv earlier than 1.0 and later than 5.0, which we believe was still the result of the small number of segregating lines in the training set for those RM groups. Two alternative metrics were developed: the frequency of RMp within 0.5 of RMv, f(|RMp-RMv|≤0.5), and the frequency of RMp within 0.25 of RMv, f(|RMp-RMv|≤0.25). These metrics indicated that across years, 66% of the segregating lines had RMp within 5 days of their RMv and 39% had RMp within 2.5 days of their RMv. The f(|RMp-RMv|≤0.5) and f(|RMp-RMv|≤0.25) improved to 73% and 46%, respectively, when only segregating lines with RMv ranging from 1.0 to 5.9 were evaluated.
While the results from this second round of research showed that genetic information could be used to predict the RM of segregating lines with relatively high accuracy across the maturity groups grown within North America, additional analysis was required to determine whether wide-scale implementation could be justified within a breeding program.
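The accuracy metrics described above, the correlation between RMp and RMv and the frequencies f(|RMp-RMv|≤0.5) and f(|RMp-RMv|≤0.25), can be sketched as follows. The function names and the four example values are hypothetical and are not data from the study.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def frequency_within(rm_predicted, rm_validated, tolerance):
    """Fraction of lines with |RMp - RMv| <= tolerance (in RM units)."""
    hits = sum(1 for p, v in zip(rm_predicted, rm_validated)
               if abs(p - v) <= tolerance)
    return hits / len(rm_predicted)


# Hypothetical values for four lines, chosen only to exercise the functions.
rm_validated = [1.0, 2.0, 3.0, 4.0]
rm_predicted = [1.2, 2.6, 2.9, 4.4]

correlation = pearson_correlation(rm_predicted, rm_validated)
f_half = frequency_within(rm_predicted, rm_validated, 0.5)      # 3 of 4 lines
f_quarter = frequency_within(rm_predicted, rm_validated, 0.25)  # 2 of 4 lines
```

The abstract reports these tolerances both in RM units and in days (0.5 RM as 5 days, 0.25 RM as 2.5 days); the sketch works in RM units only.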
In an effort to determine whether the prediction accuracies and genotyping costs associated with predicting the RM of segregating lines from genetic information could justify wide-scale implementation within a breeding program, we evaluated program-wide implementation of RM prediction using basic principles of Operations Research (OR). A simple Microsoft Excel based tool, termed the Genomic Prediction Evaluation Tool (GPE tool), was built that allowed all possible cultivar development scenarios within the Iowa State University (ISU) soybean breeding program to be evaluated for both Total Program Cost (TPC) and Relative Breeding Design Efficiency (RBDE). Optimal breeding designs were those that maximized RBDE while minimizing TPC. Two analyses were conducted using the GPE tool. The first analysis (analysis 1) defined the total number of years to reach the final year of replicated field trials as the number of years from the initiation of crossing to the final year of field trials. The second analysis (analysis 2) added a year to that total for designs that used a North American summer crossing block, thus decreasing the associated RBDE. Among the optimal breeding designs identified in both analyses, none recommended the use of RM prediction to support the cultivar development process, because the associated cost of implementation was simply too high. Slight modifications to the current version of the GPE tool should allow the ISU breeding program to identify breeding designs that are more efficient than the design currently implemented. The GPE tool in its current version lays the foundation for a tool that will allow soybean breeders to appropriately evaluate the potential wide-scale implementation of genomic selection (GS) to predict complex phenotypes in support of soybean variety development.
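One way to read "maximized RBDE while minimizing TPC" is as a search for non-dominated designs, those for which no other design is at least as cheap and at least as efficient while being strictly better on one of the two. The sketch below assumes that interpretation. The abstract does not describe how the GPE tool combines the two criteria, and the design names and numbers are hypothetical.

```python
from typing import NamedTuple


class BreedingDesign(NamedTuple):
    name: str
    total_program_cost: float    # TPC, lower is better
    relative_efficiency: float   # RBDE, higher is better


def dominates(a, b):
    """True if design a is no worse than b on both criteria and better on one."""
    no_worse = (a.total_program_cost <= b.total_program_cost
                and a.relative_efficiency >= b.relative_efficiency)
    strictly_better = (a.total_program_cost < b.total_program_cost
                       or a.relative_efficiency > b.relative_efficiency)
    return no_worse and strictly_better


def optimal_designs(designs):
    """Return the designs that no other design dominates, in input order."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical designs; costs and efficiencies are illustrative only.
designs = [
    BreedingDesign("design_a", 100.0, 1.00),
    BreedingDesign("design_b", 120.0, 1.10),
    BreedingDesign("design_c", 150.0, 1.05),  # costlier and less efficient than design_b
    BreedingDesign("design_d", 90.0, 0.80),
]

optimal_names = [d.name for d in optimal_designs(designs)]
```

Under this reading, a design that adds genotyping cost for RM prediction stays in the optimal set only if its gain in RBDE is not matched by a cheaper design.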
Tracy William Doubler
Doubler, Tracy William, "The use of genetic information to predict the relative maturity of soybeans" (2016). Graduate Theses and Dissertations. 15903.