A methodology for sorting haploid and diploid corn seed using terahertz time domain spectroscopy and machine learning

Jared Taylor, Iowa State University
Chien-Ping Chiou, Iowa State University
Leonard J. Bond, Iowa State University

This proceeding may be downloaded for personal use only. Any other use requires prior permission of the author and AIP Publishing. This article appeared in Taylor, Jared, Chien-Ping Chiou, and Leonard J. Bond. "A methodology for sorting haploid and diploid corn seed using terahertz time domain spectroscopy and machine learning." AIP Conference Proceedings 2102, no. 1 (2019): 080001, and may be found at DOI: 10.1063/1.5099809. Posted with permission.


The ability of terahertz (THz) electromagnetic waves to penetrate a wide range of materials gives them potential for diverse applications in nondestructive evaluation, biomedicine, and agriculture, and their use has expanded rapidly. One possible application is in corn breeding, specifically the doubled haploid method, which greatly speeds up plant breeding but requires seed sorting. Haploid kernels are induced in corn plants in order to decrease the time needed to reach homozygous genetic corn lines. These haploid kernels must be separated from the surrounding diploid kernels; at present this sorting is labor intensive and is performed using visual markers. The current work is a proof-of-concept study that sought to determine whether haploid classification can be automated using terahertz time domain spectroscopy (THz-TDS) paired with a machine learning algorithm, such as a probabilistic neural network (PNN). In this work, a THz-TDS system was used to collect time domain waveforms from a sample of mixed haploid and diploid corn kernels. The effects of variability in beam focus and kernel geometry were reduced by taking multiple scans at different heights. The waveform data were then transformed to the frequency domain and classified by a PNN with a training set random subsampling technique. Leave-one-out and k-fold cross-validation procedures were used to train and evaluate the model. The preliminary results show promise, yielding an average classification rate of 75 percent correct under 5-fold cross-validation.
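The processing chain described in the abstract (time domain waveform, frequency domain features, PNN classification, k-fold cross-validation) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the Gaussian kernel width sigma, the use of the FFT magnitude as the feature vector, and the synthetic damped-sinusoid waveforms are all assumptions made for illustration, and the multi-height scanning and training set random subsampling steps are not reproduced.

```python
import numpy as np


def to_spectrum(waveforms):
    """Magnitude spectrum of each time domain waveform (one waveform per row)."""
    return np.abs(np.fft.rfft(waveforms, axis=1))


def pnn_predict(train_X, train_y, test_X, sigma=1.0):
    """Classify test_X with a basic PNN (Gaussian Parzen-window density per class).

    sigma is an assumed smoothing parameter; the abstract does not give a value.
    """
    classes = np.unique(train_y)
    # Squared Euclidean distance between every test and training sample.
    d2 = ((test_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    log_k = -d2 / (2.0 * sigma ** 2)
    scores = np.empty((test_X.shape[0], classes.size))
    for j, c in enumerate(classes):
        a = log_k[:, train_y == c]
        # Log of the mean kernel response, computed stably (log-sum-exp).
        m = a.max(axis=1, keepdims=True)
        scores[:, j] = m[:, 0] + np.log(np.exp(a - m).sum(axis=1)) - np.log(a.shape[1])
    return classes[np.argmax(scores, axis=1)]


def kfold_accuracy(X, y, k=5, sigma=1.0, seed=0):
    """Average fraction classified correctly over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = pnn_predict(X[train], y[train], X[test], sigma)
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))


def make_synthetic_waveforms(n_per_class=20, n_samples=128, seed=0):
    """Hypothetical stand-in data: noisy damped sinusoids, NOT real THz-TDS scans."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)
    X, y = [], []
    for label, freq in ((0, 0.05), (1, 0.08)):
        for _ in range(n_per_class):
            pulse = np.exp(-t / 40.0) * np.sin(2 * np.pi * freq * t)
            X.append(pulse + 0.3 * rng.standard_normal(n_samples))
            y.append(label)
    return np.array(X), np.array(y)


if __name__ == "__main__":
    waveforms, labels = make_synthetic_waveforms()
    features = to_spectrum(waveforms)
    print("5-fold accuracy:", kfold_accuracy(features, labels, k=5, sigma=5.0))
```

Setting k equal to the number of samples turns the same routine into leave-one-out cross-validation. Any accuracy obtained on the synthetic data says nothing about the 75 percent figure reported for real kernels.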