FU, PENG - University of Illinois
MEACHAM-HENSOLD, KATHERINE - University of Illinois
GUAN, KAIYU - University of Illinois
Submitted to: Frontiers in Plant Science
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 5/16/2019
Publication Date: 6/3/2019
Citation: Fu, P., Meacham-Hensold, K., Guan, K., Bernacchi, C.J. 2019. Hyperspectral leaf reflectance as proxy for photosynthetic capacities: An ensemble approach based on multiple machine learning algorithms. Frontiers in Plant Science. 10:730.
Interpretive Summary: Photosynthesis is critical for crop production, and thus for meeting the need for food, feed, fiber, and biofuel. Characterizing photosynthesis across a variety of crops, genotypes, and cultivars, and in a range of environmental conditions, is difficult because traditional methods for measuring photosynthesis are slow, labor intensive, and expensive. Recently, high-throughput techniques that allow relatively inexpensive and fast measurements have been under development, but there are many uncertainties about how well these methods work. In particular, high-end measurements of leaf reflectance allow machine learning techniques to better estimate photosynthesis. However, existing approaches each rely on only one machine learning algorithm. This research investigates a variety of machine learning approaches to assess how well leaf reflectance measurements can estimate photosynthesis, and then determines whether stacking the multiple approaches into one large statistical technique is better than any one method by itself. We show that the stacking approach gives the best predictive performance, and that this results from the diverse ability of the individual regression techniques to use the reflectance data. This work advances the strength of rapid measurements of photosynthesis that can be applied to a wide range of species and environments.
Technical Abstract: Increasing demands for food, fiber, and fuel caused by a rising human population and higher living standards may not be satisfied while the world’s agricultural production is stressed by a changing climate. This conundrum can be alleviated by providing highly photosynthetically efficient crop cultivars to farmers. Current research efforts to increase photosynthetic energy conversion efficiency have produced a wealth of photosynthetic information at genomic and molecular levels, which has yet to be linked efficiently to phenotype in a real-world environment. Though partial least squares regression (PLSR) has been commonly used to relate hyperspectral reflectance to photosynthetic parameters, its modeling performance varies significantly across different plant species, regions, and growth environments. Thus, to cope with the heterogeneous performance of PLSR across different situations, this study aims to develop a new approach to estimating photosynthetic parameters. We developed a framework that combines six machine learning algorithms: artificial neural network (ANN), support vector machine (SVM), least absolute shrinkage and selection operator (LASSO), random forest (RF), Gaussian process (GP), and PLSR. Six tobacco genotypes, including both transgenic and wild type lines, were used to test the developed approach. Leaf reflectance of the six genotypes was measured from 400 to 2500 nm using a high-spectral-resolution spectroradiometer. The corresponding response of photosynthesis to intercellular CO2 concentration for each leaf was captured using a portable leaf gas exchange system. Results suggested that the mean R2 value of the six regression techniques for predicting Vcmax (Jmax) in the calibration phase ranged from 0.60 (0.45) to 0.65 (0.56), with the mean RMSE value varying from 47.1 (40.1) to 54.0 (44.7) µmol m^(-2) s^(-1). 
The regression stacking performed better than each individual regression technique for predicting both Vcmax and Jmax. For Vcmax (Jmax), the stacking improved the R2 value by 0.1 (0.08) and decreased the RMSE value by 4.1 (6.6) µmol m^(-2) s^(-1). It was concluded that the better predictive performance of the regression stacking should be attributed to the varying coefficients (or weights) in the level-2 model (the LASSO model) and the diverse ability of each individual regression technique to utilize spectral information for the best modeling performance. Further work is needed to assess the portability of the stacked regression for estimating other plant phenotypic traits.