Location: Molecular Plant Pathology Laboratory
Project Number: 8042-22000-320-005-I
Project Type: Interagency Reimbursable Agreement
Start Date: Aug 1, 2023
End Date: Jul 31, 2024
This project aims to collect image data from healthy and phytoplasma-infected plants, train a machine learning algorithm to identify phytoplasma diseases from morphological features, and evaluate the model's performance. Objective 1: Collect a dataset of high-quality images of both phytoplasma-infected and healthy plants, preprocess the image data, and generate a database. This dataset will be used to train the machine learning algorithm. Objective 2: Choose a machine learning algorithm for AI-based comparison of symptom images (an image classification task). The trained model will learn to identify phytoplasma diseases in plants from morphological changes that match phytoplasma disease symptoms. Objective 3: Evaluate the trained model on a hold-out dataset to assess its accuracy, precision, and recall, then fine-tune the model to improve its performance.
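Objective 3 relies on a hold-out set that the model never sees during training. The following is a minimal sketch of a stratified train/validation/test split over annotated image records, assuming a binary healthy/infected label. The `ImageRecord` fields loosely mirror the metadata listed under Objective 1, but the schema, the 70/15/15 proportions, and the function names are hypothetical choices for illustration.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical annotation record for one image."""
    path: str
    plant_species: str
    growth_stage: str
    infected: bool  # True = phytoplasma-infected, False = healthy


def stratified_split(records, fractions=(0.70, 0.15, 0.15), seed=0):
    """Split records into train/validation/test lists.

    Each label group (healthy vs. infected) is shuffled and cut with the same
    fractions, so every split keeps roughly the original class ratio.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec.infected].append(rec)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train, val, test = [], [], []
    for label in sorted(by_label):  # deterministic group order
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(fractions[0] * len(group))
        n_val = round(fractions[1] * len(group))
        train += group[:n_train]
        val += group[n_train:n_train + n_val]
        test += group[n_train + n_val:]  # remainder goes to the hold-out set
    return train, val, test
```

Splitting before the rotation augmentation keeps rotated copies of a test image out of the training set. If those copies leaked into training, they would inflate the hold-out scores.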
The first Objective of the proposed project aims to (1) collect a large and diverse dataset of images of both healthy and phytoplasma-infected plants, covering a variety of plant species, symptoms, and disease severity levels. The existing Phytoplasma Disease Database constructed by this group has identified nearly 700 plant species that are susceptible to phytoplasma infection. Based on this list, images showing the morphological features of healthy and phytoplasma-infected plants will be collected from open-source collections and merged with our existing image resources. The images will be stored in a structured format and annotated with relevant metadata, such as plant type, growth stage, and the presence of phytoplasma infection; (2) clean and preprocess the collected image data so that it is suitable for training the model. This includes resizing the images to a uniform size, normalizing their brightness and contrast, and augmenting the data by rotating the images in 90-degree increments; and (3) generate a database of the preprocessed images and their corresponding annotations, which will be used to train and test the machine learning algorithm. The training and validation datasets produced in this study will be unique in that they will include images of different parts of various phytoplasma-infected plant species, such as deformed flowers, flattened stems, proliferated branches, sterile inflorescences, and fruit. Our second Objective aims at training a machine learning algorithm and creating image classification models for phytoplasma infection.
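The preprocessing in step (2) can be sketched with NumPy. The sketch assumes images have already been decoded into H x W x 3 arrays. Several details are illustrative choices, not the project's documented pipeline: nearest-neighbour resizing, whole-image standardization (zero mean, unit standard deviation) as the brightness/contrast normalization, and the 299-pixel default, which is the usual Inception v3 input size (VGG-16 conventionally uses 224). The function names are hypothetical.

```python
import numpy as np


def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Resize an H x W x C image to size x size by nearest-neighbour sampling."""
    h, w = image.shape[:2]
    # Map each output row/column back to a source row/column index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols[None, :]]


def standardize(image: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Normalize brightness (mean) and contrast (std) over the whole image."""
    img = image.astype(np.float32)
    return (img - img.mean()) / (img.std() + eps)


def rotate_90s(image: np.ndarray) -> list:
    """Return the image rotated by 0, 90, 180 and 270 degrees."""
    return [np.rot90(image, k=k, axes=(0, 1)) for k in range(4)]


def preprocess(image: np.ndarray, size: int = 299) -> list:
    """Resize and standardize one image, then augment it into four rotated copies."""
    return rotate_90s(standardize(resize_nearest(image, size)))
```

Standardizing over the whole image rather than per channel preserves relative color differences, such as yellowing, that may carry symptom information. A production pipeline would more likely use an interpolating resize from an imaging library.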
In the second Objective, we will (1) construct models from pre-trained image classifiers based on Google’s Inception v3 and the VGG-16 architecture, fine-tuning the last layer of each network on the datasets created in the proposed project; (2) create additional models by implementing our own custom convolutional neural network (CNN) in TensorFlow; and (3) train the models under various hyperparameter settings, introducing noise where appropriate to improve model robustness. These trained models will be used to classify unknown input images and detect the presence of phytoplasma disease.
Our third Objective aims at evaluating the trained model(s). This Objective will (1) evaluate the trained models on the test set (unseen data) to assess their accuracy, precision, recall, and F1 score; and (2) attempt to fool or trick the classifiers, using the failures this exposes to minimize both false positives and false negatives.
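The four evaluation metrics follow directly from confusion-matrix counts. The following is a minimal pure-Python sketch that assumes binary labels, where 1 (or True) marks a phytoplasma-infected image and 0 marks a healthy one. `evaluate` and `BinaryMetrics` are hypothetical names. Libraries such as scikit-learn provide equivalent functions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float
    false_positives: int  # healthy images flagged as infected
    false_negatives: int  # infected images missed by the model


def evaluate(y_true, y_pred) -> BinaryMetrics:
    """Compute accuracy, precision, recall and F1 for binary predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot evaluate an empty test set")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)

    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 when a denominator is zero (e.g. no positive predictions).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1, fp, fn)
```

Reporting the raw false-positive and false-negative counts alongside the ratios makes it easy to track each error type separately while the model is probed and fine-tuned.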