Location: Animal Parasitic Diseases Laboratory
Project Number: 8042-32000-116-008-I
Project Type: Interagency Reimbursable Agreement
Start Date: Jun 1, 2025
End Date: Nov 30, 2026
Objective:
Objective 1. Develop an Artificial Intelligence (AI) pipeline and machine learning models, in support of agricultural production for America's farmers, for use with high-accuracy lab tests called real-time PCR and CRISPR, which are methods that amplify or edit DNA. The pipeline will be used to detect and discover existing and emerging strains of Tomato yellow leaf curl virus (TYLCV; family Geminiviridae, genus Begomovirus). Tomato yellow leaf curl disease is of high importance in agriculture because it can cause yield losses of up to 100% in tomato, pepper, bean, and cucurbit plants. The goal is to develop an AI-based pipeline that discovers signature sequences for specific detection of the TYLCV group, which currently includes approximately 65 species. This pipeline will be integrated into an AI-based platform for virus detection. In addition, panel-based sub-group detection of the virus will be performed using a lab test called TaqMan quadruplex real-time PCR.
Objective 2. Build a simple, easy-to-use test using CRISPR technology that can identify the exact type of Tomato yellow leaf curl virus.
Objective 3. Test, confirm and validate that our AI strategy and technical approach, real-time PCR, and CRISPR tools really work to find and identify types of the Tomato yellow leaf curl virus.
Approach:
The Artificial Intelligence (AI) strategy and technical approach will be based on feature extraction using N-gram frequencies and/or k-mer-based features, such as dinucleotide, trinucleotide, and tetranucleotide frequencies, to capture the unique patterns and signatures that distinguish exotic TYLCV species. Machine learning models will be built with the Scikit-learn package, using Neural Networks (effective for learning complex patterns and relationships in genomic data) and the Random Forest algorithm for classification (Malamon et al 2024, Dlamini et al 2020), together with clustering algorithms. Labels will be determined by unsupervised learning. Curated datasets will be constructed for the machine learning process, which involves creating a training set, a validation set, and a testing set for supervised learning. The training set will be used to train the model, the validation set will be used to validate the model training, and the testing set will be used to evaluate the Neural Network, deep learning, and Random Forest models on unseen data. Hyperparameter tuning will also be performed. The deliverable will be machine learning models that can distinguish TYLCV groups and labels.
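As an illustration of the feature extraction step, the following minimal sketch computes dinucleotide, trinucleotide, and tetranucleotide frequencies for a DNA sequence using only the Python standard library. The function names (`kmer_frequencies`, `feature_vector`) and the toy sequence are hypothetical and are not taken from the project; skipping windows that contain ambiguous bases is also an assumption. The resulting vectors are the kind of numeric input that a Scikit-learn classifier could be trained on.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(sequence, k):
    """Return relative frequencies of all 4**k k-mers in lexicographic order."""
    seq = sequence.upper()
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Assumption: windows containing ambiguous bases (e.g. N) are ignored.
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


def feature_vector(sequence, ks=(2, 3, 4)):
    """Concatenate k-mer frequency profiles for each k into one feature vector."""
    features = []
    for k in ks:
        features.extend(kmer_frequencies(sequence, k))
    return features


# Toy sequence for illustration only; not a real TYLCV genome.
example = "ACGTACGTTTGACCA"
vec = feature_vector(example)
print(len(vec))  # 16 + 64 + 256 features
```

Each genome maps to a fixed-length vector regardless of its length, which is what allows sequences of different sizes to be compared by the same model.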
This AI strategy will aid the development of sub-objectives 2 and 3, the development of panel-specific and species-specific screening assays. For the AI-based detection of signatures specific to Tomato leaf curl virus (exotic species), we will use all begomoviruses along with closely related virus species. Panel-based detection of a sub-group of Geminivirus: TYLCV will use panel-specific primers. Based on objective 1, we will use signature sequences to classify separate sub-groups. The precision and accuracy of the assay will be determined by serially spiking viral targets into DNA samples derived from healthy tomato, pepper, bean, and cucurbit plants. Synthetic DNA corresponding to Tomato yellow leaf curl virus (TYLCV), sourced from a reputable commercial provider, will be used for assay development.
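One common way to summarize a serial spike-in series for a real-time PCR assay is a standard curve of Ct against log10 of the target copy number, from which the amplification efficiency follows as E = 10^(-1/slope) - 1. The sketch below is an assumption about how such data might be analyzed, not a description of the project's protocol. The copy numbers and Ct values are synthetic placeholders generated from a known slope, not measurements, and the function names are hypothetical.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def amplification_efficiency(slope):
    """Efficiency from the standard-curve slope; 1.0 means doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Synthetic ten-fold dilution series (placeholder values, not real data).
copies = [10 ** e for e in range(6, 0, -1)]
ct = [38.0 - 3.32 * math.log10(c) for c in copies]

slope, intercept = fit_standard_curve(copies, ct)
efficiency = amplification_efficiency(slope)
print(round(slope, 2), round(efficiency, 3))
```

A slope near -3.32 corresponds to an efficiency near 100%; replicate Ct values at each dilution would additionally be needed to assess precision.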
Sub-objectives 1 and 2 will provide us with a positive sub-group of Geminivirus: TYLCV (exotic species). The idea is to narrow the ~65 species down to a small number of positive species for diagnostic CRISPR assays. A key feature of CRISPR-based diagnostics is the single-nucleotide (SNP-level) specificity of Cas enzymes, which enables the detection of point mutations (Kaminski et al 2021). We will use this technique to detect the TYLCV sub-group from objective 2 and differentiate it at the species level. To develop the RPA-CRISPR-Cas12 assays for detecting Geminivirus: TYLCV (exotic species), samples will be obtained from collaborating laboratories, and a synthetic approach will be used for executing the CRISPR-Cas12 assays. We will synthesize primers and crRNA through IDT and obtain the Cas12a enzyme from NEB. For assay optimization, we will optimize the amounts of gRNA, LbCas12a, and the ssDNA reporter. Further validation will test the sensitivity, specificity, and selectivity of the methods and determine the reproducibility of the CRISPR assay using actual target virus species.
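A first step in designing crRNAs for a Cas12a assay is usually to locate candidate target sites in a signature sequence. The minimal sketch below scans both strands for the canonical LbCas12a PAM (5'-TTTV-3', where V is A, C, or G) followed by a protospacer. The 20-nt spacer length, the function names, and the toy sequence are assumptions for illustration; the project's actual design rules are not described in this summary.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of an A/C/G/T sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def find_cas12a_sites(sequence, spacer_len=20):
    """List (strand, start, pam, spacer) for TTTV PAMs followed by a protospacer.

    For the "-" strand, start is an index into the reverse-complemented sequence.
    """
    # Lookahead so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=(TTT[ACG])([ACGT]{%d}))" % spacer_len)
    sites = []
    for strand, seq in (("+", sequence.upper()), ("-", reverse_complement(sequence))):
        for match in pattern.finditer(seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# Toy sequence for illustration only; not a real TYLCV signature.
toy = "GG" + "TTTA" + "ACGT" * 5 + "CC"
sites = find_cas12a_sites(toy)
for site in sites:
    print(site)
```

Candidates found this way would still need to be screened for specificity, for example by checking that the spacer covers a position that differs between the target species and its close relatives.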