
Research Project: Improving Crop Efficiency Using Genomic Diversity and Computational Modeling

Location: Plant, Soil and Nutrition Research

2022 Annual Report


Objectives
Objective 1: Create approaches and tools for identifying causal variants directly from genomic sequencing of diverse germplasm and species of C4 crops. [NP301, C1, PS1A]
Objective 2: Identify deleterious mutations, and model their impact on crop efficiency and heterosis in C4 crops. [NP301, C3, PS3A]
Objective 3: Identify adaptive variants for drought and temperature tolerance across C4 crops. [NP301, C1, PS1B]
Objective 4: Establish community tools for processing and integration of sequence haplotypes to estimate their breeding effects in crop productivity. [NP301, C4, PS4A]


Approach
Increasing grass crop productivity is key to feeding the world over the next 50 years. This will require removing the deleterious variants in every genome, as well as adapting the crops to highly variable and stressful environments. This project will build better breeding models for improving and adapting maize and sorghum by surveying the natural variation across their entire group of wild relative species, the Andropogoneae. With over 1,000 species, the Andropogoneae are the most productive and water-use efficient plants in the world. Yet, for applied purposes, we have only tapped the variation from a handful of species. This project will lead an effort to survey DNA-level variation across this entire clade and analyze the variation with statistical and machine learning approaches. This will allow us to develop two sets of applied models for maize and sorghum. First, we will quantitatively estimate the deleterious impact on yield for every nucleotide in the genome. Second, we will identify the genes with a high capacity for adaptation to drought, flooding, and temperature stress, and characterize their properties. These approaches and models will be deployed via integration with big data bioinformatics. This project will produce DNA-level knowledge that can be used across breeding programs and crops, and applied through either genomic selection or genome editing.


Progress Report
This year we continued our genomic efforts, focusing on sequencing wild species in the maize and sorghum clade, the Andropogoneae. Working with USDA collaborators (Stoneville, Mississippi) and others, we have assembled 40 species to high quality, and annotation is nearly complete for most. We have also completed sequencing and annotation for 40 diverse maize inbred lines, selected because of their high diversity and/or importance in the Genomes to Fields and Germplasm Enhancement of Maize (GEM) projects. Finally, we have attempted short-read DNA sequencing on 1,000 Andropogoneae samples from herbarium specimens and other sources, including many from the U.S. National Plant Germplasm System (NPGS). DNA quality has been highly variable and most of the herbarium samples have been challenging, so we are trying ancient DNA approaches to fill in key species. Despite this, we have been able to assemble the gene space of over 350 genomes so far and identify the core sets of genes shared across the tribe. For the first time, we can now see how individual genes have evolved and adapted to various environments across hundreds of closely related species. The first application of this resource has been to resolve problems with the gene models of the key crops maize and sorghum. We have created machine learning models that use evolutionary conservation across species to identify which genes are likely to produce functional proteins. In maize, it now appears that 85% of the non-core genes are likely pseudogenes. This is important because scientists can now focus their attention on the 15% of non-core genes that are likely functional but not shared across varieties. We have been wrapping up our analysis of deleterious variants in maize and (through collaborations) in other species. Our previous work has shown that deleterious mutations explain about half of the variation in crop yield between varieties.
Our most powerful tool for identifying deleterious mutations is the comparison of DNA variation within species to conservation across species. We continue to make progress on tools and pipelines to accurately compare DNA variation within and between species, with the publication of two papers on these approaches this year. Using these alignments, we have now been able to identify deleterious mutations with great precision and show that we can improve genome-wide prediction in three species: maize, tomato, and cassava. These improvements in accuracy are most important when using information across populations. In maize and cassava, we have gone even further and used protein structure machine learning models, calibrated against evolutionary conservation models, to further increase prediction accuracy. In the coming year, this work will be pushed further with the high-accuracy structures being determined with AlphaFold2. We also tested two other hypotheses regarding deleterious genetic load. First, transposons are genomic parasites that can occupy over 85% of a genome, and there has been much debate over their impact on the fitness of the host. While there is no doubt that active transposition can produce an occasional deleterious effect, the question was whether transposons have a bulk deleterious effect. In the largest analysis of its kind, we have shown that maize transposons are significantly but only slightly deleterious, explaining less than 1% of yield variation. The removal of transposons from genomes will therefore not benefit applied agriculture. Second, pleiotropy is when one genetic variant affects two unrelated traits; there are arguments that pleiotropic constraints on the genetic architecture of traits play important roles in productivity. A massive analysis of maize QTL, looking for pleiotropy at the level of field traits, metabolites, and RNA expression, found very little evidence of pleiotropic effects in common standing genetic variation. Again, this suggests that most deleterious mutations are likely to be rare variants directly affecting RNA expression, translation, and protein structure.
Our bioinformatic focus is on making tools more usable by molecular breeders while also facilitating repeatable science. We are accomplishing this with three approaches: (1) We released a new version of TASSEL (our tool for analysis of genetic diversity, which is used in over 800 studies annually), and we have developed user-friendly ways to integrate it with the R statistical environment that is frequently used by breeders (rTASSEL). (2) Our Practical Haplotype Graph (PHG), a powerful way to represent the haplotype diversity of a crop, has now been developed for 5 major crops. We enhanced the PHG so that it works with the Breeding API (BrAPI), which is the global standard for sharing germplasm, phenotype, and genotype data. Through this BrAPI standard, PHG haplotypes and genotypes are now accessible through the R environment via rPHG, a package we developed for applied researchers. (3) Setting up the computing environment necessary to do genomic analysis and genome-wide prediction can be challenging for many applied breeders. We developed a prototype Breeder Genomics Hub, a JupyterHub that supports the scripting languages used by both breeders (R) and genomicists (Python). We are testing this hub and teaching breeders to use it this fall. Fundamentally, this project is beginning to shift its effort from understanding the basis of quantitative genetic variation to developing applied models and evolutionary datasets that can be used to develop a new Circular Agricultural System in the US.
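
As an illustration of what BrAPI-based access to genotype and germplasm data can look like from the Python side of such a hub, the following is a minimal sketch rather than the project's own client code. It relies only on the general BrAPI v2 conventions of `page` and `pageSize` query parameters and a response body with `metadata.pagination` and `result.data`. The server URL, the choice of the `germplasm` endpoint, and the function names `fetch_all` and `http_get_json` are assumptions made for the example; whether a particular PHG server exposes a given endpoint has to be checked against that server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the JSON body (plain standard-library HTTP GET)."""
    with urlopen(url) as response:
        return json.load(response)


def fetch_all(base_url, endpoint, page_size=1000, get_json=http_get_json):
    """Collect every record from a paginated BrAPI v2 style endpoint.

    base_url and endpoint are hypothetical inputs; get_json can be replaced
    with a stub so the paging logic can be exercised without a server.
    """
    records = []
    page = 0
    while True:
        query = urlencode({"page": page, "pageSize": page_size})
        payload = get_json(f"{base_url}/brapi/v2/{endpoint}?{query}")
        # BrAPI v2 responses carry the records under result.data.
        records.extend(payload["result"]["data"])
        pagination = payload.get("metadata", {}).get("pagination", {})
        # If the server omits pagination info, treat the response as one page.
        total_pages = pagination.get("totalPages", 1)
        page += 1
        if page >= total_pages:
            break
    return records
```

A call such as `fetch_all("https://phg.example.org", "germplasm")` would then return a list of dictionaries, one per record, where `phg.example.org` is a placeholder host. R users would normally go through rPHG instead of writing this by hand.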
Breeding Insight (BI) is the ARS initiative to increase the adoption of genomics, phenomics, and analytics tools (including data management software) in ARS specialty crop and animal breeding programs, which have lagged behind major crop and animal breeding programs. BI is currently in year 4 (phase II) and its sister program, BI OnRamp, is in year 2. Together, BI and OnRamp provide breeding support services for 19 ARS species (blueberry, table grape, sweet potato, alfalfa, rainbow trout, North American Atlantic salmon, honeybee, strawberry, cranberry, oat, pecan, lettuce, cucumber, sorghum, hemp, citrus, sugarcane, soybean, and cotton), with BI providing support to multiple breeding programs for some species. The future goal is expansion to all ARS specialty crop, animal, and natural resource breeding programs. Because COVID restricted travel until March of 2022, BI focused on 1) onboarding the new species admitted as part of phase II and preparing the 2022 timeline of deliverables for each species, 2) using the custom 3K marker panels for blueberry and alfalfa for routine genotyping in these breeding programs across multiple ARS locations, and creating new 3K marker panels for cucumber, lettuce, and sweet potato, 3) deploying the mobile phenotyping apps Field Book and Smatrix systems for phenotyping in the 2022 season, and 4) loading partial historical breeding data for salmon, grape, sorghum, cranberry, blueberry, lettuce, cucumber, citrus, and sugarcane into their own BreedBase instances (an ongoing effort). Once COVID restrictions were lifted, BI scheduled and arranged in-person site visits for each species. As of July 2022, BI has traveled to 8 different breeding program sites, with the goal of visiting 6 more before the end of the year.
Other accomplishments of smaller note include: an ongoing effort to create an image analysis pipeline for phenotypic data extraction; successful deployment of a voice-to-text digital data collection workflow (with the Smatrix mobile app) for animal welfare data logging, replacing paper records; creation and maintenance of an active Learning Hub on BI’s website, where training material is housed and publicly available for any breeder to learn about the technologies and services that BI provides; and an elevated role for BI in feature requests to Field Book, Smatrix Systems, BreedBase, and BrAPI. BI’s third significant software development accomplishment is the release of version 0.6, which includes germplasm loading functionalities and allows for the expansion of the trait ontology to more general ontology functions, such as indicating events and describing the trial environment. The software team has made major improvements to the back-end communications between Field Book and BreedBase through the Breeding API (BrAPI) connection, pushing for BrAPI expansion when necessary. The difficulties experienced by BI staff when importing historical data into BreedBase (while remaining BrAPI compliant) prompted the IT team to create a better and more flexible import/export solution for breeders. Working prototypes of this import tool are under refinement at BI and will be used by BI coordinators to speed up the loading of any type of data into BreedBase. As with all BI’s software, it will be BrAPI 2.1 compliant, open-source, and publicly available. In addition to software development, the IT team manages over 30 servers and databases to support BI’s software, BreedBase instances for each species under the Breeding Insight umbrella, and development servers to test new features. This management is done with the help of automated deployment pipelines that the IT team has created, with which a new software release can be deployed to all servers in 30 minutes.


Accomplishments
1. Combining evolution and protein structure identifies deleterious mutations in maize impacting yield. Maize has 37,000 genes that interact to produce the world’s highest yielding crop, and natural mutations and disruptions keep the crop from meeting its genetic potential. Combining new machine learning models for protein structure with evolutionary comparisons of maize with other plants, ARS researchers in Ithaca, New York (along with collaborators), have identified individual mutations affecting yield and used them to improve yield prediction in maize hybrids. Work with collaborators on cassava shows that the same approach also works for yield prediction in that crop, and we expect similar models to be applied to all crops where yield is the primary trait.
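
The core idea of comparing variation within a species to conservation across species can be illustrated with a toy sketch. This is not the project's pipeline, which uses machine learning models over evolutionary and protein structure features. The scoring rule, the thresholds, the function names, and the example data below are all assumptions chosen for illustration: a site is flagged when it is nearly invariant across species yet carries a rare alternative allele within the crop.

```python
from collections import Counter


def column_conservation(column):
    """Fraction of called bases in a cross-species alignment column
    that match the most common base (gaps and Ns are ignored)."""
    bases = [b for b in column.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return Counter(bases).most_common(1)[0][1] / len(bases)


def minor_allele_frequency(alleles):
    """Share of within-species allele calls that differ from the major allele."""
    counts = Counter(a for a in alleles.upper() if a in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1 - max(counts.values()) / total


def flag_candidates(sites, min_conservation=0.9, max_maf=0.05):
    """Return (position, conservation, MAF) for sites that are conserved across
    species but carry a rare variant within the species (hypothetical thresholds)."""
    flagged = []
    for site in sites:
        cons = column_conservation(site["cross_species"])
        maf = minor_allele_frequency(site["within_species"])
        if cons >= min_conservation and 0 < maf <= max_maf:
            flagged.append((site["pos"], round(cons, 3), round(maf, 3)))
    return flagged


# Made-up example data: one string character per species or per sampled line.
example_sites = [
    {"pos": 101, "cross_species": "AAAAAAAAAA", "within_species": "A" * 39 + "T"},
    {"pos": 102, "cross_species": "ACGTACGTAC", "within_species": "A" * 39 + "T"},
    {"pos": 103, "cross_species": "GGGGGGGGGG", "within_species": "G" * 20 + "C" * 20},
]
flagged = flag_candidates(example_sites)
```

Only position 101 is flagged in this made-up data: position 102 is not conserved across species, and the variant at position 103 is common rather than rare.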

2. Breeding Insight expands access to field and genomic tools for 19 specialty crop and animal species. One of the major challenges in breeding is integrating and processing the billions of genomic and field data points needed to make informed decisions. Breeding Insight (a USDA-ARS cooperative agreement with Cornell University) doubled the number of crops in the program, establishing data management systems for all species, new genomic tools for a quarter of the species, and field informatics tools for three-quarters of them. Many of the specialty crops (like blueberry and alfalfa) have genome duplications that make genomics tools challenging to apply, but this year the teams were able to apply these tools to these complex genomes. Putting these powerful analyses and genomic tools into the hands of ARS’s excellent specialty crop and animal breeders helps to improve breeding decisions and to meet public demand for more sustainable, nutritious, and flavorful foods.


Review Publications
Zhang, X., Zhu, Y., Kremling, K.A., Romay, C., Bukowski, R., Sun, Q., Gao, S., Buckler IV, E.S., Lu, F. 2021. Genome-wide analysis of deletions in maize population reveals abundant genetic diversity and functional impact. Theoretical and Applied Genetics. https://doi.org/10.1007/s00122-021-03965-1.
Long, E.M., Bradbury, P., Romay, C.M., Buckler IV, E.S., Robbins, K.R. 2021. Genome-wide imputation using the practical haplotype graph in the heterozygous crop cassava. G3, Genes/Genomes/Genetics. 12(1):jkab383. https://doi.org/10.1093/g3journal/jkab383.
Song, B., Marco-Sola, S., Moreto, M., Johnson, L., Buckler IV, E.S., Stitzer, M.C. 2021. AnchorWave: sensitive alignment of genomes with high sequence diversity, extensive structural polymorphism, and whole-genome duplication. Proceedings of the National Academy of Sciences (PNAS). 119(1). Article e2113075119. https://doi.org/10.1073/pnas.2113075119.
Pignon, C.P., Fernandes, S.B., Valluru, R., Bandillo, N., Lozano, R., Buckler IV, E.S., Gore, M.A., Long, S.P., Brown, P.J., Leakey, A. 2021. Phenotyping stomatal closure by thermal imaging for GWAS and TWAS of water use efficiency-related genes. Plant Physiology. 187(4):2544-2562. https://doi.org/10.1093/plphys/kiab395.
Wu, Y., Johnson, L., Song, B., Romay, M.C., Stitzer, M., Siepel, A., Buckler IV, E.S., Scheben, A. 2022. A multiple alignment workflow shows the effect of repeat masking and parameter tuning on alignment in plants. The Plant Genome. 15(2). Article e20204. https://doi.org/10.1002/tpg2.20204.
Gage, J.L., Mali, S., McLoughlin, F., Khaipho-Burch, M., Monier, B., Bailey-Serres, J., Vierstra, R.D., Buckler IV, E.S. 2022. Variation in upstream open reading frames contributes to allelic diversity in maize protein abundance. Proceedings of the National Academy of Sciences (PNAS). 119(14). Article e2112516119. https://doi.org/10.1073/pnas.2112516119.
Bradbury, P.J., Casstevens, T., Jensen, S.E., Johnson, L.C., Miller, Z.R., Monier, B., Romay, M.C., Song, B., Buckler IV, E.S. 2022. The Practical Haplotype Graph, a platform for storing and using pangenomes for imputation. Bioinformatics. https://doi.org/10.1093/bioinformatics/btac410.
Dafna, A., Halperin, I., Oren, E., Isaacson, T., Tzuri, G., Meir, A., Schaffer, A.A., Burger, J., Tadmor, Y., Buckler IV, E.S., Gur, A. 2021. Underground heterosis for yield improvement in melon. Journal of Experimental Botany. 72(18):6205-6218. https://doi.org/10.1093/jxb/erab219.
Ferguson, J.N., Fernandes, S.B., Monier, B., Miller, N.D., Allan, D., Dmitrieva, A., Schmuker, P., Lozano, R., Valluru, R., Buckler IV, E.S., Gore, M.A., Brown, P.J., Spalding, E.P., Leakey, A.D. 2021. Machine learning-enabled phenotyping for GWAS and TWAS of WUE traits in 869 field-grown sorghum accessions. Plant Physiology. 187(3):1481-1500. https://doi.org/10.1093/plphys/kiab346.
Baseggio, M., Murray, M., Wu, D., Ziegler, G., Kaczmar, N., Chamness, J., Hamilton, J.P., Buell, R.C., Vatamaniuk, O.K., Buckler IV, E.S., Smith, M.E., Baxter, I., Tracy, W.F., Gore, M.A. 2021. Genome-wide association study suggests an independent genetic basis of zinc and cadmium concentrations in fresh sweet corn kernels. G3, Genes/Genomes/Genetics. 11(8). https://doi.org/10.1093/g3journal/jkab186.
Willcox, M.C., Burgueño, J.A., Jeffers, D., Rodriguez-Chanona, E., Guadarrama-Espinoza, A., Kehel, Z., Chepetla, D., Shrestha, R., Swarts, K., Hearne, S., Buckler IV, E.S., Chen, N.C. 2022. Mining alleles for tar spot complex resistance from CIMMYT's maize germplasm bank. Frontiers in Sustainable Food Systems. 6:937200. https://doi.org/10.3389/fsufs.2022.937200.
Oren, E., Tzuri, G., Dafna, A., Reese, E.R., Song, B., Freilich, S., Elkind, Y., Isaacson, T., Schaffer, A.A., Tadmor, Y., Burger, J., Buckler IV, E.S., Gur, A. 2022. QTL mapping and genomic analyses of earliness and fruit ripening traits in a melon recombinant inbred lines population supported by de novo assembly of their parental genomes. Horticulture Research. https://doi.org/10.1093/hr/uhab081.
Lozano, R., Gazave, E., Dos Santos, J., Stetter, M.G., Valluru, R., Bandillo, N., Fernandes, S., Brown, P.J., Shakoor, N., Mockler, T., Cooper, E., Perkins, T., Buckler IV, E.S., Ross-Ibarra, J., Gore, M.A. 2021. Comparative evolutionary genetics of deleterious load in sorghum and maize. Nature Plants. 7:17-24. https://doi.org/10.1038/s41477-020-00834-5.
Barnes, A.C., Rodríguez-Zapata, F., Juárez-Núñez, K.A., Gates, D.J., Janzen, G.M., Kur, A., Wang, L., Jensen, S.J., Estévez-Palmas, J.M., Crow, T.M., Kavi, H.S., Pil, H.D., Stokes, R.L., Knizner, K.T., Aguilar-Rangel, M.R., Demesa-Arévalo, E., Skopelitis, T., Pérez-Limón, S., Stutts, W.L., Thompson, P., Chiu, Y., Jackson, D., Muddiman, D.C., Fiehn, O., Runcie, D., Buckler IV, E.S., Ross-Ibarra, J., Hufford, M.B., Sawers, R.J., Rellán-Álvarez, R. 2022. An adaptive teosinte mexicana introgression modulates phosphatidylcholine levels and is associated with maize flowering time. Proceedings of the National Academy of Sciences (PNAS). 119(27). Article e2100036119. https://doi.org/10.1073/pnas.2100036119.
Lozano, R., Booth, G.T., Omar, B., Li, B., Buckler IV, E.S., Lis, J.T., Pino Del Carpio, D., Jannink, J. 2021. RNA polymerase mapping in plants identifies intergenic regulatory elements enriched in causal variants. G3, Genes/Genomes/Genetics. jkab273. https://doi.org/10.1093/g3journal/jkab273.
Giri, A., Khaipho-Burch, M., Buckler IV, E.S., Ramstein, G.P. 2021. Haplotype associated RNA expression (HARE) improves prediction of complex traits in maize. PLoS Genetics. https://doi.org/10.1371/journal.pgen.1009568.