
Research Project: Applied Agricultural Genomics and Bioinformatics Research

Location: Genomics and Bioinformatics Research

2021 Annual Report

1. Advance and accelerate translational research for ARS and its collaborators that addresses the agricultural needs of primarily the Southeast region, through data generation, data analysis, and data management, with an emphasis on genomic approaches and on crop, animal, insect, and microbiome analyses; support germplasm analysis for breeding and for trait genetic and molecular analyses; and support gene expression analysis and gene discovery.
1.A. A cross section of GBRU operations in genomics and bioinformatics.
1.B. Specific ongoing collaborative projects.
1.C. Data management.
2. Accelerate ARS bioinformatics community development and capacity building, primarily for the Southeast region, through training workshops, webinars, and direct project participation; develop and evaluate new tools, workflows, and systems that enable ARS and its collaborators to more efficiently manage, analyze, and share diverse streams of biological data and knowledge, including high-throughput genotyping and phenotyping, thereby enhancing crop and animal genetic improvement, health, and nutrition.
2.A. Bioinformatics community development and capacity building.
2.B. Development of new tools and procedures.

The primary function of the Genomics and Bioinformatics Research Unit (GBRU) is to conduct research in bioinformatics and genomics on a wide array of species and topics. Genomic technologies are powerful tools for germplasm improvement using marker-assisted selection (MAS), biotechnology, or synthetic biology, and for analyzing associated biological processes (genetics, physiology, cell and molecular biology, biochemistry, and evolutionary biology). Many ARS scientists, e.g., crop and animal breeders, therefore have a direct need for genomic tools in their research. Others, e.g., soil scientists, can enhance their research dramatically by using genomic tools to analyze the microbiome, if the technologies and appropriate expertise are available. However, not all ARS locations have sufficient resources to support core genomic technologies. The mission of the GBRU is therefore to: (1) coordinate, facilitate, collaborate on, and conduct genomics and bioinformatics research emphasizing the Southeast region; (2) serve as a research and training resource for genomic technologies and bioinformatic analyses in support of ARS scientists and their collaborations; and (3) serve as a technical resource for ARS research programs that have not typically used these technologies, and aid in their development of genomic resources. Within the GBRU, this research project will conduct and collaborate on genome sequencing, sequence assembly and analysis, diversity analysis, marker development, haplotyping, physical and genetic map production, and transcription profiling research. Essential products include new and improved reference genomes for plants, animals, insects, fish, and microbes that enable genomics-assisted breeding; new physical and genetic maps; improved cultivars, germplasm, or breeding lines; and new information on key agricultural problems such as disease resistance and drought tolerance.

Progress Report
Service and research efforts continued during the past year in all areas of genomics and bioinformatics, with several members of the unit participating in the national ARS initiatives SCINet (high-performance computing), Ag100Pest (producing high-quality reference genomes for 100 insects), and AI (artificial intelligence). The unit has helped lead multiple aspects of the Ag100Pest project, generating high-quality genomes for important insect pests including the desert locust (which causes plagues in Africa and the Asian subcontinent), food storage pests, crop pests, and other pests of immediate interest. The unit also enhanced SCINet resources: it established a new forum fostering communication among SCINet users and developed the SCINet Apps System, which lets ARS scientists host their web applications more easily by having SCINet provide computing and administrative approvals. The unit has a leadership role in establishing the ARS Artificial Intelligence Center of Excellence and in designing its initial grant call, which funded four proposals in FY21.

The first genome reference sequence for evergreen blueberry was developed. With high-quality versions of both chromosome copies, it was possible to study gene activity across fruit developmental stages. This analysis indicated that genes involved in photosynthesis were turned off as fruit ripened, while genes involved in making flavonoids and anthocyanins were turned on, especially in ripe berries.

Wild peanut species were targeted for genome sequencing because they are valuable sources of resistance to diseases and pests that cost U.S. peanut producers millions of dollars each year in crop losses and plant protection products. The cultivated peanut (Arachis hypogaea), known as groundnut in other parts of the world, is notorious for its lack of genetic diversity, which makes it particularly susceptible to ever-evolving pathogens. The wild species sequenced (A. stenosperma, A. cardenasii, two A. duranensis, and an improved A. ipaensis) have genes that offer exceptional resistance to leaf spots, rust disease, root-knot nematode, viruses, and other diseases. The sequencing data will help peanut breeders develop new varieties that are better able to combat pests and diseases.

Loci associated with hydrangea flower types were identified using controlled-cross populations. These results were published, and the populations are now being used to study other traits of interest, such as reblooming.

The structure of a USDA-ARS cotton breeding program, the Pee Dee Cotton Germplasm Enhancement Program, was analyzed with high-throughput genotyping to understand the family structure of individuals important to the history of the program. This analysis, along with other genome sequencing, is being used to further understand the history of cotton breeding in the United States and to work toward genomic selection models that can be integrated into public cotton breeding programs.

The egg industry culls billions of day-old male chicks annually. Identifying male eggs before the embryos become sentient would improve animal welfare and reduce industry costs. The unit published a patent application on a method that determines the sex of chicken eggs by applying statistical machine learning methods to data on the odors from eggs. The unit has continued to refine the method, increasing the speed and accuracy of sex determination to the point where the technology can be licensed to egg equipment manufacturers.

The unit designed software called Guidemaker that rapidly designs the short pieces of RNA needed to mutate genes at many specific points. Guidemaker is the first software to make this new gene discovery method more accessible. Discovering gene function can help create new products to control or use bacteria. Traditionally, scientists have created a mutation (change) in one gene to make it non-functional and then observed the effect. Newer methods can mutate one random gene in each of millions of different bacterial cells at the same time in a single test tube. That mix of mutated bacteria can then be grown together. By sequencing the DNA of the mix at the beginning and end of growth, the genes that are required for growth can be found. A new gene-editing technology called CRISPR/Cas can be used to mutate the genes at many specific points, which is more efficient than mutating them randomly.

Truly pangenomic data, which are not limited by a single reference genome, are now on the horizon thanks to innovations in long-read sequencing technologies. Unfortunately, the bioinformatic tools needed to fully explore these pangenomes have not kept pace with sequencing capacity. Therefore, a computational framework for characterizing genetic mapping populations in a pangenomic context was developed. In work with USDA cantaloupe researchers at the U.S. Vegetable Laboratory, these tools have been used to explore various genotyping strategies enabled by this pangenomic approach.

A new USDA-ARS initiative led by the unit is Breeding Insight OnRamp (BIOnRamp), which is designed to help selected ARS specialty crop and animal breeding programs get ready to join Breeding Insight. It is a readiness program that helps organize breeders, create common traits and methods, and curate historical data so that programs are prepared to benefit from the services as soon as they enter Breeding Insight. BIOnRamp continues to support multiple commodities, including blueberry, sugarcane, and citrus.
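
The pooled mutant screen described in the progress report above (sequencing the mix of mutated bacteria at the beginning and end of growth to find genes required for growth) can be illustrated with a minimal Python sketch. This is not Guidemaker or the unit's actual analysis; the function names, the pseudocount, the threshold of -2, and the read counts are all hypothetical and chosen only to illustrate the idea. Mutants in a gene required for growth should become a smaller share of the sequencing reads by the end of growth, so the log2 fold change in that share is strongly negative.

```python
import math


def log2_fold_changes(start_counts, end_counts, pseudocount=1.0):
    """Return log2(end share / start share) of reads for each gene.

    Counts are converted to shares of the total so that the two samples
    can be compared even if they were sequenced to different depths.
    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    genes = sorted(set(start_counts) | set(end_counts))
    start_total = sum(start_counts.values()) + pseudocount * len(genes)
    end_total = sum(end_counts.values()) + pseudocount * len(genes)
    changes = {}
    for gene in genes:
        start_share = (start_counts.get(gene, 0) + pseudocount) / start_total
        end_share = (end_counts.get(gene, 0) + pseudocount) / end_total
        changes[gene] = math.log2(end_share / start_share)
    return changes


def genes_required_for_growth(start_counts, end_counts, threshold=-2.0):
    """List genes whose mutants were strongly depleted during growth.

    The threshold is a hypothetical cutoff, not a published value.
    """
    changes = log2_fold_changes(start_counts, end_counts)
    return sorted(gene for gene, lfc in changes.items() if lfc <= threshold)


# Hypothetical read counts per mutated gene (not real data).
start = {"geneA": 1000, "geneB": 1000, "geneC": 1000, "geneD": 1000}
end = {"geneA": 1300, "geneB": 20, "geneC": 1250, "geneD": 1400}

required = genes_required_for_growth(start, end)
print(required)
```

In this toy example only the mutants in geneB nearly disappear from the pool, so geneB is the only gene flagged as required for growth. A real analysis would also use replicate samples and a statistical test rather than a fixed cutoff.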

1. Genomic characterization of U.S. rice germplasm collections and deleterious load. Some crop varieties have superior performance across years and environments. In hybrids, harmful mutations in one parent are masked by an intact gene in the other parent, resulting in increased vigor. Unfortunately, these mutations are very difficult to identify precisely because, individually, they only have a small effect. For self-fertilizing crops, the century-spanning U.S. germplasm collection is an invaluable resource for understanding how selection for yield affects these different mutations. The knowledge resulting from analysis of such data can be used by ARS researchers in Stoneville, Mississippi, and by breeders to increase genetic gain from whole-genome marker information. Long-read sequencing information was used to characterize the entire mutational spectrum between two rice varieties. These mutations were tracked through the last century of rice breeding and showed that large structural mutations in exons are selected against at a greater rate than any other mutational class. These findings illuminate the nature of deleterious alleles and will guide attempts to predict variety vigor based solely on genomic information.

2. A high-quality spinach genome that closely represents the chromosome structure of the plant allowed the study of whole-genome duplications in the euasterid group of flowering plants. This group of plant species is not well studied because most of our important crop plants fall under the other group, asterids. The spinach genome showed ARS researchers in Stoneville, Mississippi, that the genome of this plant had undergone many duplications in the past (many millions of years ago), which were then followed by extensive gene rearrangements. This evolutionary history had previously been difficult to identify and analyze without a high-quality reference genome. Additionally, 75 spinach lines were whole-genome sequenced, identifying variants across the diversity of spinach. One important finding was that spinach germplasm is maintained as collections, and this needs to be taken into account when designing experiments and performing genetic analyses, especially when looking to identify the genes responsible for a trait.

Review Publications
Vaughn, J.N., Korani, W., Stein, J.C., Edwards, J., Peterson, D.G., Simpson, S.A., Youngblood, R.C., Grimwood, J., Ware, D., McClung, A.M., Scheffler, B.E. 2021. Gene disruption by structural mutations drives selection in US rice breeding over the last century. PLoS Genetics. 17(3): e1009389.
Winders, J.R., Pechan, T. 2021. Comparison of homogenization methods for extraction of maize cob metabolites. African Journal of Biotechnology. 20(3):108-114.
Pan, Z., Bajsa Hirschel, J.N., Vaughn, J.N., Rimando, A.M., Baerson, S.R., Duke, S.O. 2021. In vivo assembly of sorgoleone biosynthetic pathway and its impact on agroinfiltrated leaves of Nicotiana benthamiana. New Phytologist. 230:683-697.
Gao, G., Magadan, S., Waldbieser, G.C., Youngblood, R., Wheeler, P., Scheffler, B.E., Thorgaard, G., Palti, Y. 2021. A long reads-based de novo assembly of the genome of the Arlee homozygous line reveals structural genome variance in rainbow trout. Genes, Genomes, and Genomics.
Fernandez-Baca, C.P., Rivers, A.R., Kim, W., McClung, A.M., Roberts, D.P., Reddy, V., Barnaby, J.Y. 2021. Changes in rhizosphere soil microbial communities across plant developmental stages of high and low methane emitting rice genotypes. Soil Biology and Biochemistry.
Wu, X., Hulse-Kemp, A.M., Wadl, P.A., Smith, Z., Mockaitis, K., Staton, M.E., Rinehart, T.A., Alexander, L.W. 2021. Genomic resource development for hydrangea (Hydrangea macrophylla (Thunb.) Ser.) – A transcriptome assembly and a high-density genetic linkage map. Horticulturae.
Fernandez-Baca, C.P., Rivers, A.R., Maul, J.E., Kim, W., McClung, A.M., Roberts, D.P., Reddy, V., Barnaby, J.Y. 2021. Rice plant-soil microbiome interactions driven by root and shoot biomass. Diversity.
Billings, G., Jones, M., Rustgi, S., Hulse-Kemp, A.M., Campbell, B.T. 2021. Population structure and genetic diversity of the Pee Dee cotton breeding program. Genes, Genomes, Genetics.
Nowicki, M., Hadziabdic-Guerry, D., Trigiano, R.N., Boggess, S.L., Kanetis, L., Wadl, P.A., Ojiambo, P.S., Cubeta, M.A., Spring, O., Thines, M., Runge, F., Scheffler, B.E. 2021. ‘Jumping Jack’: Genomic microsatellites underscore the distinctiveness of closely related Pseudoperonospora cubensis and Pseudoperonospora humuli and provide new insights into their evolutionary past. Molecular Plant Pathology. 12:686759.
Park, S., Scheffler, J.A., Ray, J.D., Scheffler, B.E. 2021. Identification of simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) that are associated with nectariless trait of Gossypium hirsutum L. Euphytica. 217:78.
Hulse-Kemp, A.M., Bostan, H., Chen, S., Ashrafi, H., Iorizzo, M., Van Deynze, A. 2021. An anchored chromosome-scale genome assembly of spinach (Spinacia oleracea) improves annotation and reveals extensive gene rearrangements in euasterids. The Plant Genome. 14(2):e20137.