Location: Genomics and Bioinformatics Research
2024 Annual Report
Objectives
1. Advance and accelerate translational research for ARS and its collaborators that addresses the agricultural needs of the Southeast region and ARS, through data generation, data integration and analysis, with an emphasis on ‘omics and machine learning approaches in crops, animals, insects, and microbiomes; support germplasm analysis for breeding and for trait genetic and molecular analyses; and support gene expression analysis and gene discovery.
1.A. Supplying bridge services in genomics and bioinformatics.
1.B. Translating standard genomic tools to outlier and non-model genetic systems.
2. Accelerate the integration of bioinformatics and advanced technologies in research, for the Southeast region and ARS, through direct project collaboration; develop and evaluate new tools, workflows, and systems that enable ARS and its collaborators to more efficiently manage, integrate, analyze, and share diverse streams of biological data and knowledge, including high throughput genotyping and phenotyping, thereby enhancing crop and animal genetic improvement, health, and nutrition.
2.A. Developing bioinformatic capacity that supports universal resource utility.
2.B. Pangenomic and phenomic data integration.
Approach
Genomic technologies are powerful tools for germplasm improvement using marker assisted selection (MAS), biotechnology, or synthetic biology, and for analyzing associated biological processes (genetics, physiology, cell and molecular biology, biochemistry, and evolutionary biology). Thus, many ARS scientists, e.g., crop and animal breeders, have a direct need for genomic tools in their research. Others, e.g., soil scientists, can enhance their research dramatically using genomic tools to analyze the microbiome, if the technologies and appropriate expertise are available.
The Genomics and Bioinformatics Research Unit’s (GBRU) primary function is conducting research in the areas of bioinformatics and genomics on a wide array of species and topics. GBRU also provides collaborative assistance with various ARS projects that are constrained by routine genomics or bioinformatics hurdles.
Not all ARS locations have sufficient resources to support core genomic technologies. Thus, some specific roles of GBRU are to: (1) coordinate, facilitate, collaborate and conduct genomics and bioinformatics research emphasizing the Southeast region; (2) serve as a research and training resource for genomic technologies and bioinformatic analyses in support of ARS scientists and their collaborations; and (3) serve as a technical resource for ARS research programs that have not typically utilized these technologies, and aid in their development of genomic resources.
Within the GBRU, this research project will conduct and collaborate on genome sequencing, sequence assembly and analysis, diversity analysis, marker development, haplotyping, physical and genetic map production, and transcription profiling research. It will also provide sequencing and analysis for polyploids, clonally propagated cultivars, historically resource-limited systems, and other edge cases for which standard bioinformatic protocols are problematic. In part this will be possible through exploring and advocating universal and reproducible bioinformatic approaches for researchers engaged in genome-wide or high-throughput experiments, and by connecting phenotypic information with genotypic and genomic data in a coherent way that supports germplasm utilization and gene discovery.
Thus, essential product development includes new and improved reference genomes for plants, animals, insects, fish, and microbes that enable genomics assisted breeding; new physical and genetic maps; improved cultivars, germplasm, or breeding lines; and new information on key agricultural problems such as disease resistance and drought tolerance.
Progress Report
Breeding software tools: Two new tools were developed in FY23: 1) a machine learning and computer vision-based tool that can automatically detect two cell shapes in young cotton fibers using microscope images, and 2) an R/Shiny application designed to streamline the downstream analysis of cotton genetic data and to automatically prepare raw data for upload to the CottonGen database. These are valuable tools to be integrated into modern breeding practices.
Breeding Insight OnRamp (BI OnRamp): This program has continued to support four commodities: citrus, sugarcane, soybean, and cotton. Sugarcane work has become long-term with the addition of specific funds into the in-house project starting in FY2023, which has led to the development of a cross-USDA initiative, the Sugarcane Integrated Breeding System. For all commodities supported by BI OnRamp, support continues to expand to prepare for utilizing database applications for field analyses, archiving historical data, developing better trait description indexes, and initiating development of advanced genotyping capabilities.
Spearheading UTA/USDA summer internship: Twenty-five projects using advanced statistics, visual recognition, and/or advanced computer programming were developed across the Southeast Area with students and university mentors at the University of Texas at Arlington (a Hispanic-Serving Institution). Under this program, undergraduate and graduate students are exposed to agricultural problems and then use their computational abilities to help ARS scientists address research problems where such computational skillsets are not available in-house. The program represents a unique way to enhance ARS research while also enhancing the education of students.
Vespa genomes: Vespa mandarinia, or the northern giant hornet, is a large species of wasp that preys on honeybees across Asia. Hornets were found in the Pacific Northwest in 2019, causing concern that the hornets may be an invasive species originating from eastern Asia. We sequenced and assembled the genomes of individuals from a nest found in British Columbia, Canada and from one found in Washington state. The genomes were larger than expected due to the amount of simple repetitive DNA. We used bioinformatic techniques to compare the genomes with those of several closely related species and submitted this work for publication; it is currently in review at Genome Biology. This work was also presented at the Entomological Society of America’s 2024 conference.
Camelina genome: Camelina sativa is an oilseed crop that is part of the mustard family. Both non-acclimated and cold-acclimated varieties of Camelina sativa are grown. We assisted the Weed and Insect Biology Research Unit with assembly of genomes of representatives from both varieties to find genetic differences and test the function of genes present in these lineages.
Honeybee pangenome: Apis mellifera, the European honeybee, is an important part of US agriculture for its pollination services as well as the production of honey, but its numbers have been declining due to several factors. Understanding the genetic health of honeybees is an important component of protecting them from further decline. We assisted the Honeybee Breeding, Genetics, and Physiology Research Unit with construction of a pangenome, a comparison across the genomes that represent all the major honeybee breeding lines used in United States agriculture. We sequenced and assembled the genomes from each breeding line, then used a common honeybee reference genome to reconstruct the 16 honeybee chromosomes. This year we reconstructed the mitochondrial genome sequence from each honeybee line to investigate any genetic differences.
Developing polyploid genetic tools and novel germplasm resources in peanut (Arachis hypogaea): Polyploids possess more than two copies of each chromosome. Allopolyploidy results in organisms containing genomes from merging two or more different progenitor species, referred to as sub-genomes. The presence of two or more sub-genomes complicates genetic analysis. We are attempting to address these issues on three main fronts: 1) Eighteen divergent peanut cultivars have been inter-crossed over multiple generations to form a MAGIC (Multiparent Advanced Generation Intercross) peanut population with several desirable traits from founder lines. This project aims to understand the genetics of these traits. Long-read sequencing was used to assemble the genomes of all founder parents. Genomes were combined into a pangenomic graph that helps us account for much of the sequence variation in cultivated peanut. This method is further allowing us to characterize nearly all of this MAGIC variation with high fidelity. Currently, all progeny have one year of trait data and have been sequenced at low coverage. 2) Four experimentally derived tetraploids were obtained, and their genomes were sequenced using PacBio long-read sequencing. The A and B sub-genomes of the four synthetic tetraploid samples were aligned to their respective progenitor A and B diploid genomes. With the sub-genomic alignments, we are currently developing a pipeline to identify gene conversion regions between the sub-genomes. Such identification is critical for designing modern breeding assays. 3) Collaborators have identified a major gene for rust resistance from a wild progenitor, Arachis magna. We generated a reference-grade genome for this species and performed a sequence alignment between A. ipaensis, the B sub-genome of A. hypogaea cv. Tifrunner, and A. magna to detect structural variation present between these genomes in the identified region. We are also currently performing an analysis to identify candidate genes present within the QTL region by annotating the A. magna genome.
Genomic-sequence selection in soybean: Genomes of two Glycine species, G. max and G. soja, were assembled and annotated using long-read sequencing. A pangenome alignment of multiple G. max and G. soja reference genomes was constructed with the ultimate goal of identifying structural variants that may be associated with yield genes selected during soybean domestication. The next steps are to test for the association of these mutations with yield traits in a population derived from a cross of the G. max and G. soja parents. The data are in hand and analysis is proceeding.
Agrivoltaic genetics: Agrivoltaics is an emerging field that looks at how to best use land for dual production of energy and food. In summer 2023 we started a field research project that tested tomato and lettuce growth under shade provided by semi-transparent solar panels. The light captured by the solar panels is used to power cooling fans in the “high tunnel” controlled environments, which could provide a cost-saving sustainability boost to growers. We harvested almost 500 tomato plants (12 varieties) and 350 lettuce plants (6 varieties). We discovered differences in the amount of yield from each crop variety, predicted the effect of the panels in future growing seasons, and uncovered genetic differences that might be involved in yield maintenance under shade. In spring 2024 we began our second tomato growing season, repeating the 2023 experiment with some modifications, and we will begin harvesting soon.
Accomplishments
1. Hydrangea flowering genes identified with new high-quality genomes. The first complete high-quality genome sequences for hydrangea were produced by ARS researchers in Stoneville, Mississippi, utilizing two important cultivars. Researchers across the Southeast Area used the genomes to identify the genes responsible for flowering traits that are commercially important to growers and the nursery industry, including flower-head shape and double flowering. The genomes also provided the basis for a more accurate understanding of the relationship of the flowering plants in the Asterid family, which will help researchers as they use shared relationships to understand important flowering traits in the future.
2. Biologically correct genomes through new informatics pipeline. ARS researchers in Stoneville, Mississippi, developed a new computational pipeline that can generate highly accurate genomes for hybrid or heterozygous plants. This was accomplished by utilizing a process termed trio-binning, which uses a combination of two parents and a resulting offspring, or hybrid. This was the first time this process had been completed in a cross between individuals that might normally comprise a plant breeding program; in this case it was done using pepper. This new informatics pipeline serves as a valuable guide for others wishing to apply trio-binning in plants to get high-quality results at a lower cost.
3. Identified the sites on peanut proteins that are masked after oral immunotherapy treatment. ARS researchers in Stoneville, Mississippi, identified the parts of peanut proteins that are masked by desensitization antibodies when a patient undergoes oral immunotherapy for a peanut allergy. Oral immunotherapy slowly increases exposure to an allergen to prevent future allergic reactions. Identifying these regions is the first step in creating an accurate lab test for peanut allergy and provides insight into how desensitization works that may be applied to other food allergens.
4. Tools for screening microbial consortia. Microbiologists often need to screen microbes or chemicals for their ability to suppress a pathogen. ARS researchers in Stoneville, Mississippi, have developed and published software, protocols, and a peer-reviewed journal article detailing a high-throughput screening method based on fluorescent “most probable number” estimation methods. This tool will enhance our ability to find disease-suppressing microbes and chemicals.
5. Agile Genetic methods for gene discovery in the pangenomic era. Gene discovery reveals new biology, expands the utility of marker-assisted selection, and enables CRISPR-mediated mutagenesis. Still, such discoveries can require more than a decade. To speed the process, ARS researchers in Stoneville, Mississippi, developed a general workflow for experimental population analysis involving parental genome sequencing and iterative gene refinement that only requires a single field experiment. We support the feasibility of these methods with simulation and numerous proof-of-principle experiments.
6. Adding a phosphorus-releasing enzyme to the feed of hybrid tilapia does not alter their gut microbiome. Traditionally, fish feed contains inorganic phosphorus as a nutrient, but this increases costs and negatively affects water quality. Many of the plant ingredients in fish feed naturally contain phosphorus as phytic acid, a form that is not biologically available. By adding an enzyme (phytase) to the feed, phosphorus can be provided in a cheaper and more environmentally sustainable way. Prior research had verified that phytase provided the needed phosphorus to the hybrid tilapia. This study by ARS researchers in Stoneville, Mississippi, verified that providing phosphorus with the phytase enzyme did not disrupt the gut microbiome of the fish. This work paves the way for developing lower cost, environmentally friendly fish feeds.
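As an illustration of the trio-binning idea named in accomplishment 2, the following is a minimal sketch only, not the pipeline developed by the unit. It assumes that offspring reads can be sorted by counting short subsequences (k-mers) found in only one parent. All function names, the toy sequences, and the k-mer length are hypothetical, and real tools additionally handle reverse complements, sequencing errors, and normalization of k-mer counts.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(maternal_seqs, paternal_seqs, k):
    """Return (maternal-only, paternal-only) k-mer sets.

    Hypothetical helper: k-mers shared by both parents carry no
    information about haplotype origin, so they are discarded.
    """
    mat = set().union(*(kmers(s, k) for s in maternal_seqs))
    pat = set().union(*(kmers(s, k) for s in paternal_seqs))
    return mat - pat, pat - mat


def bin_read(read, mat_only, pat_only, k):
    """Assign one offspring read to a parental bin by simple majority.

    Simplifying assumption: raw counts are compared directly, with
    ties (including reads with no informative k-mers) left unassigned.
    """
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"


# Toy data: the two parents differ at a short stretch near the 3' end.
K = 4
mat_only, pat_only = parent_specific_kmers(
    ["ACGTACGTTTGA"], ["ACGTACGTCCGA"], K
)
bins = {
    read: bin_read(read, mat_only, pat_only, K)
    for read in ["TACGTTTG", "CGTCCGA", "ACGTACG"]
}
```

In this toy case the first read carries only maternal-specific k-mers, the second only paternal-specific k-mers, and the third falls entirely in sequence shared by both parents, so it stays unassigned. In an actual assembly workflow each bin of reads would then be assembled separately to produce one haplotype per parent.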
Review Publications
Mustafa, R., Hamza, M., Ur Rehman, A., Kamal, H., Nouman Tahir, M., Mansoor, S., Scheffler, B.E., Briddon, R.W., Amin, I. 2022. Asymptomatic Populus alba: a tree serving as a reservoir of begomoviruses and associated satellites. Virus Research. https://doi.org/10.1007/s13313-022-00886-5.
Gao, G., Waldbieser, G.C., Ramey, Y.C., Zaho, D., Pietrak, M.R., Stannard, J.A., Buchman, J.T., Scheffler, B.E., Peterson, B.C., Palti, Y., Rexroad III, C.E., Long, R., Burr, G.S., Milligan, M.T. 2023. The generation of the first chromosome-level de-novo genome assembly and the development and validation of a 50K SNP array for the St. John River aquaculture strain of North American Atlantic salmon. G3, Genes/Genomes/Genetics. jkad138. https://doi.org/10.1093/g3journal/jkad138.
Satterlee, T.R., Williams, F.N., Nadal, M., Glenn, A.E., Lofton, L., Duke, M.V., Scheffler, B.E., Gold, S.E. 2022. Transcriptomic response of Fusarium verticillioides to variably inhibitory environmental isolates of Streptomyces. Frontiers in Fungal Biology. https://doi.org/10.3389/ffunb.2022.894590.
Zirkel, J., Hulse-Kemp, A.M., Storm, A.R. 2023. Gossypium hirsutum gene of unknown function, Gohir.A02G039501.1, encodes a potential DNA-binding ALOG protein involved in gene regulation. microPublication Biology. https://doi.org/10.17912/micropub.biology.000670.
Brocker, D., Cutlan, M., Hulse-Kemp, A.M., Storm, A.R., Stoeckman, A.K. 2024. Gossypium hirsutum gene Gohir.A02G161000.1 encodes a potential Root UVB Sensitive Protein with a putative protein-protein interaction interface. microPublication Biology. https://doi.org/10.17912/micropub.biology.000869.
Hill, T., Cassibba, V., Joukhadar, I., Tonnessen, B., Havlik, C., Ortega, F., Sripolcharoen, S., Visser, B.J., Stoffel, K., Thammapichai, P., Garcia-Llanos, A., Chen, S., Hulse-Kemp, A.M., Walker, S., Van Deynze, A. 2023. Genetics of destemming in pepper. Frontiers in Genetics. https://doi.org/10.3389/fgene.2023.1114832.
Osagiede, A., Osborne, A.N., Hulse-Kemp, A.M., Storm, A.R., Stoeckman, A.K. 2023. Gossypium hirsutum gene of unknown function Gohir.A03G0737001 encodes a potential chaperone-like protein of protochlorophyllide oxidoreductase (CPP1). microPublication Biology. https://doi.org/10.17912/micropub.biology.000867.
Wu, X., Simpson, S.A., Youngblood, R., Scheffler, B.E., Alexander, L.W., Hulse-Kemp, A.M. 2023. Whole genome sequencing reveals genes implicated with important flower traits in bigleaf hydrangea (Hydrangea macrophylla). Horticulture Research. https://doi.org/10.1093/hr/uhad217.
Chamberlin, K.D., Bennett, R., Baldessari, J., De La Barrera, G., Cordes, G., Grandon, N.G., Mamani, E., Rodriguez, A.V., Morichetti, S., Holbrook, C.C., Ozias-Akins, P., Chu, Y., Tallury, S.P., Clevenger, J.P., Korani, W., Scheffler, B.E., Youngblood, R., Simpson, S.A. 2024. Discovery of a resistance gene cluster associated with smut resistance in peanut. Peanut Science. 51(1):59-65. https://doi.org/10.3146/0095-3679-51-PS23-6.
Delorean, E.E., Youngblood, R.C., Simpson, S.A., Schoonmaker, A.N., Scheffler, B.E., Rutter, W.B., Hulse-Kemp, A.M. 2023. Representing true plant genomes: haplotype-resolved hybrid pepper genome through trio binning. Frontiers in Plant Science. https://doi.org/10.3389/fpls.2023.1184112.
Islam, M.S., Corak, K., Mccord, P.H., Hulse-Kemp, A.M., Lipka, A.E. 2023. A first look at the ability to use genomic prediction for improving ratooning ability of sugarcane. Frontiers in Plant Science. 14. https://doi.org/10.3389/fpls.2023.1205999.
Graham, B.P., Park, J., Billings, G.T., Hulse-Kemp, A.M., Haigler, C.H., Lobaton, E. 2022. Efficient imaging and computer vision detection of two cells in young cotton fibers. Applications in Plant Sciences. https://doi.org/10.1002/aps3.11503.
Ray, C.L., Abernathy, J.W., Green, B.W., Rivers, A.R., Schrader, K., Rawles, S.D., McEntire, M.E., Lange, M.D., Webster, C.D. 2024. Effect of dietary phytase on water and fecal prokaryotic and eukaryotic microbiomes in a hybrid tilapia (Oreochromis aureus x O. niloticus) mixotrophic biofloc production system. Aquaculture. 581. Article 740433. https://doi.org/10.1016/j.aquaculture.2023.740433.
Franco Melendez, K.P., Schuster, L., Donahey, M.C., Kairalla, E., Jansen, M.A., Reisch, C., Rivers, A.R. 2024. MicroMPN: methods and software for high-throughput screening of microbe suppression in mixed populations. Microbiology Spectrum. https://doi.org/10.1128/spectrum.03578-23.
Vaughn, J.N., Korani, W., Clevenger, J., Ozias-Akins, P. 2024. Agile genetics: single gene resolution without the fuss. Bioessays. https://doi.org/10.1002/bies.202300206.
Pan, Y., Todd, J.R., Lomax, L.E., White Jr, P.M., Simpson, S.A., Scheffler, B.E. 2023. Molecular Dissection of the 5S Ribosomal RNA-Intergenic Transcribed Spacers in Saccharum spp. and Tripidium spp. Agronomy Journal. https://doi.org/10.3390/agronomy13112728.
Page, C.A., Simpson, S.A., Perez Diaz, I.M., Rivers, A.R. 2024. Annotated whole-genome sequences of fermentative and spoilage associated bacilli and proteobacteria autochthonous to commercial cucumber fermentation. Microbiology Resource Announcements. 13. Article e00926-23. https://doi.org/10.1128/mra.00926-23.
Rambo, I., Kronfel, C.M., Rivers, A.R., Swientoniewski, L., Mcbride, J.K., Cheng, H., Simon, R.J., Ryan, R., Tilles, S., Nesbit, J.B., Kulis, M.D., Hurlburt, B.K., Maleki, S.J. 2023. IgE and IgG4 epitopes of the peanut allergens shift following oral immunotherapy. Frontiers in Allergy. 4: Article 1279290. https://doi.org/10.3389/falgy.2023.1279290.
Smith, E.R., Caulley, L.R., Hulse-Kemp, A.M., Storm, A.R., Stoeckman, A.K. 2023. Gossypium hirsutum gene of unknown function Gohir.A03G007700.1 encodes a potential VAN3-binding protein with a phosphoinositide-binding site. microPublication Biology. https://doi.org/10.22002/rf8g5-hm387.