Location: Produce Safety and Microbiology Research
Project Number: 2030-42000-052-015-R
Project Type: Reimbursable Cooperative Agreement
Start Date: Jan 1, 2025
End Date: Dec 31, 2026
Objective:
Objective 1: Determine the evolutionary history of REPEXH02 strains in the broader context of E. coli O157:H7 clade 2 and identify unique gene content with the potential to affect strain persistence. The hypotheses for this objective are: i) REPEXH02 is a group of strains with limited diversification that recently emerged within clade 2, and ii) genetic differences (gene content and/or SNPs in the core genome) are accumulating in REPEXH02 at a different rate than in other clade 2 subgroups.
Objective 2: Identify non-host reservoirs and environmental harborage sites of REPEXH02 strains using environmental sample collection, phenotypic assays, and genome-wide association analysis. The hypotheses for this objective are: i) unique genetic features will be identified, providing focused areas for sample collection, ii) genotypic and phenotypic features will be associated with persistent phenotypes, and iii) sequencing of newly isolated strains combined with genome-wide association analysis will identify specific hosts for REPEXH02.
Objective 3: Develop a machine learning tool to support agricultural decision-making based on sample metrics. The hypothesis for this objective is that water, soil, and scat sample metrics can be used to predict the likelihood of REPEXH02 presence and to provide recommendations on sampling strategies.
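As an illustration of the kind of model Objective 3 describes, the sketch below fits a small logistic regression that maps sample metrics to a probability of detection. It is not the project's tool: the model choice, the three water metrics used as features, the training values, and all function names are assumptions made up for this example, and it uses only the Python standard library.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def standardize(rows):
    """Return column means, column standard deviations, and z-scored rows."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]
    return means, stds, scaled

def fit_logistic(rows, labels, lr=0.1, epochs=2000, l2=0.01):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        # L2 penalty keeps weights finite on perfectly separable data.
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(x, means, stds, w, b):
    """Probability of detection for one raw (unscaled) sample."""
    z = [(v - m) / s for v, m, s in zip(x, means, stds)]
    return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)

# Made-up water samples: [temperature_C, pH, turbidity_NTU]; label 1 = detected.
TRAIN_X = [
    [14.0, 7.1, 3.0], [15.5, 7.3, 4.5], [16.0, 7.0, 5.0], [13.5, 7.4, 2.5],
    [21.0, 7.9, 18.0], [22.5, 8.1, 25.0], [20.0, 7.8, 15.0], [23.0, 8.0, 30.0],
]
TRAIN_Y = [0, 0, 0, 0, 1, 1, 1, 1]

MEANS, STDS, SCALED = standardize(TRAIN_X)
W, B = fit_logistic(SCALED, TRAIN_Y)

# Score a new, hypothetical sample.
print(round(predict_proba([22.0, 8.0, 22.0], MEANS, STDS, W, B), 3))
```

A predicted probability like this could be used to rank candidate sites for follow-up sampling. Any real model would need to be trained and validated on the project's own sample data.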
Approach:
For comparative genomics of E. coli O157:H7 clade 2, all available E. coli O157:H7 whole-genome sequencing (WGS) data will be downloaded from the Sequence Read Archive at NCBI (~7,800 genomes). We also have access to WGS data from all E. coli O157:H7 isolates from Michigan (2007 to present), Central California (~5,000 isolates), and the historical STEC collection at MSU (1,818 isolates). WGS data will be screened using SNP typing to identify genomes belonging to clade 2. For existing isolates without WGS data, a SNP-typing qPCR assay will be used to identify isolates belonging to clade 2, which will then be sequenced at the MSU Genomics Core. Raw reads will be trimmed, and genome assembly will be performed using SPAdes v3.15.2. Prokka v1.14.5 will be used for annotation, and the Roary pangenome pipeline will be used to identify core gene sequences. A maximum-likelihood phylogeny will be generated with RAxML to identify clades and define evolutionary relationships. High-quality SNPs (hqSNPs) will be used to identify differences among strains and any sub-clades that are present.
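The SNP-typing screen described above can be pictured as a lookup of base calls at a small set of clade-defining positions. The sketch below is only an illustration of that idea: the marker positions, the expected alleles, the thresholds, and the function names are all hypothetical and are not the project's actual typing scheme.

```python
# Hypothetical diagnostic SNPs: reference position -> allele expected in clade 2.
CLADE2_MARKERS = {1021: "T", 48210: "G", 930455: "A", 2210077: "C"}

VALID_BASES = {"A", "C", "G", "T"}

def clade2_score(calls, markers=CLADE2_MARKERS):
    """Return (matching, typed) counts over the marker positions.

    `calls` maps reference position -> called base for one genome.
    Positions that are missing or ambiguous (e.g. "N") are not counted as typed.
    """
    typed = [pos for pos in markers if calls.get(pos, "N") in VALID_BASES]
    matching = sum(1 for pos in typed if calls[pos] == markers[pos])
    return matching, len(typed)

def is_clade2(calls, min_typed=3, min_fraction=1.0, markers=CLADE2_MARKERS):
    """Call clade 2 when enough markers are typed and enough of them match."""
    matching, typed = clade2_score(calls, markers)
    if typed == 0 or typed < min_typed:
        return False  # too little information to make a call
    return matching / typed >= min_fraction

# Example genomes (made up): one full match, one with a missing call, one mismatch.
genomes = {
    "isolate_A": {1021: "T", 48210: "G", 930455: "A", 2210077: "C"},
    "isolate_B": {1021: "T", 48210: "N", 930455: "A", 2210077: "C"},
    "isolate_C": {1021: "C", 48210: "G", 930455: "A", 2210077: "C"},
}
clade2_ids = [name for name, calls in genomes.items() if is_clade2(calls)]
print(clade2_ids)
```

Requiring a minimum number of typed markers keeps low-coverage genomes from being assigned to a clade on the strength of one or two calls.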
Environmental and wildlife samples will be collected from the areas in Salinas and Santa Maria where REPEXH02 was initially identified. E. coli O157:H7 will be isolated from each sample. Samples will be collected year-round, with a focus on late summer, fall, and late fall. Sample collection sites will focus on water sources near produce fields, fields that are not in use but are adjacent to active fields, and riparian zones near fields. Physical and chemical metrics will be measured for each sample. For water, these include temperature, pH, oxidation-reduction potential (ORP), salinity, dissolved oxygen, and turbidity; for soil, they include temperature, pH, heavy metal content, total organic carbon, nitrogen, phosphorus, and other elements. Scat samples will be assigned to the animal of origin based on PCR and sequencing of DNA isolated from the scat. Samples will be processed to detect E. coli O157:H7, and isolates will be screened for clade 2 membership. Those that belong to clade 2 will be sequenced and added to the comparative genomic assessment.
Genome-wide association analysis will be used to identify genomic elements specific to REPEXH02 strains that are associated with specific hosts or environments. Additional experiments to define phenotypic differences will be conducted based on the results of the analysis. For example, one notable difference within REPEXH02 is a mutation in an arsenic resistance gene that could increase arsenic resistance. Heavy metal levels will be measured in collected soil samples, and increased tolerance to arsenic, other heavy metals, and antibiotics will be assessed in laboratory experiments.
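At its simplest, a gene-level association analysis of this kind asks whether a gene is carried more often by isolates from one host or environment than by the rest. The sketch below shows that idea with a one-sided Fisher's exact test on a gene presence/absence table. The isolate names, gene names, and trait labels are invented, and the test itself is an assumption chosen for illustration; the source does not specify the statistical method, and this simplified version does not correct for population structure or multiple testing.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in a 2x2 table.

    a: trait-positive isolates with the gene, b: trait-positive without it,
    c: trait-negative isolates with the gene, d: trait-negative without it.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    # Sum hypergeometric probabilities for tables at least as extreme as observed.
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

def scan_genes(presence, trait):
    """Test every gene against a binary trait; return (gene, p) sorted by p.

    presence: gene name -> set of isolate IDs carrying the gene.
    trait: isolate ID -> True/False (e.g. isolated from a given host).
    """
    results = []
    for gene, carriers in presence.items():
        a = sum(1 for s in trait if trait[s] and s in carriers)
        b = sum(1 for s in trait if trait[s] and s not in carriers)
        c = sum(1 for s in trait if not trait[s] and s in carriers)
        d = sum(1 for s in trait if not trait[s] and s not in carriers)
        results.append((gene, fisher_exact_greater(a, b, c, d)))
    return sorted(results, key=lambda r: r[1])

# Made-up example: 8 isolates, 4 from the host of interest.
trait = {f"iso{i}": i <= 4 for i in range(1, 9)}
presence = {
    "gene_x": {"iso1", "iso2", "iso3", "iso4"},    # only in trait-positive isolates
    "gene_core": {f"iso{i}" for i in range(1, 9)},  # present everywhere
}
for gene, p in scan_genes(presence, trait):
    print(gene, round(p, 4))
```

Closely related isolates share genes by descent, so a real analysis would need to account for lineage effects before interpreting any hit as host- or environment-specific.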