Location: Meat Safety and Quality
Project Number: 3040-42000-020-003-T
Project Type: Trust Fund Cooperative Agreement
Start Date: Feb 1, 2020
End Date: Mar 31, 2021
The objective of this study is to use a combination of whole genome sequencing (WGS) and complete whole genome sequencing (CWGS) to characterize short- (31 days) and long-term (22 years) genetic variation of Shiga toxin-producing Escherichia coli O157:H7 (STEC O157) in natural environments to improve interpretations of strain relatedness in outbreak investigations.
Two sets of previously collected STEC O157 isolates (955 in total) will be used: 175 isolates for the long-term objective and 780 for the short-term objective.
Long-term genetic evolution. Long-term variation in a natural ecological environment will be evaluated using strains isolated from the USMARC feedlot over a 22-year period. Because the feedlot is closed to cattle from outside the Center, the large majority of STEC O157 strains belong to one of two major PFGE types, which uniquely positions us to conduct this study on a stable clonal population of strains. Up to ten STEC O157 isolates collected per year will be sequenced using short-read sequencing (WGS). A phylogenetic tree constructed from these short-read data will be used to select 24 genetically diverse isolates for long-read sequencing to produce closed genomes.
Short-term genetic evolution. Short-term evolutionary changes will be evaluated using isolates previously obtained from 16 steers sampled over 31 days. Four strains of STEC O157 carrying two different tir gene variants (tir 255 T>A T allele and tir 255 T>A A allele) were applied to the recto-anal junction (RAJ) of the cattle (n = 4 per strain). RAJ samples were obtained on ten sampling days between day 1 and day 31, enriched, and plated on agar selective for STEC O157. Isolates from samples positive for STEC O157 were purified and frozen until sequencing. For this study, up to ten isolates per animal from each positive sampling day will be sequenced using short reads (780 isolates in total). After short-read analysis, six isolates will be chosen from the branches of the short-read tree for each of the four applied strains (24 in total) for long-read sequencing (CWGS) to assess the rates of random change within and across strains.
Bioinformatic methods. The same bioinformatic pipeline will be used for both the short- and long-term objectives.
Short reads will be generated on an Illumina instrument (WGS) and assembled with SPAdes; phylogenetic trees will be built using MEGA10. Once the tree is constructed, 24 genetically diverse isolates identified from it will be selected for long-read sequencing on the PacBio Sequel (CWGS). PacBio data will be assembled with HGAP4 and error-corrected with Pilon. Illumina reads for the isolates sequenced on the PacBio Sequel will be mapped to their corresponding CWGS genomes to identify genomic differences and mobile elements (i.e., plasmids, phages, and insertion sequences) unique to each strain. Parsnp will be used to assess relatedness between strains using SNPs in the conserved regions. The Mauve aligner will be used to identify variable regions in the genome. Recombination analysis will be performed using BratNextGen, and timed phylogenies will be reconstructed using BEAST-MCMC.
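To illustrate what selecting genetically diverse isolates can involve, the following minimal Python sketch counts pairwise SNP differences between aligned sequences and then greedily picks the isolates that are farthest apart. It is not the project's method: the project selects isolates from the branches of a phylogenetic tree, whereas this sketch swaps in a simple distance-based max-min heuristic. The function names and the toy sequences in the usage note are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count aligned positions where two sequences differ, skipping gaps and Ns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"-", "N"}
    return sum(
        1
        for x, y in zip(a.upper(), b.upper())
        if x != y and x not in skip and y not in skip
    )


def distance_matrix(alignment: dict[str, str]) -> dict[tuple[str, str], int]:
    """Pairwise SNP distances, stored symmetrically under (name, name) keys."""
    dist: dict[tuple[str, str], int] = {}
    for i, j in combinations(sorted(alignment), 2):
        d = snp_distance(alignment[i], alignment[j])
        dist[(i, j)] = dist[(j, i)] = d
    return dist


def select_diverse(alignment: dict[str, str], k: int) -> list[str]:
    """Greedy max-min selection (an assumed heuristic, not tree-based selection).

    Seeds with the most distant pair, then repeatedly adds the isolate whose
    smallest distance to the already chosen isolates is largest.
    """
    names = sorted(alignment)
    if k <= 0:
        return []
    if k >= len(names):
        return names
    dist = distance_matrix(alignment)
    first, second = max(combinations(names, 2), key=lambda pair: dist[pair])
    chosen = [first, second][:k]
    while len(chosen) < k:
        remaining = [n for n in names if n not in chosen]
        best = max(remaining, key=lambda n: min(dist[(n, c)] for c in chosen))
        chosen.append(best)
    return chosen
```

As a usage example, calling `select_diverse({"iso_A": "ACGTACGT", "iso_B": "ACGTACGA", "iso_C": "TCGTTCGA", "iso_D": "ACGTACGT"}, 3)` on these made-up sequences would pick `iso_A` and `iso_C` first (they differ at three positions) and then `iso_B`, leaving out `iso_D`, which is identical to `iso_A`. A tree-based selection could give different results, since it accounts for shared ancestry rather than raw distance alone.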