Submitted to: American Society for Microbiology General Meeting
Publication Type: Abstract only
Publication Acceptance Date: 2/16/2014
Publication Date: 5/17/2014
Citation: Desai, P.T., Porwollik, S., Long, F., Cheng, P., Weinstock, G., Fields, P., Weimer, B., Guiney, D., Gal-Mor, O., Rabsch, W., Frye, J.G., Guard, J.Y., McClelland, M. 2014. Pangenome and taxonomic analysis of Salmonella enterica subspecies enterica. American Society for Microbiology General Meeting. May 17-20, 2014. Boston, Massachusetts.
Technical Abstract: Salmonella enterica subspecies enterica (S. enterica ssp. I) contains almost all the major pathogens in this genus. We sequenced 354 new S. enterica ssp. I genomes using paired-end 100-base reads to ~80-fold coverage. These genomes were chosen to maximize genetic diversity, representing at least 100 different serovars and multiple distinct PFGE patterns within most of these serovars. Among our chosen isolates were 119 strains with known antibiotic resistances, encompassing at least 80 different resistance patterns. All 354 new sequences were analyzed together with 350 publicly available Salmonella genomes from all five other S. enterica subspecies and S. bongori. Together, this collection encompassed 133 serovars. Using a maximum likelihood taxonomic tree of SNPs in regions shared by all genomes, and a threshold of 0.008 substitutions per site, we identified at least ten deep-rooting taxonomic groups within S. enterica ssp. I. Large-scale genome chimerism was observed within and between almost all heavily sampled serovars. Admixture analysis using BAPS (Bayesian Analysis of Population Structure) suggested that large-scale genome chimerism is one of the principal mechanisms that give rise to deep clades within taxonomic groups. Estimates for the time of divergence from the most recent common ancestor (MRCA) for strains within the same serovars varied up to tens of thousands of years when a divergence rate of 3.5 × 10⁻⁹ substitutions per site per year was assumed, but were between 35 and 2500 years when using Bayesian tip-dating methods. Genomes were annotated using RAST. After removing open reading frame singletons and sequencing artifacts, the pangenome of genes that occurred in at least two S. enterica strains contained approximately 19,000 gene families based on OrthoMCL analysis, of which about 6000 families were found only in ssp. I. Approximately 1900 of these ssp. I-specific “cloud” gene families had homologs in individual strains in other Enterobacteriaceae genera that were more closely related than were pairs of control core genes found in all Salmonella genomes, indicating lateral transfer. The public resource of taxonomically diverse Salmonella sequences and associated metadata will be useful for epidemiology, molecular serotyping and evolutionary studies.
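
The pangenome filtering described in the abstract (drop gene families seen in fewer than two strains, then identify families restricted to one subspecies) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the input layout (a mapping from gene family to strain identifiers, as might be derived from ortholog clustering output), and the toy strains and families are all hypothetical and chosen only for illustration.

```python
def summarize_pangenome(family_members, strain_subspecies, min_strains=2, target="I"):
    """Summarize gene families by how they are distributed across strains.

    family_members: dict mapping family id -> iterable of strain ids
                    (hypothetical layout; one entry per gene, so a strain
                    may appear more than once if it carries paralogs).
    strain_subspecies: dict mapping strain id -> subspecies label.
    """
    # Keep only families present in at least `min_strains` distinct strains,
    # which drops singletons (families seen in a single strain).
    kept = {}
    for family, strains in family_members.items():
        distinct = set(strains)
        if len(distinct) >= min_strains:
            kept[family] = distinct

    all_strains = set(strain_subspecies)
    # Core families occur in every strain of the collection.
    core = {f for f, s in kept.items() if s == all_strains}
    # Target-specific families occur only in strains of the target subspecies.
    target_specific = {
        f for f, s in kept.items()
        if all(strain_subspecies[strain] == target for strain in s)
    }
    return {"pangenome": set(kept), "core": core, "target_specific": target_specific}


# Toy data (hypothetical): two ssp. I strains and one ssp. II strain.
strain_subspecies = {"s1": "I", "s2": "I", "s3": "II"}
families = {
    "famA": ["s1", "s2", "s3"],  # present in every strain
    "famB": ["s1", "s2"],        # present only in ssp. I strains
    "famC": ["s1"],              # singleton, removed
    "famD": ["s2", "s3"],        # shared across subspecies
    "famE": ["s1", "s1"],        # two copies in one strain, still a singleton
}
summary = summarize_pangenome(families, strain_subspecies)
```

In this toy collection, `famC` and `famE` are discarded because each occurs in only one strain, `famA` is the only core family, and `famB` is the only family restricted to ssp. I.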