Van Kessel, Jo Ann
Submitted to: Genome Biology and Evolution
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/11/2013
Publication Date: 10/24/2013
Citation: Timme, R., Pettengill, J., Allard, M., Strain, E., Barrangou, R., Wehnes, C., Van Kessel, J.S., Karns, J.S., Musser, S., Brown, E. 2013. Phylogenetic Diversity of the Enteric Pathogen Salmonella enterica subsp. enterica Inferred from Genome-Wide Reference-Free SNP Characters. Genome Biology and Evolution. 5:2109-2123.
Interpretive Summary: Salmonella enterica is a major cause of food-borne illness in the US, leading to more deaths than any other food-related pathogen. There are more than 2500 different serovars, or sub-types, of Salmonella enterica. This extreme diversity has historically made it difficult for investigators to follow the movement of pathogens during food- or water-related outbreaks. Some DNA-based technologies have improved traceability beyond simple serotype identification, but these methods cannot discriminate well enough to support the needs of outbreak investigators. Sequencing technologies have advanced greatly in recent years; bacterial genome sequencing can now be done relatively quickly and at affordable cost. Here we describe the comparison of 156 bacterial genome sequences that represent 78 Salmonella serovars. Using powerful computational programs, single nucleotide polymorphisms (SNPs) were identified in the Salmonella genomes, and these differences between the genomes were used to create a phylogenetic tree that describes the evolutionary relationships between the individual strains of Salmonella. The isolates were grouped into two large clusters with many branches within each cluster. This helps us to further understand the evolutionary path of strain evolution and the relationships between serotypes. The work is our first step in building a high-resolution reference database and tree-based framework for tracking pathogens through our national and global food supply. This research will be of interest to other scientists and to regulatory agencies.
Technical Abstract: Salmonella enterica is a major cause of food-borne illness in the US, leading to more deaths than any other food-related pathogen. This is an extremely diverse bacterial species consisting of six subspecies and over 2500 named serovars. Examining the evolutionary history within Salmonella with techniques like PFGE and MLST has yielded little resolution due to the coarse nature of these technologies. With the influx of next-generation DNA sequencing technology (NGS), we are now able to extract variation across the entire genome to find the rare microevolutionary changes useful for tracking evolutionary history. Toward this effort, we present the first large-scale Salmonella enterica subsp. enterica phylogeny inferred from 156 genomes across 78 serovars. The phylogenetic hypothesis presented here was based on a reference-free k-mer approach of gathering SNPs. A maximum likelihood analysis of the ~20K SNP matrix recovered strong bootstrap support for two large clades as well as many terminal groups. We also test various hypotheses about the accuracy of current taxonomic alignments and character evolution (i.e., O and H antigens and variation in the CRISPR region), and elucidate historical fluctuations in the rate of diversification. In addition to furthering our understanding of the evolutionary history of Salmonella, this phylogeny is our first step in building a high-resolution reference database and tree-based framework for tracking pathogens through our national and global food supply.
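The general idea behind a reference-free k-mer approach to gathering SNPs can be illustrated with a toy sketch. This is not the pipeline used in the study. It only assumes that a SNP locus can be recognized as an odd-length k-mer whose flanking bases are shared by every genome while the center base differs. The function names, the tiny k, and the example sequences are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import defaultdict


def center_masked_kmers(seq, k):
    """Map each k-mer's flanks (center base masked as '.') to the center bases seen."""
    half = k // 2
    loci = defaultdict(set)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        flank = kmer[:half] + "." + kmer[half + 1:]
        loci[flank].add(kmer[half])
    return loci


def snp_matrix(genomes, k=5):
    """Return {flank: {genome_name: allele}} for loci that vary between genomes.

    Assumption (illustrative only): a locus is kept when its flank occurs in
    every genome with exactly one center allele per genome, and at least two
    genomes disagree on that allele.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that the k-mer has a center base")
    per_genome = {name: center_masked_kmers(seq, k) for name, seq in genomes.items()}
    # Flanks present in all genomes; no reference genome is needed for this step.
    shared = set.intersection(*(set(loci) for loci in per_genome.values()))
    matrix = {}
    for flank in sorted(shared):
        # Skip flanks that are ambiguous (repeated with different centers) in any genome.
        if any(len(per_genome[name][flank]) != 1 for name in per_genome):
            continue
        alleles = {name: next(iter(per_genome[name][flank])) for name in per_genome}
        if len(set(alleles.values())) > 1:
            matrix[flank] = alleles
    return matrix


# Hypothetical toy sequences: B differs from A and C at a single position.
example = {
    "A": "ACGTACCGGT",
    "B": "ACGTTCCGGT",
    "C": "ACGTACCGGT",
}
result = snp_matrix(example, k=5)
print(result)
```

In a real analysis, the columns of such a matrix would serve as characters for a maximum likelihood tree search; the sketch stops at building the matrix.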