
Title: A simulated metagenomic approach for bacterial serotyping using shotgun genome sequences coupled with O-Antigen gene cluster analysis

Yan, Xianghe
Chen, Chinyi
Hu, Jin - Franklin and Marshall College
Fratamico, Pina

Submitted to: Meeting Abstract
Publication Type: Abstract Only
Publication Acceptance Date: 5/18/2013
Publication Date: 5/18/2013
Citation: Yan, X., Chen, C., Hu, J., Fratamico, P.M. 2013. A simulated metagenomic approach for bacterial serotyping using shotgun genome sequences coupled with O-Antigen gene cluster analysis. Meeting Abstract. MA.

Interpretive Summary:

Technical Abstract: Background: Accurate determination of food-borne pathogen serotype and genotype information is important for disease surveillance and outbreak source tracking. E. coli serotype O157:H7 and non-O157 Shiga toxin-producing E. coli (STEC) serogroups, including O26, O45, O103, O111, O121, O145 (the "top six") and others, are important food-borne pathogens that cause similar illnesses. STEC classification is traditionally based on phenotypic analyses and/or PCR-based molecular typing that targets specific biomarkers such as O-antigen and stx genes. These procedures are often time-consuming and inaccurate. Recent advances in low-cost next-generation sequencing (NGS) technologies have created the potential for accurate, high-throughput detection and identification of bacterial pathogens. We sought to use NGS and computational methods to streamline the processing and analysis of NGS data for rapid E. coli genotyping and detection based on computational analysis of O-antigen gene clusters and virulence genes.
Materials and Methods: A total of 20 raw whole-genome sequence datasets, including STEC O157:H7, the top six non-O157 serogroups, and non-target organisms, were collected from the NCBI SRA site. The pooled test DNA datasets were divided into three sub-datasets. Each sub-dataset was pooled from 15 different raw whole-genome sequence datasets with a known combination of serogroups and known serotype richness to simulate metagenomic data. These sub-datasets were then compared to an in-house E. coli O-antigen and stx gene database covering all publicly available O-antigen cluster sequences and various stx gene subtypes. Sequences were quality trimmed and mapped to this database using CLC Genomics Workbench 5.1.
Results and Conclusion: The pooled raw sequences were accurately reclassified into the appropriate serogroups. The coverage of sequence reads provided a numerical readout of the O-antigen and stx genes, enabling rapid detection of pathogenic STEC and estimation of the richness of certain serotypes in the simulated metagenomic datasets. This approach shows potential for molecular serotyping, genotyping, and detection of emerging pathogenic strains in food or environmental samples.
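
The coverage-based readout described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline, which used CLC Genomics Workbench. It assumes a read mapper has already assigned each read to one reference in an O-antigen/stx database. The reference names, lengths, read length, and the 5x coverage threshold are hypothetical values chosen only for illustration.

```python
from collections import Counter


def mean_coverage(read_hits, ref_lengths, read_length):
    """Approximate mean depth per reference: mapped reads * read length / reference length.

    read_hits   -- iterable of reference names, one entry per mapped read
    ref_lengths -- dict mapping reference name to its length in bases
    read_length -- assumed uniform read length in bases
    """
    counts = Counter(read_hits)
    return {ref: counts[ref] * read_length / length
            for ref, length in ref_lengths.items()}


def call_targets(coverage, min_coverage=5.0):
    """Report references whose mean coverage meets a (hypothetical) threshold."""
    return sorted(ref for ref, cov in coverage.items() if cov >= min_coverage)


def relative_richness(coverage, targets):
    """Share of total coverage contributed by each called target."""
    total = sum(coverage[t] for t in targets)
    return {t: coverage[t] / total for t in targets} if total else {}


# Hypothetical reference database (names and lengths are illustrative only).
ref_lengths = {"O157_cluster": 14000, "O26_cluster": 12000,
               "O111_cluster": 13000, "stx1": 1200, "stx2": 1200}

# Hypothetical mapping result for one pooled sub-dataset of 100-base reads.
read_hits = (["O157_cluster"] * 1400 + ["O26_cluster"] * 600 +
             ["O111_cluster"] * 13 + ["stx2"] * 120)

coverage = mean_coverage(read_hits, ref_lengths, read_length=100)
called = call_targets(coverage)
o_clusters = [t for t in called if t.endswith("_cluster")]
print(called)
print(relative_richness(coverage, o_clusters))
```

In this sketch the relative coverage of the called O-antigen clusters stands in for serotype richness. A real analysis would also need to handle reads that map to several clusters, uneven coverage along a cluster, and differences in sequencing depth between the pooled genomes.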