Research Project: Genetics of Disease Resistance and Food Quality Traits in Corn

Location: Plant Science Research

Title: Qmatey: An automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes

ADAMS, ALISON - University of Tennessee
KRISTY, BRANDON - Michigan State University
GORMAN, MYRANDA - University of Tennessee
Balint-Kurti, Peter
YENCHO, G. CRAIG - North Carolina State University
OLUKOLU, BODE - University of Tennessee

Submitted to: Briefings in Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 9/26/2023
Publication Date: 11/1/2023
Citation: Adams, A., Kristy, B., Gorman, M., Balint-Kurti, P.J., Yencho, G., Olukolu, B. 2023. Qmatey: An automated pipeline for fast exact matching-based alignment and strain-level taxonomic binning and profiling of metagenomes. Briefings in Bioinformatics. 24(6):1-11.

Interpretive Summary: Metagenomics involves the bulk sequencing of microbial communities harvested from the environment. This is followed by bioinformatic deconvolution of the sequences to identify the types and quantities of the individual organisms, or classes of organisms, present in the original sample. This manuscript describes “Qmatey”, an automated pipeline that allows for more accurate and faster identification of organisms within sampled and bulk-sequenced communities. As such, Qmatey is a potentially useful tool for metagenomic analysis.

Technical Abstract: Metagenomics is a powerful tool for understanding organismal interactions; however, classification, profiling and detection of interactions at the strain level remain challenging. We present an automated pipeline, quantitative metagenomic alignment and taxonomic exact matching (Qmatey), that performs a fast exact matching-based alignment and integrates taxonomic binning and profiling. It interrogates large databases without using metagenome-assembled genomes, curated pan-genes or k-mer spectra that limit resolution. Qmatey minimizes misclassification and maintains strain-level resolution by using only diagnostic reads, as shown in the analysis of amplicon, quantitative reduced representation and shotgun sequencing datasets. Using Qmatey to analyze shotgun data from a synthetic community with 35% of the 26 strains at low abundance (0.01–0.06%), we obtained 85–96% strain recall and 92–100% species recall while maintaining 100% precision. Benchmarking revealed that the highly ranked Kraken2 and KrakenUniq tools identified 2–4 more taxa (92–100% recall) than Qmatey but produced 315–1752 false positive taxa and incurred a high penalty on precision (1–8%). The speed, accuracy and precision of the Qmatey pipeline position it as a valuable tool for broad-spectrum profiling and for uncovering biologically relevant interactions.
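The recall and precision figures above follow from simple counts of true and false positive taxa. The sketch below shows that arithmetic in Python. Only the community size (26 strains) and the false positive range (315–1752) come from the abstract; the function names and the specific true positive counts paired with each false positive count are illustrative assumptions, not values reported by the authors.

```python
# Minimal sketch of the precision/recall arithmetic behind the benchmark figures.
# TOTAL_STRAINS and the false-positive range are taken from the abstract; the
# true-positive counts used below are hypothetical values chosen for illustration.

TOTAL_STRAINS = 26  # strains in the synthetic community (from the abstract)


def recall(true_positives: int, total_expected: int) -> float:
    """Fraction of expected taxa that were detected."""
    return true_positives / total_expected


def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of reported taxa that are actually present."""
    reported = true_positives + false_positives
    return true_positives / reported if reported else 0.0


# Hypothetical profile with no false positives: precision stays at 100%
# even when a few low-abundance strains are missed.
exact_match_recall = recall(22, TOTAL_STRAINS)
exact_match_precision = precision(22, 0)

# Hypothetical profile that recovers every strain but also reports
# 1752 false positive taxa (the upper end of the range in the abstract).
permissive_recall = recall(26, TOTAL_STRAINS)
permissive_precision = precision(26, 1752)

if __name__ == "__main__":
    print(f"exact-match: recall={exact_match_recall:.1%}, precision={exact_match_precision:.1%}")
    print(f"permissive:  recall={permissive_recall:.1%}, precision={permissive_precision:.1%}")
```

Under these assumed counts, a few extra true positives barely change recall, while hundreds of false positives drive precision toward the low single digits, which is the trade-off the benchmark describes.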