Location: Virus and Prion Research
Title: smot: a python package and CLI tool for contextual phylogenetic subsampling
ARENDSEE, ZEBULUN - Oak Ridge Institute For Science And Education (ORISE)
Submitted to: Journal of Open Source Software
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 12/20/2022
Publication Date: 12/20/2022
Citation: Arendsee, Z.W., Baker, A.L., Anderson, T.K. 2022. smot: a python package and CLI tool for contextual phylogenetic subsampling. Journal of Open Source Software. 7(80). Article 4193. https://doi.org/10.21105/joss.04193.
Interpretive Summary: The U.S. Department of Agriculture (USDA) influenza A virus (IAV) in swine surveillance system monitors the genetic diversity and evolutionary trends of thousands of IAV strains. Analyzing thousands of genetic sequences is computationally difficult, and the volume of data can obscure important evolutionary trends and epidemiological linkages. Downsampling reduces the number of sequences analyzed, but it must maintain genetic and geographic diversity so that host-to-host transmission remains detectable and evolutionary inference remains accurate. We introduce a rigorous and empirically validated Python package and command line utility called "smot" (Simple Manipulation Of Trees). This package offers general functions for filtering phylogenetic trees, algorithms for classifying unlabeled tips given a subset of labeled reference tips, and subsampling algorithms that preserve reference strains and tree topology. The smot tool is an integral component of phylogenetic pipelines that sample and identify representative strains for whole genome sequencing within the USDA IAV in swine surveillance system. It also facilitates the rapid identification and visualization of interspecies transmission events. The smot tool is publicly available, and by objectively quantifying spatial and temporal trends in IAV diversity, it allows stakeholders to make informed decisions on IAV vaccine design to improve animal health.
Technical Abstract: smot (Simple Manipulation Of Trees) is a command line tool and Python package with the pragmatic goal of distilling large-scale phylogenetic data to facilitate inference and visualization. This package offers general functions for filtering phylogenetic trees, algorithms for classifying unlabeled tips given a subset of labeled reference tips, and subsampling algorithms that preserve reference strains and tree topology. The smot tool has broad application in phylogenetic analysis, and we demonstrate its utility using a genomic epidemiology study of influenza A virus in swine.
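Illustrative sketch: the idea of subsampling a tree while keeping every reference tip and the branching order among the retained tips can be shown in a few lines of plain Python. This is not smot's implementation or API. The nested-list tree representation, the function names (tips, prune, subsample), the "REF|" naming convention for reference strains, and the uniform random sampling rule are all assumptions made for illustration, and branch lengths are ignored.

```python
import random


def tips(tree):
    """Return the tip labels of a nested-list tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in tips(child)]


def prune(tree, keep):
    """Return the subtree induced by the tips in `keep` (None if empty).

    Internal nodes left with a single child are collapsed, so the
    branching order among the retained tips is unchanged.
    """
    if isinstance(tree, str):
        return tree if tree in keep else None
    children = [c for c in (prune(child, keep) for child in tree) if c is not None]
    if not children:
        return None
    if len(children) == 1:
        return children[0]
    return children


def subsample(tree, is_reference, fraction, seed=0):
    """Keep all reference tips plus a random fraction of the other tips.

    `is_reference` is a hypothetical predicate on tip labels and
    `fraction` is assumed to lie between 0 and 1.
    """
    rng = random.Random(seed)
    all_tips = tips(tree)
    references = [t for t in all_tips if is_reference(t)]
    others = [t for t in all_tips if not is_reference(t)]
    n_keep = round(len(others) * fraction)
    keep = set(references) | set(rng.sample(others, n_keep))
    return prune(tree, keep)


# Toy tree: strings are tips, lists are internal nodes (hypothetical data).
tree = [["REF|A", "s1", "s2"], [["s3", "s4"], ["REF|B", "s5", "s6"]]]
small = subsample(tree, lambda name: name.startswith("REF|"), fraction=0.5)
print(small)
```

Which non-reference tips survive depends on the seed, but both reference tips are always retained, and the surviving tips appear in the same relative order as in the original tree.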