Research Project: Intervention Strategies to Control Endemic and New and Emerging Influenza A Virus Infections in Swine

Location: Virus and Prion Research

Title: PARNAS: Objectively selecting the most representative taxa on a phylogeny

MARKIN, ALEXEY - Oak Ridge Institute For Science And Education (ORISE)
WAGLE, SANKET - Iowa State University
GROVER, SIDDHANT - Iowa State University
Baker, Amy
EULENSTEIN, OLIVER - Iowa State University
Anderson, Tavis

Submitted to: Systematic Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/28/2023
Publication Date: 5/19/2023
Citation: Markin, A., Wagle, S., Grover, S., Baker, A.L., Eulenstein, O., Anderson, T.K. 2023. PARNAS: Objectively selecting the most representative taxa on a phylogeny. Systematic Biology. esyad028.

Interpretive Summary: Diagnostic laboratories routinely generate large genetic sequence datasets with tens or hundreds of thousands of pathogen genes and strains. This represents a great opportunity for computational studies but introduces a significant challenge for virologists, who are forced to select a few representative virus strains from the vast array of available diversity to perform phenotypic characterization. We introduce a novel computational tool, PARNAS, for fast and objective selection of the most representative strains from a given phylogeny. We demonstrated that PARNAS is faster and more versatile than the existing alternatives. The algorithm automatically selected a minimal set of 6 representative hemagglutinin genes from the USDA influenza A virus in swine surveillance system, and we demonstrated that these genes remained genetically representative for approximately two years. The development of PARNAS provides computational support for pathogen genomic surveillance, as it objectively identifies representative virus strains and strains that have diverged from those representatives. These data can be used to design vaccines that better reflect the genetic diversity of influenza A and other viruses circulating in the field, and may help reduce the risk of interspecies transmission by identifying genetically novel viruses.

Technical Abstract: The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component of genomic epidemiology for pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens, or generating a computationally tractable dataset for detailed phylogenetic analyses, requires bias-free subsampling of taxa. To address this need, we propose PARNAS, an objective and flexible algorithm to sample and select taxa that best represent observed diversity by solving a generalized k-medoids problem on a phylogenetic tree. PARNAS solves this problem efficiently and exactly through novel optimizations and by adapting algorithms from operations research. For more nuanced selections, taxa can be weighted with metadata or genetic sequence parameters, and the pool of potential representatives can be user-constrained. Motivated by influenza A virus genomic surveillance and vaccine design, PARNAS can be applied to identify representative taxa that optimally cover the diversity in a phylogeny within a specified distance radius. We demonstrated that PARNAS is more efficient and flexible than current approaches, and applied it to the problem of influenza A virus in swine, showing that only 4 to 6 strains, objectively selected every two years, are sufficient to cover 80% of the diversity circulating in US swine. We suggest that this method, through the objective selection of representatives in a phylogeny, provides criteria for rational multivalent vaccine design and for quantifying diversity. PARNAS is available at
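
To make the k-medoids formulation in the abstract concrete, the sketch below picks k leaves of a tiny tree so that the (optionally weighted) sum of tree-path distances from every leaf to its nearest selected leaf is minimal, and then reports the fraction of leaves within a given distance radius of a representative. This is an illustration only and not the PARNAS implementation or its interface: the toy tree, the function names, and the weights argument are all assumptions made for this example, and the exhaustive search over subsets stands in for the efficient exact algorithm the abstract describes.

```python
from itertools import combinations

# Hypothetical toy tree as a weighted edge list (values are branch lengths).
# A..E are leaves (taxa); n1..n3 are internal nodes. Not real surveillance data.
EDGES = [
    ("n1", "A", 1.0),
    ("n1", "B", 1.0),
    ("n1", "n2", 2.0),
    ("n2", "C", 1.0),
    ("n2", "n3", 3.0),
    ("n3", "D", 1.0),
    ("n3", "E", 1.0),
]
LEAVES = ["A", "B", "C", "D", "E"]


def path_distances(edges, source):
    """Return tree-path (patristic) distances from `source` to every node."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, []).append((v, length))
        adjacency.setdefault(v, []).append((u, length))
    dist = {source: 0.0}
    stack = [source]
    while stack:  # a tree has a unique path between nodes, so one traversal suffices
        node = stack.pop()
        for neighbor, length in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                stack.append(neighbor)
    return dist


def k_medoids_brute_force(edges, leaves, k, weights=None):
    """Try every size-k subset of leaves; keep the one with the lowest total cost."""
    weights = weights or {leaf: 1.0 for leaf in leaves}
    dist = {leaf: path_distances(edges, leaf) for leaf in leaves}
    best_cost, best_set = float("inf"), None
    for candidate in combinations(leaves, k):
        # Each leaf contributes its weighted distance to the closest representative.
        cost = sum(weights[t] * min(dist[m][t] for m in candidate) for t in leaves)
        if cost < best_cost:
            best_cost, best_set = cost, candidate
    return best_set, best_cost


def coverage(edges, leaves, representatives, radius):
    """Fraction of leaves lying within `radius` of at least one representative."""
    dist = {m: path_distances(edges, m) for m in representatives}
    covered = [t for t in leaves if any(dist[m][t] <= radius for m in representatives)]
    return len(covered) / len(leaves)


if __name__ == "__main__":
    reps, cost = k_medoids_brute_force(EDGES, LEAVES, k=2)
    print("representatives:", reps, "total distance:", cost)
    print("coverage at radius 2.0:", coverage(EDGES, LEAVES, reps, radius=2.0))
```

Exhaustive enumeration grows combinatorially with the number of taxa, so it is only workable on toy inputs like this one. Passing a weights dictionary changes how much each taxon counts toward the objective, which loosely mirrors the weighting by metadata mentioned in the abstract.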