Research Project: Intervention Strategies to Control Endemic and New and Emerging Influenza A Virus Infections in Swine

Location: Virus and Prion Research

Title: Asymmetric cluster-based measures for comparative phylogenetics

WAGLE, SANKET - Iowa State University
MARKIN, ALEXEY - Iowa State University
GORECKI, PAWEL - University Of Warsaw
Anderson, Tavis
EULENSTEIN, OLIVER - Iowa State University

Submitted to: Journal of Computational Biology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/10/2024
Publication Date: 4/22/2024
Citation: Wagle, S., Markin, A., Gorecki, P., Anderson, T.K., Eulenstein, O. 2024. Asymmetric cluster-based measures for comparative phylogenetics. Journal of Computational Biology.

Interpretive Summary: Identifying genetically novel influenza A viruses (IAV) that contain genes derived from human-, swine-, or avian-origin IAV is critical for controlling infection in swine. These novel viruses may be undergoing rapid changes in genetic diversity that reduce the efficacy of vaccines, and they may also pose a greater zoonotic risk to humans. In this study, we developed algorithms that measure the distance between two evolutionary trees. The software expresses the differences between the two trees as a cost score and normalizes that cost by the size of the genetic clusters within the gene trees. The cost score can subsequently be used to merge individual gene trees into a larger phylogenetic network that describes how reassortment has shaped the evolution of the virus. The algorithm was validated on simulated data, where it demonstrated improved performance over other state-of-the-art comparison metrics. This algorithm provides computational support for USDA surveillance of IAV in swine. It can objectively identify when the genetic components of a virus are derived from different evolutionary origins and when specific clusters of genes are more frequently paired together. These data may then be applied to identify genetically novel swine IAV strains for characterization and for use in vaccine development, and to search for genetic markers associated with the transmission and persistence of IAV in swine populations.

Technical Abstract: Tree comparison costs are used to compare the results of different phylogenetic hypotheses and reconstruction methods and to evaluate the robustness of a tree to data perturbations. The Robinson-Foulds distance is a widely used measure for comparing the topologies of two trees, but it is highly sensitive to tree error. Consequently, tree differences may be overestimated, leading to incorrect inference. One approach to overcoming this shortcoming is the Cluster Affinity (CA) distance, a refinement of the Robinson-Foulds distance. Both distances are symmetric and are thus designed to compare trees of the same type. However, it is common to compare trees of different types, for example when comparing gene trees with species trees, integrating different datasets into a supertree, or applying tree measures to infer phylogenetic networks; these comparisons are inherently asymmetric. Here, we introduce the asymmetric CA cost, a relaxation of the original CA distance, to compare heterogeneous trees. We also introduce a related, biologically interpretable measure, the Cluster Support (CS) cost, which normalizes cost by cluster size across gene trees. We demonstrate that the characteristics of these costs are similar to those of the symmetric CA distance. Further, for the asymmetric CA cost we describe efficient algorithms, derive the exact diameters, and use these to standardize the cost so that it is applicable in practice. These tree measures provide objective, fine-scale, and biologically interpretable values that can assess differences and similarities between phylogenetic trees.
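
To make the contrast in the abstract concrete, the following is a minimal Python sketch and not the authors' software. It assumes rooted trees written as nested tuples of leaf names, and it assumes one common cluster-based formulation: each cluster of a source tree is matched to the closest cluster of a target tree by symmetric-difference size, and the mismatches are summed. The function names (clusters, robinson_foulds, asymmetric_cluster_cost) and the example trees are hypothetical. The exact definitions, the standardization by diameter, and the Cluster Support cost are given in the paper and may differ from this sketch.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted tree is either a leaf name or a tuple of subtrees (assumed encoding).
Tree = Union[str, Tuple["Tree", ...]]
Cluster = FrozenSet[str]


def clusters(tree: Tree) -> Set[Cluster]:
    """Collect the leaf set below every node, including leaves and the root."""
    found: Set[Cluster] = set()

    def visit(node: Tree) -> Cluster:
        if isinstance(node, str):
            leaves = frozenset([node])
        else:
            leaves = frozenset().union(*(visit(child) for child in node))
        found.add(leaves)
        return leaves

    visit(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Rooted Robinson-Foulds distance: clusters present in exactly one tree."""
    return len(clusters(t1) ^ clusters(t2))


def asymmetric_cluster_cost(source: Tree, target: Tree) -> int:
    """Assumed formulation: for each source cluster, add the size of its
    symmetric difference with the best-matching target cluster."""
    target_clusters = clusters(target)
    return sum(
        min(len(c ^ d) for d in target_clusters) for c in clusters(source)
    )


# Two caterpillar trees that differ only in the placement of leaf "a".
t1: Tree = (((("a", "b"), "c"), "d"), "e")
t2: Tree = (((("b", "c"), "d"), "e"), "a")
# An unresolved (star) tree on the same leaves.
star: Tree = ("a", "b", "c", "d", "e")

print(robinson_foulds(t1, t2))            # 6: no internal cluster is shared
print(asymmetric_cluster_cost(t1, t2))    # 3: each cluster is off by one leaf
print(asymmetric_cluster_cost(star, t1))  # 0: every star cluster occurs in t1
print(asymmetric_cluster_cost(t1, star))  # 4: direction of comparison matters
```

In this toy example, moving a single leaf drives the Robinson-Foulds value to 6, the largest value possible for two rooted binary trees on five leaves, while the cluster-matching cost grows only with the number of misplaced leaves per cluster. The last two lines show why such a cost is asymmetric: the result depends on which tree supplies the clusters being matched.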