Research Project: Intervention Strategies to Control Endemic and New and Emerging Influenza A Virus Infections in Swine

Location: Virus and Prion Research

Title: Phylogenetic diversity statistics for all clades in a phylogeny

Authors:
Grover, Siddhant - Iowa State University
Markin, Alexey - Oak Ridge Institute For Science And Education (ORISE)
Anderson, Tavis
Eulenstein, Oliver - Iowa State University

Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 4/10/2023
Publication Date: 6/30/2023
Citation: Grover, S., Markin, A., Anderson, T.K., Eulenstein, O. 2023. Phylogenetic diversity statistics for all clades in a phylogeny. Bioinformatics. 39(1):i177-i184.

Interpretive Summary: Measuring the diversity of biological organisms is one of the most fundamental problems in ecology and evolutionary biology. Understanding that diversity is crucial to conservation efforts and to the control of pathogens. In this work, we develop new computational techniques for computing descriptive statistics of organismal diversity. Our techniques are based on the widely used notion of 'phylogenetic diversity', and our algorithms compute several essential diversity statistics significantly faster than previously suggested methods. We present a tool for in-depth study of the diversity of organisms given their evolutionary history. By examining how diversity changes over the course of evolution, we identify hotspots of diversity, i.e., locations and time periods in which organisms undergo rapid diversification.

Technical Abstract: A quantitative measure of phylogenetic diversity, PD, has been used to address problems in conservation biology, microbial ecology, and evolutionary biology. PD has been defined as the minimum total length of the branches in a phylogeny required to cover a specified set of taxa on the phylogeny. A general goal in the application of PD has been to identify taxa that maximize PD on a given phylogeny, and this has been mirrored in the development of algorithms that solve this maximization problem. Other descriptive statistics, such as the minimum PD, average PD, and standard deviation of PD, provide valuable and often needed insight into the distribution of PD across a phylogeny, but there has been limited work on computing them. We introduce efficient and exact algorithms for computing PD and the associated descriptive statistics for an entire phylogeny. Our algorithms also compute PD statistics for every clade in a phylogeny, enabling direct comparisons of PD between clades. We conducted a simulation study to test the scalability of our algorithms and demonstrate that PD statistics can be efficiently computed for large phylogenies, with applications in ecology and evolutionary biology.
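
To make the definitions above concrete, the following is a minimal brute-force sketch, not the efficient algorithms introduced in the publication. It assumes the unrooted reading of PD (the total branch length of the smallest subtree connecting the chosen taxa, so a branch counts only when it separates some chosen taxa from others), and it enumerates every subset of k taxa, which is feasible only for very small trees. The toy tree, its branch lengths, and the names `TREE`, `leaves_below`, `pd`, and `pd_stats` are hypothetical and chosen for illustration only.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy phylogeny: child -> (parent, branch length). "R" is the root.
TREE = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("R", 1.5),
    "C": ("Y", 1.0),
    "D": ("Y", 3.0),
    "Y": ("R", 0.5),
}


def leaves(tree):
    """Return the taxa (nodes that are never a parent), sorted by name."""
    parents = {parent for parent, _ in tree.values()}
    return sorted(node for node in tree if node not in parents)


def leaves_below(tree):
    """Map each non-root node to the set of taxa in the clade below it."""
    below = {node: set() for node in tree}
    for leaf in leaves(tree):
        node = leaf
        while node in tree:  # walk upward until the root is reached
            below[node].add(leaf)
            node = tree[node][0]
    return below


def pd(tree, taxa, below=None):
    """PD of a taxon set under the assumed unrooted definition."""
    taxa = set(taxa)
    below = below or leaves_below(tree)
    total = 0.0
    for node, (_, length) in tree.items():
        inside = taxa & below[node]
        # Count the branch only if it separates chosen taxa from each other.
        if inside and inside != taxa:
            total += length
    return total


def pd_stats(tree, k):
    """Brute-force min, max, mean, and standard deviation of PD over all k-subsets."""
    below = leaves_below(tree)
    values = [pd(tree, subset, below) for subset in combinations(leaves(tree), k)]
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation
    }


if __name__ == "__main__":
    print(pd(TREE, {"A", "B"}))
    print(pd_stats(TREE, 2))
```

Because the number of k-subsets grows combinatorially with the number of taxa, this enumeration serves only as a reference for checking results on small inputs. Per-clade statistics could be obtained the same way by restricting the taxa to the leaves below a chosen internal node.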