Submitted to: Systematic Biology
Publication Type: Peer reviewed journal
Publication Acceptance Date: 4/20/2006
Publication Date: 4/1/2007
Citation: Reeves, P.A., Richards, C.M. 2007. Distinguishing terminal monophyletic groups from hybrid taxa: performance of cladistic and phenetic procedures. Systematic Biology 56:302-320.
Interpretive Summary: The accurate identification of distinct historical lineages is important for delimiting species, understanding genetic relationships among populations, and prioritizing groups of organisms for in situ and ex situ conservation programs. This study used computer simulations to evaluate the relative performance of some of the most common methods for identifying lineages, collectively termed “cladistic phylogeny reconstruction methods.” The study shows that these methods are biased, promoting the incorrect conclusion that a group of organisms is a distinct historical lineage, when in fact it is an amalgam of two or more lineages (a “hybrid taxon” of reticulate ancestry). This study concludes that it may be very difficult to distinguish hybrid taxa from distinct historical lineages using common methods for phylogeny reconstruction. A new application of a nonparametric clustering procedure was shown to perform better than cladistic methods for distinguishing real historical lineages from hybrid taxa. The latter statistical procedure may be particularly useful for discovering NPGS seed accessions likely to contain novel alleles relative to other accessions from the same species. The procedure might also be used to identify accessions that have been contaminated with genes from other accessions.
Technical Abstract: Hybridization between taxa is a well-documented, natural phenomenon that is common at low taxonomic levels in the higher plants and other groups. In spite of the obvious potential for gene flow via hybridization to cause reticulation in an evolutionary tree, analytical methods based on a strictly bifurcating model of evolution have frequently been used to reconstruct phylogenetic trees containing taxa known to hybridize in nature. In order to understand the consequences of such analyses, we evaluated the relative performance of seven analytical approaches for distinguishing between hybrid taxa and terminal monophyletic groups using multi-locus data sets simulated under four topologically distinct scenarios of gene flow in unrooted 5-taxon trees. Using the loss of monophyly of simulated hybrid taxa in trees from data sets sampled along a continuous time course as the indicator of hybrid history, we found that parsimony significantly outperformed neighbor-joining, and, not surprisingly, that the use of bootstrap support values improved sensitivity over approaches that relied on the topology of the best tree alone. However, all bifurcating tree-based methods performed poorly. Based on our model, we estimate that many thousands of gene flow events may be required in natural systems before hybrid taxa will be reliably detected using common methods of phylogeny reconstruction. Furthermore, highly supported, erroneous topologies were observed during the early stages of simulated introgressive hybridization. Therefore, we conclude that the use of standard bifurcating tree-based methods to identify terminal monophyletic groups for the purposes of defining or delimiting phylogenetic species, or for prioritizing populations for conservation purposes, is difficult to justify when gene flow between sampled taxa is possible.
As an alternative, we present a novel application of an existing nonparametric clustering procedure that, when used against a "density landscape" derived from principal coordinate data, shows superior performance to the tree-based procedures tested.
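To make the phrase "density landscape derived from principal coordinate data" more concrete, the following is a minimal sketch of the general idea and not the authors' procedure. It assumes classical principal coordinate analysis (Gower double-centering) of a pairwise distance matrix, followed by a plain Gaussian kernel density estimate evaluated at each individual. The abstract does not name the clustering procedure, the kernel, or the bandwidth, so the function names `pcoa` and `gaussian_density`, the `bandwidth` parameter, and the toy coordinates are all hypothetical.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Assumption: `dist` holds pairwise distances between individuals
    (for example, genetic distances). Returns an (n, n_axes) coordinate array.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centering of the squared distances.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_axes.
    evals, evecs = np.linalg.eigh(b)
    order = np.argsort(evals)[::-1][:n_axes]
    evals, evecs = evals[order], evecs[:, order]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean distances.
    return evecs * np.sqrt(np.clip(evals, 0.0, None))


def gaussian_density(coords, bandwidth=1.0):
    """Gaussian kernel density estimate evaluated at each sample point.

    Assumption: an isotropic Gaussian kernel with a hand-picked bandwidth
    stands in for whatever density estimator the study actually used.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    k = coords.shape[1]
    norm = (2.0 * np.pi * bandwidth ** 2) ** (k / 2.0)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1) / norm


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of three individuals.
    points = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
    distances = np.linalg.norm(points[:, None] - points[None], axis=-1)
    coords = pcoa(distances, n_axes=2)
    density = gaussian_density(coords, bandwidth=1.0)
    for row, dens in zip(coords, density):
        print(row, dens)
```

In a sketch like this, individuals from a distinct lineage would be expected to sit on their own density peak, while individuals of mixed ancestry would tend to fall between peaks. How peaks are delimited and tested is the job of the nonparametric clustering step, which is not reproduced here.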