Meinersmann, Richard - Rick
Submitted to: Microbial Informatics and Experimentation
Publication Type: Peer reviewed journal
Publication Acceptance Date: 7/20/2012
Publication Date: 8/28/2012
Citation: Snipen, L., Wassenaar, T., Alterman, E., Olson, J., Kathariou, S., Lagesen, K., Knochel, S., Takamiya, M., Ussery, D., Meinersmann, R.J. 2012. Analysis of evolutionary patterns of genes in Campylobacter jejuni and C. coli. Microbial Informatics and Experimentation. 2(1):8. doi: 10.1186/2042-5783-2-8.
Interpretive Summary: Campylobacter is the genus of bacteria responsible for the greatest number of cases of human diarrheal disease. The organism is associated with poultry and other food animals. Population genetics has been applied to study the migration of Campylobacter and its association with hosts and other members of its genus. Until now, population genetics has relied on typing a group of genes from the organism that were selected with the intent that they would be representative of the entire genome. However, that representation rests on assumptions that can only be tested by analyzing the entire genetic makeup of a group of the organisms. Routine total sequencing of Campylobacter genomes is still not practical, but there are now enough such sequences (25) available to test the assumptions. We had to start by making a standardized assignment of all the genes in each of the available genomes to be sure that uniform assessments were made. We then had to invent a method to simultaneously compare the variation in all of the 1029 genes we identified as being contained in all the genomes. The method involved measuring the differences in each of these genes for each of the 25 genomes to create a data file that was then subjected to specialized statistical analysis for determining clusters. We were able to define clusters that correlated with specific evolutionary influences, such as selective pressure for frequent changes (‘mutations’) or participation in cross-bacterial exchange events (‘recombination’, also known as ‘lateral gene transfer’). These factors affect how population genetic data are interpreted in analyses of migration, so knowing which genes are subject to them will be useful for future interpretations.
Technical Abstract: Background: To investigate the population genetic structure of thermophilic Campylobacter spp., we extracted a set of 1029 core gene families (CGFs) from 25 sequenced genomes of C. jejuni, C. coli and C. lari. Based on these CGFs we employed different approaches to reveal the evolutionary histories suggested by the various CGFs. One approach was based on the topological distance between maximum likelihood phylogenetic trees, another on partitioning of genomes by evolutionary distances, and a third on principal component analysis (PCA) of normalized pair-wise evolutionary distances. Results: Each approach resulted in a different approximation to the underlying evolutionary landscape, and in each case every CGF was represented as a vector in a 'phylogenetic' space. We also computed and collected extra categorical features for the same genes. These were used to verify whether regions of the vector spaces were enriched with genes sharing a particular feature. Conclusions: We found that the PCA of the normalized pair-wise evolutionary distances resulted in the most interesting representation of the CGFs. In this space we identified two distinct regions, one highly enriched with genes under positive selection and one enriched with genes with a high recombination rate.
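The PCA approach named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data below are synthetic placeholders, the function names `normalize_rows` and `pca_scores` are hypothetical, and the unit-sum normalization is an assumption, since the abstract does not say how the distances were normalized. The sketch only shows the general idea of representing each core gene family as a vector of pair-wise distances between the 25 genomes and projecting those vectors onto the leading principal components.

```python
import numpy as np


def normalize_rows(dist_vectors):
    """Scale each gene family's distance vector to unit sum.

    ASSUMPTION: unit-sum scaling is used here only for illustration;
    the source does not specify its normalization.
    """
    totals = dist_vectors.sum(axis=1, keepdims=True)
    return dist_vectors / totals


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components via SVD."""
    centered = X - X.mean(axis=0)  # center each column (genome pair)
    U, S, Vt = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


n_genomes = 25                               # genomes in the study
n_pairs = n_genomes * (n_genomes - 1) // 2   # unordered genome pairs
n_families = 1029                            # core gene families in the study

# Synthetic placeholder distances (NOT real data): one row per gene family,
# one column per genome pair.
rng = np.random.default_rng(0)
distances = rng.gamma(shape=2.0, scale=0.05, size=(n_families, n_pairs))

normalized = normalize_rows(distances)
scores, explained = pca_scores(normalized)

print(scores.shape)  # one 2-D point per core gene family
```

With real data, each row of `scores` would place one core gene family in the reduced space, and regions of that space could then be checked for enrichment of categorical features such as positive selection or high recombination rate.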