|VELAZQUEZ-SALINAS, LAURO - Oak Ridge Institute For Science And Education (ORISE)|
|ZARATE, SELENE - Autonomous University Of San Luis Potosi|
|ESCHBAUMER, MICHAEL - Oak Ridge Institute For Science And Education (ORISE)|
|LOBO, FRANCISCO - Embrapa|
|NOVELLA, ISABEL - University Of Toledo|
Submitted to: PLOS ONE
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 7/11/2016
Publication Date: 7/25/2016
Citation: Velazquez-Salinas, L., Zarate, S., Eschbaumer, M., Lobo, F.P., Gladue, D.P., Arzt, J., Novella, I.S., Rodriguez, L.L. 2016. Selective factors associated with the evolution of codon usage in natural populations of arboviruses and their practical application to infer possible hosts for emerging viruses. PLoS ONE. 11(7): e0159943. doi:10.1371/journal.pone.0159943.
Interpretive Summary: Arboviruses (arthropod-borne viruses) have life cycles that include specific vertebrate hosts (e.g., animals such as cattle and horses) and invertebrate hosts (e.g., insects such as mosquitoes and midges). Arboviruses depend completely on host machinery to read their genetic code and complete their life cycle, so their genetic code usage seems to reflect that of their hosts. In this work we compared the genetic coding used by 26 arboviruses with that of 25 vectors and mammalian hosts. We found that the genetic code usage of some arboviruses closely matched that of the insect vector that transmits them, while that of others closely matched the genetic code usage of their animal host. Furthermore, we found specific markers that can be used to predict the potential vector or animal host of an arbovirus. These markers can help identify potential hosts and vectors for previously unknown emerging arboviruses.
Technical Abstract: Arboviruses (arthropod-borne viruses) have life cycles that include both vertebrate and invertebrate hosts, with substantial differences in vector and host specificity between different viruses. Most arboviruses utilize RNA for their genetic material and are completely dependent on host tRNAs for their translation, suggesting that virus codon usage could be a target for selection. In the current study we analyzed the relative synonymous codon usage (RSCU) patterns of 26 arboviruses together with 25 vectors and hosts, including 8 vertebrates and 17 invertebrates. We used effective number of codons (ENC), hierarchical cluster analysis (HCA) and principal component analysis (PCA) to identify trends in codon usage. HCA demonstrated that the RSCU of arboviruses reflects that of their natural hosts, but not that of dead-end hosts. Of the two major components identified by PCA, the first accounted for 62.1% of the total variance, and among the 59 codons analyzed in this study, the leucine codon CTG had the highest correlation with the first principal component. Some amino acids, such as isoleucine, were also correlated with the first component. Nucleotide and dinucleotide composition were the variables with the highest correlation with the first two principal components, explaining 63.4% of the total codon usage variance. The results suggest that the main factors driving the evolution of codon usage in arboviruses are the nucleotide and dinucleotide composition present in the host. Comparing the codon usage of emerging arboviruses with the codon usage of potential hosts can help identify hosts and vectors for emerging arboviruses.
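Illustrative example: the abstract does not say how the RSCU values were computed, so the following is only a minimal sketch based on the commonly used definition, in which a codon's observed count is divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and an in-frame coding sequence; the names `CODON_TABLE`, `SYNONYMOUS`, and `rscu` are hypothetical and not taken from the study. Excluding methionine, tryptophan, and the stop codons leaves the 59 codons mentioned above.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard code applies to the sequences being analyzed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Group codons into synonymous families, skipping stop codons (*) and the
# single-codon amino acids Met (M) and Trp (W). This leaves 59 codons.
SYNONYMOUS = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        SYNONYMOUS.setdefault(aa, []).append(codon)


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (DNA or RNA)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in SYNONYMOUS.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence: three CTG and one TTA, all encoding leucine.
    for codon, value in sorted(rscu("CTGCTGCTGTTA").items()):
        print(codon, value)
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates one used less often. Vectors of these values, one per virus or host, are the kind of input that clustering or PCA would then operate on.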