Location: Plant, Soil and Nutrition Research
Title: Building a tRNA thermometer to estimate microbial adaptation to temperature
|CIMEN, EMRE - Cornell University|
|JENSEN, SARAH - Cornell University|
|Buckler, Edward - Ed|
Submitted to: Nucleic Acids Research
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/20/2020
Publication Date: 11/16/2020
Citation: Cimen, E., Jensen, S., Buckler IV, E.S. 2020. Building a tRNA thermometer to estimate microbial adaptation to temperature. Nucleic Acids Research. 48(21):12004-12045. https://doi.org/10.1093/nar/gkaa1030.
Interpretive Summary: Determining the optimal growth temperature (OGT) of microorganisms has important implications for developing industrial biochemical processes and for understanding evolution and adaptation to temperature, but OGTs are difficult to obtain experimentally for many species because those species are hard to culture in a laboratory setting. Models that use easily obtainable data, such as genome sequence, are a practical way to estimate OGT for many species at once, or when experimentally verifying OGT is otherwise impossible. We built a machine learning model that predicts OGT accurately from genomic tRNA sequences. This makes it possible to predict OGT for the thousands of microbes with existing genome assemblies. This model is a substantial improvement over existing models for predicting microbial OGT: with only ~2400 base pairs of sequence (~0.1% of a bacterial genome), it can predict OGT as well as published models that use summary data from the entire rest of the genome. Because the model has very limited input data requirements, it is easy to use, and its predictions can be used in downstream applications, including developing industrial biochemical processes that must proceed under very specific temperature regimes and research into molecular evolution and adaptation to temperature.
Technical Abstract: Because ambient temperature affects biochemical reactions, organisms living in extreme temperature conditions adapt protein composition and structure to maintain biochemical functions. While it is not feasible to experimentally determine optimal growth temperature (OGT) for every known microbial species, organisms adapted to different temperatures have measurable differences in DNA, RNA, and protein composition that allow OGT prediction from genome sequence alone. In this study, we built a model that uses tRNA sequence to predict OGT. We used tRNA sequences from 100 archaeal and 683 bacterial species as input to train two convolutional neural network models. The first pairs individual tRNA sequences from different species and predicts which comes from the more thermophilic organism, with accuracy ranging from 0.538 to 0.992. The second uses the complete set of tRNAs in a species to predict OGT, achieving a maximum r² of 0.86, which is comparable with other prediction accuracies in the literature despite a significant reduction in the quantity of input data. This model improves on previous OGT prediction models by requiring minimal input data, removing laborious feature extraction and data preprocessing steps, and widening the scope of valid downstream analyses.
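
The abstract does not describe how the tRNA sequences were encoded or exactly how r² was computed, so the following is only a minimal sketch under stated assumptions, not the authors' pipeline. It assumes a one-hot encoding over the four bases (a common input format for convolutional networks), a binary label for the pairwise "which organism is more thermophilic" task, and r² taken as the squared Pearson correlation between observed and predicted OGT. The names `one_hot`, `pair_label`, and `r_squared` are hypothetical.

```python
# Illustrative helpers only; the encoding, labels, and r^2 definition are
# assumptions and are not taken from the published model.

ALPHABET = "ACGU"  # assumed base order for the one-hot columns


def one_hot(seq, length):
    """Encode an RNA/DNA sequence as a `length` x 4 matrix.

    Sequences shorter than `length` are zero-padded, longer ones are
    truncated, and bases outside ACGU (e.g. N) become all-zero rows.
    """
    seq = seq.upper().replace("T", "U")  # accept genomic (DNA) tRNA genes
    rows = []
    for i in range(length):
        base = seq[i] if i < len(seq) else None
        rows.append([1.0 if base == b else 0.0 for b in ALPHABET])
    return rows


def pair_label(ogt_a, ogt_b):
    """Label for the pairwise task: 1 if organism A has the higher OGT."""
    return 1 if ogt_a > ogt_b else 0


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)
```

Under these assumptions, each tRNA becomes a fixed-size matrix that a convolutional layer could scan, `pair_label(80, 37)` marks the first organism as the more thermophilic one, and `r_squared` summarizes how closely predicted OGTs track measured ones across species.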