Title: Tangible benefits of the pea aphid genome sequencing in proteomics research: enhancements in protein identification, data incorporation, and evaluation criteria
Submitted to: Journal of Insect Physiology
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: November 1, 2010
Publication Date: November 17, 2010
Citation: Cilia, M., Thannhauser, T.W., Gray, S.M. 2010. Tangible benefits of the pea aphid genome sequencing in proteomics research: enhancements in protein identification, data incorporation, and evaluation criteria. Journal of Insect Physiology. Available: http://www.ncbi.nlm.nih.gov/pubmed/21070785.
Interpretive Summary: Having a fully sequenced genome can help interpret studies of other closely related species, but how, and by how much? Sequenced genomes promise to deliver the molecular details governing important cellular processes fundamental to life, although the gene sequence alone cannot tell us how things actually work in the cell. To know how the cell functions, we must know the proteins involved, their various forms, and how, when, and where they are expressed, in real time, via experimental approaches collectively named proteomics. A fully sequenced and annotated genome remains the gold standard by which to search proteomics data because every gene, and therefore every protein sequence, is represented. However, the complete genome sequence is known for only a relatively small number of model organisms, which are often not the focus of the most important biological research. Therefore, many researchers are forced to rely on the available sequence of a related model species to interpret data from their organism of study. Here we report on the usefulness of the genome of an insect, the pea aphid, to interpret the proteomics data derived from a related aphid, the greenbug. Prior to the release of the pea aphid genome, the only other insect genomes available were those of the fruit fly and the mosquito, both of which are evolutionarily very distant from aphids. There are almost no genomic data available for the greenbug, despite its importance as an organism used to study plant-aphid interactions, virus transmission, insecticide resistance and bacterial endosymbionts.
We showed that having the pea aphid genome tripled the number of greenbug proteins that could be identified and also improved the confidence that the identifications were correct. The full-length pea aphid gene models also provided a way to calculate how closely related the proteins found in greenbug are to those of other aphids. While the pea aphid genome was extremely useful in the analysis of greenbug data, approximately 30% of the proteins remain unidentified, highlighting the continued need to sequence whole genomes from multiple species of every type of organism.
Technical Abstract: The pea aphid, Acyrthosiphon pisum, is an important agricultural pest and a model system for numerous aspects of aphid biology, including sexual and asexual reproduction, bacterial endosymbiosis, insecticide resistance, and the evolution of aphid and plant host interactions. Recently, its complete genome was sequenced, and a massive effort to annotate the genome is now underway by aphid biologists around the globe. However, the genome itself cannot speak to which proteins are expressed and responsible for these enticing aspects of aphid biology. For this, the aphid community will ultimately rely on proteomics approaches. Here, we report, for the first time, direct benefits of the pea aphid sequencing and annotation effort in the interpretation of a large gel-based proteomics data set derived from a related aphid species, the greenbug, Schizaphis graminum. The greenbug is also an agricultural pest and a model for aphid genetics, plant-aphid interaction studies, insecticide resistance and virus transmission. However, there is almost no genome information on S. graminum. Following the public release of the pea aphid genome sequence, we were able to triple the number of aphid protein identifications from mass spectrometry data. This was concomitant with a dramatic increase in the number of MS/MS peptide spectra matching the genome-derived protein sequences, which greatly enhanced confidence in the protein identifications. Furthermore, the pea aphid gene models provided one of the gold standards by which to judge the quality of our identifications: the percent coverage of the identified protein by our tryptic peptides. These benefits highlight the importance of the pea aphid sequencing and annotation efforts to the larger aphid and agricultural research communities.
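The percent coverage criterion mentioned above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `percent_coverage`, the toy protein sequence, and the peptides are all hypothetical, and exact substring matching is assumed for simplicity, whereas a real cross-species search would need to tolerate amino acid substitutions between greenbug peptides and pea aphid gene models.

```python
def percent_coverage(protein: str, peptides: list[str]) -> float:
    """Percentage of residues in `protein` covered by at least one peptide.

    Assumption: peptides are matched as exact substrings; overlapping
    peptides are not double-counted.
    """
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # an empty peptide covers nothing
        start = protein.find(peptide)
        while start != -1:
            # Mark every residue spanned by this occurrence of the peptide.
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return 100.0 * sum(covered) / len(protein)


# Hypothetical 16-residue protein and two tryptic peptides (cleaved after K/R).
toy_protein = "MKTAYIAKQRQISFVK"
toy_peptides = ["TAYIAK", "QISFVK"]
print(percent_coverage(toy_protein, toy_peptides))  # 12 of 16 residues -> 75.0
```

A full-length gene model matters here because the denominator is the length of the whole protein; with only a fragmentary sequence, such as a partial EST, coverage cannot be computed against the complete protein.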