Research Project: Management and Biology of Arthropod Pests and Arthropod-borne Plant Pathogens

Location: Emerging Pests and Pathogens Research

Title: Assessing protein sequence database suitability using de novo sequencing

JOHNSON, RICHARD - University Of Washington
SEARLE, BRIAN - Institute For Systems Biology
NUNN, BROOK - University Of Washington
GILMORE, JASON - University Of Washington
PHILLIPS, MOLLY - University Of Washington
AMEMIYA, CHRIS - University Of California
Heck, Michelle
MACCOSS, MICHAEL - University Of Washington

Submitted to: Molecular and Cellular Proteomics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/5/2019
Publication Date: 1/5/2020
Citation: Johnson, R., Searle, B.C., Nunn, B.L., Gilmore, J.M., Phillips, M., Amemiya, C.T., Heck, M.L., Maccoss, M.J. 2020. Assessing protein sequence database suitability using de novo sequencing. Molecular and Cellular Proteomics.

Interpretive Summary: Proteomics is the study of all the proteins produced by a cell, tissue, or organism. Proteomic analysis helps scientists understand how an organism responds to its biotic and abiotic environment. These studies are difficult because proteins have extraordinarily complex biochemical and physicochemical properties, which complicates the development of software to interpret proteomics data. Further difficulties arise when the organism under study has no sequenced genome to supply a suitable database for protein searching, or when the sample is a complex mixture of multiple species. One example of such a mixture is the Asian citrus psyllid, the insect vector of citrus greening, which harbors beneficial bacterial partners as well as the bacterial pathogen that causes citrus greening. In this work, a new approach to the analysis of proteomics data is proposed and tested. The method provides information on both the quality of the proteomics data and the usefulness of the database used to analyze it. Applications of this approach include proteomic analysis of closely related species, of complex samples containing multiple species, and of extinct organisms.

Technical Abstract: The analysis of samples from unsequenced and/or understudied species, as well as samples where the proteome is derived from multiple organisms, poses two key questions. The first is whether the proteomic data obtained from an unusual sample type even contains peptide tandem mass spectra. The second is whether an appropriate protein sequence database is available for proteomic searches. We describe the use of automated de novo sequencing for evaluating both the quality of a collection of tandem mass spectra and the suitability of a given protein sequence database for searching that data. Applications of this method include the proteome analysis of closely related species, metaproteomics, and proteomics of extinct organisms.
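
As a rough illustration of the database-suitability idea only, and not the authors' published method, the Python sketch below asks what fraction of de novo peptide sequences can be found in a candidate protein database. The function names, the exact-substring matching, the minimum peptide length, and the treatment of isoleucine and leucine as interchangeable are all assumptions made for this example; the abstract does not specify how the comparison is scored.

```python
def normalize(seq: str) -> str:
    """Uppercase a sequence and map I to L (assumption: the two residues
    have the same mass and are not distinguished by de novo sequencing)."""
    return seq.upper().replace("I", "L")


def database_hit_fraction(de_novo_peptides, database_proteins, min_length=7):
    """Return the fraction of de novo peptides found in the database.

    Hypothetical helper: peptides shorter than min_length are ignored
    because short sequences match by chance, and a peptide counts as a
    hit only if it occurs as an exact substring of some protein.
    """
    proteins = [normalize(p) for p in database_proteins]
    candidates = [normalize(p) for p in de_novo_peptides if len(p) >= min_length]
    if not candidates:
        return 0.0
    hits = sum(any(pep in prot for prot in proteins) for pep in candidates)
    return hits / len(candidates)


if __name__ == "__main__":
    # Made-up sequences for demonstration; not real data.
    peptides = ["PEPTIDEK", "LLLLLLLK", "SHORT"]
    database = ["MAAPEPTLDEKRR"]
    print(database_hit_fraction(peptides, database))
```

Under these assumptions, a low fraction would suggest that the database is a poor match for the sample, while a low number of confident de novo peptides in the first place would point to a problem with the spectra themselves.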