United States Department of Agriculture

Agricultural Research Service

Title: Defining Parameters for Homology Tolerant Database Searching

Authors: Kayser, Jean Patrick; Vallet, Jeffrey

Submitted to: Journal of Biomolecular Techniques
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: October 20, 2004
Publication Date: December 1, 2004
Citation: Kayser, J.R., Vallet, J.L., Cerny, R.L. 2004. Defining parameters for homology tolerant database searching. Journal of Biomolecular Techniques. 15(4):285-295.

Interpretive Summary: Proteomics is the study of proteins, which perform a variety of functions in living organisms. The function of a protein is determined by its amino acid composition (i.e., its structure), so identifying a protein provides clues to its function. Proteins can be identified by fragmenting them and comparing the sizes of the resulting fragments with the fragment sizes expected from previously identified proteins. In livestock species, the amino acid compositions of most proteins are not fully known, so protein fragment sizes cannot be compared accurately. However, if the amino acid composition of a livestock protein is similar to that of a protein in another organism, a comparison will still provide clues to its identity and function. Our objective was to define a strategy for comparing the amino acid composition of fragments from livestock proteins with the protein information available for other species. Procedures were developed to compare information from unknown proteins with the amino acid compositions of proteins from other species, resulting in improved identification of livestock proteins. Because complete amino acid compositions of proteins are not available for most organisms, these methods should be widely applicable to the study of proteins from livestock and other organisms for which this information is lacking.

Technical Abstract: De novo interpretation of tandem mass spectrometry spectra provides another method of database searching for species with limited sequence information. Our objective was to define a strategy for this type of homology tolerant database search. Homology searches (i.e., MS-Homology) were conducted with the 20, 10, or 5 most abundant peptides from 9 proteins, with abundance based on either precursor trigger intensity or total ion current, and allowing for 50, 30, or 10% mismatch in the search. Protein scores were corrected by subtracting a threshold score that was calculated from random peptides. The highest (p < 0.01) corrected protein scores (i.e., scores above the threshold) were obtained by submitting 20 peptides and allowing 30% mismatch. Using these criteria, protein identification based on peptide mass (i.e., MASCOT) was compared with identification by homology searching. For 63.4% of 112 spots from 2D-PAGE gels, the highest-ranking protein was the same for MASCOT, a homology search using the 20 most intense peptides, and a homology search using all peptides. For these proteins, the percent coverage was greatest using MASCOT, compared with a homology search using all peptides or just the 20 most intense peptides (25.1, 18.3, and 10.6%, respectively). Finally, 35% of de novo sequences completely matched the corresponding known amino acid sequence of the matching peptide. This percentage increased to 44.0% when the search was limited to the 20 most intense peptides. After a protein has been identified using MS-Homology, a subsequent peptide mass search may increase the percent coverage of the identified protein.
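
The search strategy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software and does not call MS-Homology or MASCOT. All function names, the dictionary layout of a peptide, the position-by-position mismatch count, and the choice of the highest random-peptide score as the threshold are assumptions made for illustration only; the abstract does not state how the threshold was calculated or how mismatches were counted.

```python
def top_n_peptides(peptides, n=20, key="intensity"):
    """Return the n most abundant peptides.

    `peptides` is a list of dicts; `key` names the abundance measure
    (hypothetical field names, e.g. precursor intensity or total ion current).
    """
    return sorted(peptides, key=lambda p: p[key], reverse=True)[:n]


def within_mismatch(query, target, max_percent=30):
    """Check whether two equal-length sequences differ at no more than
    max_percent of positions (an assumed, simplified mismatch definition)."""
    if len(query) != len(target) or not query:
        return False
    mismatches = sum(1 for a, b in zip(query, target) if a != b)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return mismatches * 100 <= max_percent * len(query)


def corrected_score(protein_score, random_scores):
    """Subtract a threshold derived from random-peptide searches.

    ASSUMPTION: the threshold is taken as the highest random-peptide score.
    """
    threshold = max(random_scores)
    return protein_score - threshold


if __name__ == "__main__":
    # Made-up example values, not data from the study.
    peptides = [
        {"sequence": "AAK", "intensity": 5.0},
        {"sequence": "LLR", "intensity": 9.0},
        {"sequence": "GGK", "intensity": 1.0},
    ]
    print([p["sequence"] for p in top_n_peptides(peptides, n=2)])
    print(within_mismatch("PEPTIDEKAA", "PEPTIDEKLL", max_percent=30))
    print(corrected_score(50.0, [10.0, 20.0, 15.0]))
```

Under these assumptions, a corrected score above zero means the protein scored higher than any search with random peptides, which corresponds to the "above the threshold" criterion in the abstract.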

Last Modified: 4/17/2014