Submitted to: PLoS One
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 1/16/2013
Publication Date: 2/19/2013
Citation: Andorf, C.M., Honavar, V., Sen, T.Z. 2013. Predicting the binding patterns of hub proteins: a study using yeast protein interaction networks. PLoS One. 8(2):e56833.
Interpretive Summary: Proteins are the principal catalytic agents, structural elements, signal transmitters, transporters and molecular machines in cells. Functional annotation of proteins remains one of the most challenging problems in functional genomics. Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Hub proteins often play essential roles in cellular control and tend to be highly conserved across species. Recent studies suggest that hubs are more diverse than previously thought and show striking differences in the number of binding sites and in the kinetics of binding. We have demonstrated that yeast hub proteins can be classified fairly reliably, both by number of binding sites and by binding kinetics, using sequence information alone. These two classifications provide insights into the structural and kinetic characteristics, respectively, of the corresponding proteins in the absence of interaction networks, expression data, three-dimensional structure, domains, or motifs. Our work provides information about protein-protein interaction networks that can be harnessed to understand aspects of cellular control, which in turn provides tools to create better plants by enhancing our mechanistic insight into genotype-phenotype relationships.
Technical Abstract: Protein-protein interactions are critical to elucidating the role played by individual proteins in important biological pathways. Protein-protein interaction networks are typically constructed using high-throughput techniques (e.g., Yeast-2-Hybrid experiments). Of particular interest are hub proteins, which can interact with large numbers of partners. Hub proteins often play essential roles in cellular control and tend to be highly conserved across species. Recent studies suggest that hubs are more diverse than previously thought and show striking differences in the number of binding sites and in the kinetics of binding. Depending on the number of binding sites, protein hubs can be classified at a structural level as singlish interface hubs (SIH) with one or two binding sites, or multiple interface hubs (MIH) with three or more binding sites. In terms of kinetics, hub proteins can be classified as date hubs (i.e., those that interact with different partners at different times or locations) or party hubs (i.e., those that simultaneously interact with multiple partners). Knowledge of a protein hub’s binding pattern and kinetics can therefore shed light on its functional role in the cell and provide insights into protein evolution, tertiary and quaternary constraints on interactions, organization of hot spots, and protein disorder. Against this background, we explore the use of machine learning techniques for classifying a hub protein as SIH versus MIH, or as a date hub versus a party hub, from sequence information alone. We applied several sequence-based machine learning approaches to construct classifiers for labeling a protein as SIH versus MIH, and as a date hub versus a party hub. We compared the performance of a simple Naive Bayes classifier with that of a kth-order Markov model, a hybrid classifier that combines the kth-order Markov model with a sequence homology-based method, and a simple method that makes use of protein domains.
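To make the idea of a kth-order Markov model classifier concrete, the following is a minimal sketch and not the authors' implementation. It assumes one Markov model per class, add-one (Laplace) smoothing over the 20 standard amino acids, and a decision rule that picks the class with the highest log prior plus log likelihood. The class name `MarkovSequenceClassifier`, the pseudocount, and the toy sequences are all hypothetical; the first k residues of each sequence are skipped for simplicity.

```python
import math
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues (assumed alphabet)


class MarkovSequenceClassifier:
    """Hypothetical kth-order Markov classifier: one model per class label."""

    def __init__(self, k=2, pseudocount=1.0):
        self.k = k
        self.pseudocount = pseudocount
        self.counts = {}      # label -> context -> residue -> count
        self.log_priors = {}  # label -> log P(label)

    def fit(self, sequences, labels):
        label_totals = defaultdict(int)
        for seq, label in zip(sequences, labels):
            label_totals[label] += 1
            ctx_counts = self.counts.setdefault(
                label, defaultdict(lambda: defaultdict(int))
            )
            # Count each residue together with the k residues preceding it.
            for i in range(self.k, len(seq)):
                ctx_counts[seq[i - self.k:i]][seq[i]] += 1
        n = sum(label_totals.values())
        self.log_priors = {lab: math.log(c / n) for lab, c in label_totals.items()}
        return self

    def log_score(self, seq, label):
        ctx_counts = self.counts[label]
        score = self.log_priors[label]
        for i in range(self.k, len(seq)):
            residue_counts = ctx_counts.get(seq[i - self.k:i], {})
            total = sum(residue_counts.values())
            # Smoothed estimate of P(residue | previous k residues, label).
            num = residue_counts.get(seq[i], 0) + self.pseudocount
            den = total + self.pseudocount * len(AMINO_ACIDS)
            score += math.log(num / den)
        return score

    def predict(self, seq):
        return max(self.counts, key=lambda lab: self.log_score(seq, lab))


# Toy, made-up sequences purely for illustration (not real hub proteins).
train_seqs = ["ACACACACAC", "ACACACAC", "GGGGGGGG", "GGGGGGGGGG"]
train_labels = ["SIH", "SIH", "MIH", "MIH"]
model = MarkovSequenceClassifier(k=2).fit(train_seqs, train_labels)
print(model.predict("ACACAC"), model.predict("GGGGGG"))
```

The same sketch would apply unchanged to the date versus party task by swapping the labels; only the training set differs.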
We report the performance of the resulting classifiers (assessed using cross-validation experiments) on two sets of protein hubs derived from yeast structure interaction networks. The results of our study show that it is possible to distinguish date hubs from party hubs with an accuracy of 69.1% and a correlation coefficient of 0.37, and SIH from MIH with an accuracy of 89.2% and a correlation coefficient of 0.68. We conclude that it is possible to annotate proteins fairly reliably as SIH versus MIH and as date versus party hubs using information derived entirely from the sequence of the corresponding protein. The method can be used to obtain useful insights into the functional and evolutionary characteristics of proteins and their interactions even in settings where reliable protein-protein interaction data, or structures of protein-protein complexes, are unavailable.
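The two reported measures can be computed from a binary confusion matrix, as in the short sketch below. The abstract does not say which correlation coefficient was used; this example assumes the Matthews correlation coefficient, which is common for binary classifiers. The function names and the toy labels are hypothetical and are not data from the study.

```python
import math


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def matthews_cc(tp, fp, tn, fn):
    # Assumed definition: Matthews correlation coefficient for two classes.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Made-up labels for illustration only.
y_true = ["date", "date", "date", "party", "party", "party"]
y_pred = ["date", "date", "party", "party", "party", "date"]
counts = confusion_counts(y_true, y_pred, positive="date")
print(accuracy(*counts), matthews_cc(*counts))
```

Unlike accuracy, a correlation coefficient of this kind stays near zero for a classifier that always predicts the majority class, which is why the two figures are usually read together.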