Location: Livestock Bio-Systems
Title: MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks
DENG, BO - University Of Nebraska
MORIYAMA, ETSUKO - University Of Nebraska
Submitted to: Bioinformatics
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 11/16/2017
Publication Date: 4/15/2018
Publication URL: http://handle.nal.usda.gov/10113/6249257
Citation: Keel, B.N., Deng, B., Moriyama, E.N. 2018. MOCASSIN-prot: A multi-objective clustering approach for protein similarity networks. Bioinformatics. 34(8):1270-1277. https://doi.org/10.1093/bioinformatics/btx755.
Interpretive Summary: A protein domain is a highly conserved part of a protein sequence. It is a structural unit and is often associated with a discrete function. Proteins often contain multiple domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to the complexity of protein structures. Protein families and subfamilies vary widely, for example, in the number, combination, and order of their domains and, consequently, in their functions. Thus, protein functions cannot be fully understood without integrating information about their constituent domains. Existing protein classification methods construct domain networks and protein networks separately, with practically no connection between them. In this work, we present a method that builds protein networks in terms of domain architectures in order to improve protein function prediction. Protein sequence evolution is governed primarily by selective constraints that maintain function and by the modularity of domains, which allows functional innovation. Based on this assumption, we present MOCASSIN-prot, a multi-objective optimization method that not only classifies proteins through network clustering but also provides a clearer interpretation of the relationships between proteins and domains.
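To make the idea of relating proteins through their domain architectures concrete, here is a minimal Python sketch. It is not the MOCASSIN-prot procedure. It assumes that each protein is represented only by the set of domains it contains (ignoring domain order and copy number), and it links two proteins when the Jaccard similarity of their domain sets reaches a threshold. The function names `domain_jaccard` and `domain_network`, the 0.5 threshold, and the domain labels D1 to D4 are hypothetical and chosen for illustration.

```python
from itertools import combinations


def domain_jaccard(a, b):
    """Jaccard similarity of two domain sets (domain order and copy number are ignored)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def domain_network(architectures, threshold=0.5):
    """Return edges (p, q, weight) for protein pairs whose domain Jaccard >= threshold.

    `architectures` maps a protein name to a list of (hypothetical) domain labels.
    """
    edges = []
    for p, q in combinations(sorted(architectures), 2):
        w = domain_jaccard(architectures[p], architectures[q])
        if w >= threshold:
            edges.append((p, q, w))
    return edges


# Hypothetical example: protB has one extra domain relative to protA; protC shares none.
architectures = {
    "protA": ["D1", "D2"],
    "protB": ["D1", "D2", "D3"],
    "protC": ["D4"],
}
edges = domain_network(architectures)
print(edges)
```

Only protA and protB are connected here, with weight 2/3. A network built this way captures domain content alone. The abstract describes combining this kind of information with quantitative sequence similarity, which this sketch deliberately leaves out.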
Technical Abstract: Motivation: Proteins often include multiple conserved domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to the complexity of protein structures and, consequently, of their functions. The evolutionary history of proteins is therefore best modeled through networks that incorporate information from both sequence divergence and domain content. Here, a game-theoretic approach proposed for protein network construction is adapted to the framework of multi-objective optimization and extended to incorporate a clustering refinement procedure. Results: The new method, MOCASSIN-prot, was applied to cluster multi-domain proteins from ten genomes. Its performance was compared against two protein clustering methods, Markov clustering (TRIBE-MCL) and spectral clustering (SCPS). We showed that, compared with these two methods, MOCASSIN-prot, which uses both domain composition and quantitative sequence similarity information, generates fewer false positives. It produces more functionally coherent protein clusters and better differentiates protein families. Availability and implementation: MOCASSIN-prot, implemented in Perl and Matlab, is freely available at http://bioinfolab.unl.edu/emlab/MOCASSINprot.
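The abstract frames clustering as a multi-objective optimization problem. The core notion in that framework is Pareto dominance: when objectives pull in different directions (for example, agreement with sequence similarity versus agreement with domain content), no single solution is best on every objective, so one keeps the non-dominated set. The Python sketch below illustrates only that general concept. The abstract does not specify MOCASSIN-prot's actual objectives or its optimization procedure, and the candidate partitions and their two scores are hypothetical numbers, with both objectives assumed to be maximized.

```python
def dominates(a, b):
    """True if score vector `a` Pareto-dominates `b` (all objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, score in candidates.items()
        if not any(
            dominates(other_score, score)
            for other_name, other_score in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical scores: (sequence-similarity objective, domain-content objective).
candidates = {
    "partition_1": (0.80, 0.40),
    "partition_2": (0.60, 0.70),
    "partition_3": (0.55, 0.65),  # worse than partition_2 on both objectives
    "partition_4": (0.90, 0.30),
}
front = pareto_front(candidates)
print(front)
```

Here partition_3 is discarded because partition_2 beats it on both objectives, while the remaining three partitions represent different trade-offs. Choosing among such trade-offs, or refining the resulting clusters, is where a specific method's design decisions come in.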