Location: Livestock Bio-Systems
Title: MOCASSIN-prot software
DENG, BO - University Of Nebraska
MORIYAMA, ETSUKO - University Of Nebraska
Submitted to: Software and User Manual Public Release
Publication Type: Other
Publication Acceptance Date: 11/16/2017
Publication Date: 4/15/2018
Citation: Keel, B.N., Deng, B., Moriyama, E.N. 2018. MOCASSIN-prot software [computer program]. Version 1. Lincoln, Nebraska: University of Nebraska-Lincoln. Available: http://bioinfolab.unl.edu/emlab/MOCASSINprot.
Interpretive Summary: A protein domain is a highly conserved part of a protein sequence. It is a structural unit and is often associated with a discrete function. Proteins often include multiple domains. Various evolutionary events, including duplication and loss of domains, domain shuffling, and sequence divergence, contribute to the complexity of protein structures. For example, the numbers, combinations, and orders of domains vary widely among protein families and subfamilies, and so do their functions. Thus, protein functions cannot be fully understood without integrating information about their constituent domains. Existing methods for classifying proteins construct domain networks and protein networks separately, with practically no connection between them. The MOCASSIN-prot software implements a method that builds protein networks in terms of domain architectures in order to improve protein function prediction.
Technical Abstract: MOCASSIN-prot is software, implemented in Perl and Matlab, for constructing protein similarity networks to classify proteins. Both domain composition and quantitative sequence similarity information are used in constructing the directed protein similarity networks. For each reference protein in an input set, the MOCASSIN-prot pipeline identifies the domain architecture of the protein by a HMMER3 profile hidden Markov model search and constructs a similarity matrix, in which the amino acid sequences of all domain regions on the protein are compared with all other protein sequences, using a log-transformed BLAST E-value as the similarity score. This matrix serves as the input to a multi-objective optimization problem, which is solved using linear programming (LP). The solutions from all protein LPs are then used to construct a graph adjacency matrix for the protein network. The computation time for MOCASSIN-prot can be decreased by using the Matlab parallel computing option.
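The similarity-matrix step described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the MOCASSIN-prot code, which is implemented in Perl and Matlab. The abstract does not specify the exact transform, so the form -log10(E), the cap of 300 for E-values of zero, the floor at 0, and the names `evalue_to_score` and `similarity_matrix` are all assumptions made for illustration. The LP step is not sketched because its formulation is not given here.

```python
import math


def evalue_to_score(evalue, cap=300.0):
    """Assumed transform: -log10(E), clamped to the range [0, cap].

    BLAST can report an E-value of 0 for very strong hits; the log is
    undefined there, so such hits are assigned the cap (an assumption).
    """
    if evalue <= 0.0:
        return cap
    return max(0.0, min(cap, -math.log10(evalue)))


def similarity_matrix(hits, domains, proteins):
    """Build a domains x proteins matrix of scores for one reference protein.

    hits maps (domain_id, protein_id) to a BLAST E-value. Pairs with no
    reported hit keep a score of 0.0 (hypothetical convention).
    """
    row = {d: i for i, d in enumerate(domains)}
    col = {p: j for j, p in enumerate(proteins)}
    matrix = [[0.0] * len(proteins) for _ in domains]
    for (domain, protein), evalue in hits.items():
        matrix[row[domain]][col[protein]] = evalue_to_score(evalue)
    return matrix


if __name__ == "__main__":
    # Hypothetical E-values for two domains of one reference protein.
    example_hits = {
        ("dom1", "protA"): 1e-50,
        ("dom1", "protB"): 1e-5,
        ("dom2", "protA"): 0.0,
    }
    for scores in similarity_matrix(example_hits, ["dom1", "dom2"], ["protA", "protB"]):
        print(scores)
```

In this sketch each row corresponds to one domain region of the reference protein and each column to another protein in the input set, so a larger entry indicates a stronger match between that domain and that protein.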