Submitted to: Evolutionary Bioinformatics
Publication Type: Peer reviewed journal
Publication Acceptance Date: 12/20/2011
Publication Date: 1/5/2012
Citation: Hu, J., Yan, X. 2012. BS-KNN: an effective algorithm for predicting protein subchloroplast localization. Evolutionary Bioinformatics. 8:79-87.
Interpretive Summary: Although the role of chloroplasts as the photosynthetic apparatus in cells of green plants and eukaryotic algae has been well defined, little is known about the location of the proteins residing in the intricate network of membranes within the chloroplast. Chloroplasts are enveloped by four membrane layers, which makes them an ideal system for studying protein sub-cellular localization from computational, experimental, and theoretical points of view. Identifying the sub-cellular location of these proteins could provide an in-depth understanding of protein-protein interactions and support protein function prediction. In this study, we present a computer-based method to predict the location of proteins and to assign them to defined regions within the chloroplast. This case study could also provide an excellent way of developing and applying new algorithms and software for functional bacterial protein prediction.
Technical Abstract: Chloroplasts are organelles found in cells of green plants and eukaryotic algae that conduct photosynthesis. Knowing a protein’s subchloroplast location provides in-depth insights into the protein’s function and the microenvironment where it interacts with other molecules. Despite the chloroplast proteome projects and several computational methods for identifying chloroplast proteins, there are only a very limited number of methods for predicting proteins’ subchloroplast locations. In this paper, we present a bit-score weighted K-nearest neighbor method for predicting protein subchloroplast locations. The method makes predictions based on the bit-score weighted Euclidean distance calculated from the composition of selected pseudo-amino acids. Our method achieves 76.4% overall accuracy in assigning proteins to 4 subchloroplast locations in cross-validation. When tested on an independent set that was not seen by the method during training and feature selection, the method achieves a consistent overall accuracy of 76.0%. Comparisons showed that it outperformed previously published methods. The method was also applied to predict subchloroplast locations of proteins in the chloroplast proteome. The software and datasets of the proposed method are available at https://edisk.fandm.edu/jing.hu/bsknn/bsknn.html.
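Illustrative sketch: The abstract does not give the exact weighting formula, so the Python sketch below only illustrates the general idea of a K-nearest neighbor vote over bit-score weighted Euclidean distances. The function names, the rule `distance / (1 + bit_score)`, the feature vectors, and the location labels are all assumptions made for illustration and are not taken from the paper or from the published BS-KNN software. Bit-scores are assumed to be precomputed by a sequence alignment tool.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def weighted_knn_predict(query, training, bit_scores, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbors.

    query      -- feature vector of the protein to classify
    training   -- list of (feature_vector, label) pairs
    bit_scores -- alignment bit-score of the query against each training
                  protein, in the same order as `training` (assumed precomputed)
    """
    scored = []
    for (features, label), bits in zip(training, bit_scores):
        # Assumed weighting rule (hypothetical): a higher bit-score shrinks
        # the distance, so similar sequences count as closer neighbors.
        weighted = euclidean(query, features) / (1.0 + bits)
        scored.append((weighted, label))

    # Sort by weighted distance only, then vote among the k closest.
    scored.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data with made-up two-dimensional features and example labels.
training = [
    ([0.10, 0.20], "stroma"),
    ([0.15, 0.25], "stroma"),
    ([0.80, 0.90], "thylakoid membrane"),
    ([0.85, 0.95], "thylakoid membrane"),
]

no_alignment = weighted_knn_predict([0.5, 0.5], training, [0, 0, 0, 0])
with_alignment = weighted_knn_predict([0.5, 0.5], training, [0, 0, 100, 100])
```

In this toy example the same query is assigned to a different location once strong bit-scores against two training proteins are supplied, which is the effect the weighting is meant to capture. The actual method additionally relies on selected pseudo-amino acid composition features, which this sketch does not compute.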