BMC Bioinformatics 2015, null:174

A density-based clustering approach for identifying overlapping protein complexes with functional preferences

Lun Hu, Keith Chan.
Background: Identifying protein complexes is an essential task for understanding the mechanisms of proteins in cells. Many computational approaches have thus been developed to identify protein complexes in protein-protein interaction (PPI) networks. Regarding the information that can be adopted by computational approaches to identify protein complexes, in addition to the graph topology of PPI network, the consideration of functional information of proteins has been becoming popular recently. Relevant approaches perform their tasks by relying on the idea that proteins in the same protein complex may be associated with similar functional information. However, we note from our previous researches that for most protein complexes their proteins are only similar in specific subsets of categories of functional information instead of the entire set. Hence, if the preference of each functional category can also be taken into account when identifying protein complexes, the accuracy will be improved. Results: To implement the idea, we first introduce a preference vector for each of proteins to quantitatively indicate the preference of each functional category when deciding the protein complex this protein belongs to. Integrating functional preferences of proteins and the graph topology of PPI network, we formulate the problem of identifying protein complexes into a constrained optimization problem, and we propose the approach DCAFP to address it. For performance evaluation, we have conducted extensive experiments with several PPI networks from the species of Saccharomyces cerevisiae and Human and also compared DCAFP with state-of-the-art approaches in the identification of protein complexes. The experimental results show that considering the integration of functional preferences and dense structures improved the performance of identifying protein complexes, as DCAFP outperformed the other approaches for most of PPI networks based on the assessments of independent measures of f-measure, Accuracy and Maximum Matching Rate. Furthermore, the function enrichment experiments indicated that DCAFP identified more protein complexes with functional significance when compared with approaches, such as PCIA, that also utilize the functional information. Conclusions: According to the promising performance of DCAFP, the integration of functional preferences and dense structures has made it possible to identify protein complexes more accurately and significantly.