DPCMNE: Detecting protein complexes from protein-protein interaction networks via multi-level network embedding

Abstract

Biological functions of a cell are typically carried out by protein complexes. Detecting protein complexes is therefore important for understanding cellular organization and protein function. Over the past decades, many computational methods have been proposed to detect protein complexes. However, most existing methods rely only on local topological information, mining dense subgraphs as protein complexes while ignoring global topological information. To address this issue, we propose DPCMNE, a method that detects protein complexes via multi-level network embedding and preserves both the local and global topological information of biological networks. First, DPCMNE employs a hierarchical compressing strategy to recursively compress the input protein-protein interaction (PPI) network into a series of progressively smaller PPI networks. Then, a network embedding method is applied to these smaller PPI networks to learn protein embeddings at different levels of granularity. The embeddings learned from all the compressed PPI networks are concatenated to form the final protein embeddings of the original input PPI network. Finally, a core-attachment based strategy is adopted to detect protein complexes in the weighted PPI network constructed from the pairwise similarity of protein embeddings. To assess the effectiveness of the proposed method, DPCMNE is compared with eight other clustering algorithms on two yeast datasets. The experimental results show that DPCMNE outperforms these state-of-the-art complex detection methods in terms of F1 and F1+Acc. Furthermore, functional enrichment analysis indicates that the protein complexes detected by DPCMNE are more biologically significant in terms of P-score.
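The concatenation and edge-weighting steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, and the input structures (`level_embeddings`, `membership`) and all function names are hypothetical. The compression and embedding steps themselves are taken as already done.

```python
import math


def concat_levels(level_embeddings, membership):
    """Concatenate each protein's embedding across all compression levels.

    level_embeddings[k] maps a node id at level k to its vector.
    membership[k] maps each original protein to its node id at level k.
    Both structures are hypothetical inputs for this sketch.
    """
    final = {}
    for protein in membership[0]:
        vector = []
        for embeddings, member_of in zip(level_embeddings, membership):
            vector.extend(embeddings[member_of[protein]])
        final[protein] = vector
    return final


def cosine(u, v):
    """Cosine similarity (an assumed choice of similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def weight_edges(edges, embeddings):
    """Weight every PPI edge by the similarity of its endpoint embeddings."""
    return {(a, b): cosine(embeddings[a], embeddings[b]) for a, b in edges}


# Toy data: level 0 is the original network, level 1 merges A and B.
membership = [
    {"A": "A", "B": "B", "C": "C"},
    {"A": "s0", "B": "s0", "C": "s1"},
]
level_embeddings = [
    {"A": [1.0, 0.0], "B": [1.0, 0.0], "C": [0.0, 1.0]},
    {"s0": [1.0, 0.0], "s1": [0.0, 1.0]},
]
embeddings = concat_levels(level_embeddings, membership)
weights = weight_edges([("A", "B"), ("B", "C")], embeddings)
```

In this toy case, proteins A and B share both their own embedding and their compressed-level embedding, so the edge between them receives a high weight, while the edge from B to C receives a low one. A core-attachment step would then operate on these weights.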

Existing System

Techniques exist that remove false positives from existing data without predicting novel interactions. Some of these approaches are based on logistic regression and require several PPI data sets originating from different experiments; by using the overlaps between the data sets, they can identify the highest-quality parts of PPI networks. Although these techniques can be used to propose high-quality PPIs, the completeness of the data remains an issue that can be resolved only by combining multiple experimental data sets or by additional wet-lab experiments. Since no gold-standard PPI network exists for any organism, it is hard to judge which of the interactions that these methods report as low-confidence are true interactions and which are false positives.

Disadvantages

• Nodes in networks represent biomolecules such as genes or proteins, and edges between the nodes indicate interactions between the corresponding biomolecules.
• These interactions could be of many different types, including functional, genetic, and physical interactions.
• Understanding these complex networks is a fundamental issue in systems biology. Of particular importance are protein-protein interaction (PPI) networks.
• In PPI networks, nodes correspond to proteins and two nodes are linked by an edge if the corresponding proteins can interact.

Proposed System

• The technique presented in this paper is one of the first to use a network model of PPI networks for purposes other than generating synthetic data.
• We demonstrate that a geometric graph model can be used to assess the confidence levels of known interactions in PPI networks and to predict novel ones.
• We apply our technique to de-noise PPI data sets by detecting false-positive and false-negative interactions.

Advantages

• To evaluate the performance of the proposed DPCMNE algorithm, the detected protein complexes are compared with known protein complexes.
• On the DIP dataset, DPCMNE achieves the highest F1. MCL achieves the highest Acc, with DPCMNE second, but MCL performs very poorly in terms of F1 and F1+Acc.
• On the BioGRID dataset, DPCMNE also achieves the best performance in terms of F1 and F1+Acc.
• Overall, DPCMNE has better comprehensive performance than the eight other state-of-the-art methods for detecting protein complexes.
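The text above does not say how detected complexes are matched against known ones when computing F1. The Python sketch below shows one convention often used in the complex-detection literature and assumed here: a detected and a known complex match when their neighborhood affinity, |A∩B|² / (|A|·|B|), reaches a threshold. The function names and the threshold of 0.2 are illustrative assumptions, not details taken from this work.

```python
def neighborhood_affinity(a, b):
    """Overlap score |A & B|^2 / (|A| * |B|) between two protein sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) ** 2 / (len(a) * len(b))


def precision_recall_f1(predicted, known, threshold=0.2):
    """Match-based precision, recall, and F1 (threshold is an assumption)."""
    def matches(x, y):
        return neighborhood_affinity(x, y) >= threshold

    # Detected complexes that match at least one known complex.
    hit_pred = sum(1 for p in predicted if any(matches(p, k) for k in known))
    # Known complexes that are matched by at least one detected complex.
    hit_known = sum(1 for k in known if any(matches(p, k) for p in predicted))

    precision = hit_pred / len(predicted) if predicted else 0.0
    recall = hit_known / len(known) if known else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: one detected complex overlaps a known one, the other does not.
predicted = [{"A", "B", "C"}, {"X", "Y"}]
known = [{"A", "B", "C", "D"}, {"E", "F"}]
precision, recall, f1 = precision_recall_f1(predicted, known)
```

Here the first detected complex shares three of four proteins with a known complex (affinity 0.75), so one of two detected complexes and one of two known complexes are matched, which gives precision, recall, and F1 of 0.5 each.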
