Effective Similarity Search on Heterogeneous Networks A Metapath Free Approach
Abstract : Heterogeneous information networks (HINs) are usually used to model information systems with multitype objects and relations. In contrast, graphs that have a single type of nodes and edges, are often called homogeneous graphs. Measuring similarities among objects is an important task in data mining applications, such as web search, link prediction, and clustering.Currently, several similarity measures are defined for HINs. Most of these measures are based on metapaths, which show sequences of node classes and edge types along the paths between two nodes.. However, metapaths, which are often designed by domain experts, are hard toenumerate and choose w.r.t. the quality of similarity scores. This makes using existing similarity measures in real applications difficult.To address this problem, we extend SimRank, a wellknown similarity measure on homogeneous graphs, to HINs, by introducing the concept of the decay graph. The newly proposed similarity measure is called HowSim, which has the property of being metapath free, and capturing the structural and semantic similarity simultaneously.The generality and effectiveness of HowSim, and the efficiency of our proposed algorithms for computing HowSim scores, are demonstrated by extensive experiments.
? Metapaths can provide guidance for search and mining of the network and help analyze and understand the semantic meaning of the objects and relations in the network.
? Under this framework, similarity search and other mining tasks such as relationship prediction and clustering can be addressed by systematic exploration of the network meta structure.
? In order to apply homogeneous information networkbased methods into heterogeneous information networks, we have to either project the heterogeneous networks into homogeneous ones , or simply ignore the type information associated with nodes and link
? After the loss function is obtained, we then solve the minimization problem using the above gradient descent process. In practice, we cannot enumerate all possible symmetric paths of a and b, since their number is infinite.
? To address this problem, some recent methods such as Meta path 2Vec [30] and HIN2Vec [44] were proposed to learn graphembeddings on heterogeneous graphs.
? The former one is usually used for allpairs SimRank, while the latter is for singlepair/source queries. We study allpairs HowSim inthis paper, and leave the HowSim query problem as the future work.
? We introduce an iterative method to compute How Sim scores with an accuracy guarantee, and then propose optimization techniques to boost computation .
? We propose an an effective algorithm for finding a decay graph automatically for How Sim. We perform extensive experiments to validate the effectiveness and generality of HowSim.
? We propose a general and metapath free similarity measure, i.e. HowSim, by proposing the concept of the decay graph to extend Sim Rank to HINs. We study properties of HowSim in depth and show it is symmetric, nonnegative, normalized and selfmaximum. This leads to HowSim being a semimetric measure
? we propose a naive iterative method and prove that it always converges to the correct HowSim scores .
? We also propose heuristic optimization strategies to improve the efficiency.
? Deeplearning based methods have shown promising performance in the similarity evaluation.
? Some works such as D2AGE and IPE aimed to produce embeddings particularly for the similarity search task by a pathtopath learning schema.
