A Unified Framework for User Identification across Online and Offline Data
ABSTRACT :
User identification across multiple datasets has a wide range of applications, and a growing body of research has addressed this topic in recent years. However, most existing works focus on user identification with a single input data type, e.g., (I) identifying a user across multiple social networks with online data and (II) detecting a single user from heterogeneous trajectory datasets with offline data. Unlike previous works, in this paper we propose a framework for user identification between online and offline datasets. We connect these two types of data by mapping IP addresses to physical locations. To solve this problem, we propose a novel framework consisting of three steps. First, we use a clustering method based on the locations of IP addresses to map each IP address to a specific physical location distribution. Second, we propose a novel pairwise index to reduce the space cost and running time of computing co-occurrence. Lastly, we apply a learning-to-rank method to merge the effects of the multiple features obtained in the first two steps. Based on our framework, we design experiments to demonstrate its efficiency (in time and space), together with the precision and recall of our approach compared with other methods.
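The first step can be illustrated with a minimal Python sketch. The abstract does not specify the clustering method, so this sketch swaps in a simple stand-in: every observed location of an IP address is snapped to a fixed grid cell, and the per-cell counts are normalized into a probability distribution. The input format (ip, lat, lon), the grid resolution, and the helper names `to_cell` and `ip_location_distributions` are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]

# Hypothetical grid resolution in degrees (0.01 is roughly 1.1 km of latitude);
# this stands in for the paper's unspecified clustering method.
CELL_SIZE = 0.01


def to_cell(lat: float, lon: float, cell_size: float = CELL_SIZE) -> Cell:
    """Snap a (lat, lon) point to the id of the grid cell containing it."""
    return (math.floor(lat / cell_size), math.floor(lon / cell_size))


def ip_location_distributions(
    observations: Iterable[Tuple[str, float, float]],
    cell_size: float = CELL_SIZE,
) -> Dict[str, Dict[Cell, float]]:
    """Map each IP address to a probability distribution over grid cells.

    observations: (ip, lat, lon) tuples, one per place the IP was observed
    (an assumed input format).
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for ip, lat, lon in observations:
        counts[ip][to_cell(lat, lon, cell_size)] += 1
    distributions: Dict[str, Dict[Cell, float]] = {}
    for ip, cell_counts in counts.items():
        total = sum(cell_counts.values())
        # Normalize counts so each IP's probabilities sum to 1.
        distributions[ip] = {cell: n / total for cell, n in cell_counts.items()}
    return distributions
```

For example, an IP observed twice in one cell and once in another maps to probabilities 2/3 and 1/3. A grid keeps the example deterministic; a density-based clustering of the observed points would be a natural replacement where location data is unevenly spread.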
EXISTING SYSTEM :
• Existing works address the identification of individuals across different social media sites.
• We divide these studies into three categories according to the information they rely on: username-based identification, profile-based identification, and network- and content-based identification.
DISADVANTAGE :
• We design several experiments to evaluate our framework and use a much larger dataset from Beijing to show that our framework can efficiently handle large-scale problems.
• Our main purpose is to identify corresponding users across two different datasets: the online data and the offline data.
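To produce the final matching between the two datasets, the abstract's third step merges features with a learning-to-rank method that it does not name. The sketch below assumes a linear scoring function trained with a margin-based pairwise perceptron (a simplified relative of RankSVM): for each online user, the feature vector of the true offline match should outscore every incorrect candidate by a margin. The feature layout (for example, a co-occurrence score and a location-similarity score), the hyperparameters, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature vector, e.g. [co-occurrence score, location similarity].
Features = Sequence[float]


def score(weights: Sequence[float], features: Features) -> float:
    """Linear score w . x used to rank candidate offline users."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise_ranker(
    queries: List[Tuple[Features, List[Features]]],
    n_features: int,
    epochs: int = 50,
    lr: float = 0.1,
    margin: float = 1.0,
) -> List[float]:
    """Learn weights so each true match outscores its wrong candidates.

    queries: one entry per online user, holding the features of the true
    offline match and a list of features of incorrect offline candidates.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for pos, negs in queries:
            for neg in negs:
                # Update only when the pair violates the margin (hinge loss > 0).
                if score(weights, pos) - score(weights, neg) < margin:
                    for i in range(n_features):
                        weights[i] += lr * (pos[i] - neg[i])
    return weights


def rank_candidates(weights: Sequence[float], candidates: Dict[str, Features]) -> List[str]:
    """Return offline user ids ordered from best to worst match."""
    return sorted(candidates, key=lambda c: score(weights, candidates[c]), reverse=True)
```

Each training pair encodes only the relative order of a true and a false match, which suits this task: what matters is that the correct offline user ranks first among the candidates, not the absolute value of its score.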
PROPOSED SYSTEM :
• In this paper, we aim to perform user identification across online and offline datasets, instead of working on only one type of data as previous works do.
• The intuition is that, although there are several kinds of online and offline data, they often share similar features.
• Online data is generated from online activities, so it usually contains IP addresses from mobile devices.
• Offline data often contains the physical locations of users.
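The shared feature that links the two data types can be sketched as an expected co-occurrence score: each online record's IP address is turned into a location distribution, and we sum the probability that the online user was in one of the cells the offline user visited during the same time slot. This is a hypothetical formulation for illustration; the one-hour slot length, the record formats, and the name `expected_cooccurrence` are assumptions, not details from the paper.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Set, Tuple

Cell = Hashable

# Hypothetical time-slot length in seconds; not specified in the paper.
SLOT_SECONDS = 3600


def expected_cooccurrence(
    online_records: List[Tuple[int, str]],
    offline_records: List[Tuple[int, Cell]],
    ip_dists: Dict[str, Dict[Cell, float]],
    slot_seconds: int = SLOT_SECONDS,
) -> float:
    """Expected number of online records that co-occur with the offline user.

    online_records: (unix_timestamp, ip) pairs for one online user.
    offline_records: (unix_timestamp, cell) pairs for one offline user.
    ip_dists: ip -> {cell: probability that the ip is located in that cell}.
    """
    # Cells the offline user visited in each time slot.
    visited: Dict[int, Set[Cell]] = defaultdict(set)
    for ts, cell in offline_records:
        visited[ts // slot_seconds].add(cell)

    total = 0.0
    for ts, ip in online_records:
        cells = visited.get(ts // slot_seconds, set())
        dist = ip_dists.get(ip, {})
        # Cells are disjoint, so the sum is P(online location is in `cells`).
        total += sum(dist.get(cell, 0.0) for cell in cells)
    return total
```

An IP address with no known distribution simply contributes zero, so unmapped addresses lower the score instead of raising an error.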
ADVANTAGE :
• We propose a novel pairwise index to improve the efficiency of our framework.
• To evaluate both the efficiency and the effectiveness of our framework, we have designed a series of experiments.
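The abstract does not describe the structure of the pairwise index, so the following is only a hypothetical stand-in showing how an index can cut the cost of computing co-occurrence: offline records are indexed by (time slot, cell), each online record probes only the keys its IP distribution covers, and scores are stored sparsely for user pairs that actually share a key, rather than in a full online-by-offline matrix. The function names, slot length, and record formats are assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

Cell = Hashable
Key = Tuple[int, Cell]  # (time slot, cell)


def build_offline_index(
    offline_records: Iterable[Tuple[str, int, Cell]], slot_seconds: int = 3600
) -> Dict[Key, Set[str]]:
    """Index offline users by the (time slot, cell) pairs they visited."""
    index: Dict[Key, Set[str]] = defaultdict(set)
    for user, ts, cell in offline_records:
        index[(ts // slot_seconds, cell)].add(user)
    return index


def pairwise_cooccurrence(
    online_records: Iterable[Tuple[str, int, str]],
    ip_dists: Dict[str, Dict[Cell, float]],
    index: Dict[Key, Set[str]],
    slot_seconds: int = 3600,
) -> Dict[Tuple[str, str], float]:
    """Sparse (online_user, offline_user) -> expected co-occurrence.

    Only pairs sharing at least one (slot, cell) key are ever touched, so
    space and time scale with actual overlaps rather than with
    |online users| x |offline users|.
    """
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for online_user, ts, ip in online_records:
        slot = ts // slot_seconds
        for cell, p in ip_dists.get(ip, {}).items():
            # .get avoids inserting empty entries into the defaultdict index.
            for offline_user in index.get((slot, cell), ()):
                scores[(online_user, offline_user)] += p
    return dict(scores)
```

Because the index is built once from the offline data, each online record costs only as many lookups as its IP distribution has cells.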