ReMEMBeR: Ranking Metric Embedding-Based Multicontextual Behavior Profiling for Online Banking Fraud Detection

Abstract

Anomaly detection relies on profiling individuals’ behavior and works by detecting deviations from the norm. When used for online banking fraud detection, however, it suffers from three main disadvantages. First, an individual’s historical behavior data are often too limited to profile his/her behavior pattern. Second, because transaction data are heterogeneous, there is no uniform treatment of different kinds of attribute values, which becomes a potential barrier to model development and further usage. Third, the transaction data are highly skewed, which makes it a challenge to utilize the label information effectively. These three disadvantages result in both poor generalization and a high false positive rate, and we propose a ranking metric embedding-based multi-contextual behavior profiling (ReMEMBeR) model to address them. We recast the original fraud detection problem as a pseudo-recommender system problem, where an individual is treated as a pseudo-user, his/her behavior as a pseudo-item, and the label as the corresponding pseudo-rating. Following the idea of collaborative filtering, information from other similar individuals can be used to establish an individual’s behavior profile. To obtain a uniform treatment of heterogeneous attributes, we use an embedding-based method that learns both attribute embeddings and individuals’ behavior profiles simultaneously within a common latent space. To make better use of the label information, our model is designed to fit pseudo-users’ correct preference ranking for pseudo-items. By doing so, it explicitly learns to tell the fraudulent from the legitimate. Last but not least, we propose to identify and distinguish individuals under different contexts and further generalize the behavior profiling model to a multi-contextual one.
The proposed model can, thus, integrate the multi-contextual behavior patterns and allow transactions to be examined under the di...
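
The pseudo-recommender idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the mean-of-attribute-embeddings representation of a transaction, the squared Euclidean distance, the logistic pairwise loss, and every name and toy value below are assumptions chosen for illustration. The sketch places pseudo-users and attribute values in one latent space and trains so that each pseudo-user ends up closer to a legitimate transaction than to a fraudulent one.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def embed(attrs, attr_emb):
    """Pseudo-item vector: mean of its attribute embeddings (an assumption)."""
    dim = len(attr_emb[attrs[0]])
    return [sum(attr_emb[a][k] for a in attrs) / len(attrs) for k in range(dim)]


def distance(user, attrs, user_emb, attr_emb):
    """Distance between a user's profile and a transaction; larger = more anomalous."""
    return sq_dist(user_emb[user], embed(attrs, attr_emb))


def train(triples, dim=4, lr=0.05, epochs=200, seed=0):
    """Pairwise ranking on triples (user, legit_attrs, fraud_attrs).

    Minimizes -log(sigmoid(x)) with x = d(user, fraud) - d(user, legit).
    """
    rng = random.Random(seed)
    user_emb, attr_emb = {}, {}
    for user, legit, fraud in triples:
        if user not in user_emb:
            user_emb[user] = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        for a in legit + fraud:
            if a not in attr_emb:
                attr_emb[a] = [rng.gauss(0.0, 0.1) for _ in range(dim)]

    for _ in range(epochs):
        for user, legit, fraud in triples:
            p = user_emb[user]
            v_pos, v_neg = embed(legit, attr_emb), embed(fraud, attr_emb)
            x = sq_dist(p, v_neg) - sq_dist(p, v_pos)
            w = lr * sigmoid(-x)  # learning rate times -dLoss/dx
            # Partial derivatives of x, all computed before any update.
            dx_dp = [2.0 * (a - b) for a, b in zip(v_pos, v_neg)]
            dx_dpos = [2.0 * (a - b) / len(legit) for a, b in zip(p, v_pos)]
            dx_dneg = [-2.0 * (a - b) / len(fraud) for a, b in zip(p, v_neg)]
            for k in range(dim):
                p[k] += w * dx_dp[k]
            for a in legit:
                for k in range(dim):
                    attr_emb[a][k] += w * dx_dpos[k]
            for a in fraud:
                for k in range(dim):
                    attr_emb[a][k] += w * dx_dneg[k]
    return user_emb, attr_emb


# Hypothetical toy data: attribute values are shared across users, so what is
# learned from one pseudo-user also shapes the profile of the other.
TRIPLES = [
    ("alice", ("merchant:grocery", "amount:low", "time:day"),
              ("merchant:electronics", "amount:high", "time:night")),
    ("bob", ("merchant:grocery", "amount:low", "time:evening"),
            ("merchant:jewelry", "amount:high", "time:night")),
]
user_emb, attr_emb = train(TRIPLES)
```

Calling `distance("alice", some_attrs, user_emb, attr_emb)` then gives a score that can be compared across transactions of the same user. A multi-contextual variant would keep several such profiles per individual, one per context, which this sketch does not attempt.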

Existing System

• Our approach aims to fill the gap between existing methods and provide researchers with a tool that generates reliable data to experiment with different fraud detection techniques and compare them with other approaches.
• BankSim is an agent-based simulator of bank payments based on a sample of aggregated transactional data provided by a bank in Spain.
• The data sets generated by BankSim contain no personal information and disclose no legal or private customer transactions. Therefore, they can be shared by academia and others to develop and reason about fraud detection methods.
• The main goal of developing this simulation is to enable us to share realistic fraud data without exposing potentially business-sensitive or personally sensitive information about the actual source.

Disadvantages

• It is possible to combine multivariate pattern analysis (MVPA) and hidden Markov models (HMM) to discover the major phases that students go through in solving complex problems.
• Thus, when we discuss successive student attempts, we mean successive attempts on the same skill, and ignore intervening problems on other skills.
• We should reflect on why so much effort is being devoted to the problem of predicting student next response.
• However, there remain several interesting, known problems in student modeling that can inform us about student learning, and have a clear correspondence to improving tutorial decision making.

Proposed System

• The main purpose of BankSim is the generation of synthetic data that can be used for fraud detection research. Statistical analysis and a Social Network Analysis (SNA) of relations between merchants and customers were used to develop and calibrate the model.
• They have in common that all are built with the aim of modelling financial activity in order to generate synthetic data sets for fraud detection research.
• Our main purpose is to generate a synthetic data set of commercial transactions that can be used for the development and testing of different fraud detection techniques.
• Unfortunately, for this addition there is a lack of real data that we can use for this purpose, but hopefully in the future we will find financial institutions interested in our project that are willing to share this data.
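
As a rough illustration of what generating a synthetic transaction data set can look like, here is a toy sketch. It is not BankSim and does not reproduce its calibrated agent model: the category list, the log-normal amount parameters, the fraud probability, and the field names are all hypothetical placeholders.

```python
import random

# Hypothetical merchant categories, not taken from BankSim.
CATEGORIES = ["food", "transport", "health", "electronics"]


def simulate(n_customers=5, n_steps=10, fraud_prob=0.02, seed=0):
    """Generate one synthetic payment per customer per time step.

    Each customer acts as a very simple agent: at every step it picks a
    merchant category and an amount. With probability fraud_prob the payment
    is marked fraudulent and drawn from a higher-valued amount distribution
    (an assumption made only to separate the two classes in this toy example).
    """
    rng = random.Random(seed)
    rows = []
    for step in range(n_steps):
        for cid in range(n_customers):
            is_fraud = rng.random() < fraud_prob
            mean_log = 5.0 if is_fraud else 3.0  # placeholder parameters
            rows.append({
                "step": step,
                "customer": "C%03d" % cid,
                "category": rng.choice(CATEGORIES),
                "amount": round(rng.lognormvariate(mean_log, 0.5), 2),
                "fraud": int(is_fraud),
            })
    return rows


transactions = simulate()
```

Because every value comes from a seeded random generator rather than from customer records, the resulting rows contain no personal information, which is the property that makes such data shareable.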

Advantages

• This research area certainly appeared to be ripe grounds for rapid improvement, with reported R² values for Performance Factors Analysis and Bayesian knowledge tracing of 0.07 and 0.17, respectively.
• In general, improvements in model accuracy have been minimal, particularly given the relatively low baseline performances.
• While this approach would certainly be very powerful, it does not give us much guidance about limiting factors on performance, as the only conclusion one could draw would be that a student modeling technique that could see the future with perfect accuracy would do a very good job.
• However, we first give our baseline assumptions, then describe our data, and finally provide baseline model performance when trained on those data.
