Toward Mining Capricious Data Streams A Generative Approach
ABSTARCT :
Learning with streaming data has received extensiveattention during the past few years. Existing approaches assumethat the feature space is fixed or changes by following explicitregularities, limiting their applicability in real-time applications.For example, in a smart healthcare platform, the feature space ofthe patient data varies when different medical service providersuse nonidentical feature sets to describe the patients’ symptoms.To fill the gap, we in this article propose a novel learning para-digm, namely, Generative Learning With Streaming Capricious(GLSC) data, which does not make any assumption on the featurespace dynamics. In other words, GLSC handles the data streamswith a varying feature space, where each arriving data instancecan arbitrarily carry new features and/or stop carrying partialold features. Specifically, GLSC trains a learner on a universalfeature space that establishes relationships between old and newfeatures, so that the patterns learned in the old feature space canbe used in the new feature space. The universal feature space isconstructed by leveraging the relatednesses among features. Wepropose a generative graphical model to model the constructionprocess, and show that learning from the universal featurespace can effectively improve the performance with theoreticalguarantees. The experimental results demonstrate that GLSCachieves conspicuous performance on both synthetic and realdata sets.
EXISTING SYSTEM :
? The dissemination of data stream phenomenon has necessitated the development of stream mining algorithms. The area has attracted the attention of data mining community.
? Online mining when data streams evolve over time, that is when concepts drift or change completely, is becoming one of the core issues of advanced analysis of data streams.
? When tackling non-stationary concepts, ensem-bles of classifiers have several advantages over single classifier methods: they are easy to scale and parallelize, they can adapt to change quickly bypruning under-performing parts of the ensemble, and they therefore usu-ally also generate more accurate concept descriptions.
DISADVANTAGE :
• Those approaches tackling the stream-ing data problem under different settings, however, rely onthe assumptions that the feature space is fixed or changes byfollowing explicit regularities
• we divide into two convex optimization subproblems,which are with respect towtandG, respectively, and thetwo subproblems are simultaneously solved at each iteration.
PROPOSED SYSTEM :
? They proposed a general framework to generate data simulating changing environments. Their framework accommodates the STAGGER and Moving Hyper plane generation strategies. They consider a set of k data sources with known distributions.
? The proposed techniques have their roots in statistics and theoretical computer science. Data-based and task-based techniques are the two categories of data stream mining algorithms. Based on these two categories, a number of clustering, classification, frequency counting and time series analysis have been developed.
? Systems have been implemented to use these techniques in real applications.
ADVANTAGE :
? This is the first work to learn with capricious data streams where data come with an arbitrarily varying feature space. We want to emphasize that our learning task does not make any assumption on the feature space dynamics, which is different from existing studies.
? We introduce a generative graphical model, which takes the observable feature space as the input and outputs a universal feature space. We analyze the performance bound of GLSC and prove that the obtained universal feature space can effectively improve the learning per-formance.
? Extensive experiments on both synthetic and real-world data sets demonstrate the superiority of GLSC.
|