A Drift Region-Based Data Sample Filtering Method

Abstract

Concept drift refers to changes in the underlying distribution of a data stream over time. A well-trained model becomes outdated when concept drift occurs. Once concept drift is detected, it is necessary to understand where the drift occurs in order to support the drift adaptation strategy and update the outdated model effectively. This process, called drift understanding, has rarely been studied in this area. To fill this gap, this article develops a drift region-based data sample filtering method to update the obsolete model and track the new data pattern accurately. The proposed method identifies the drift region and uses information about that region to filter the data samples used for training models. A theoretical proof guarantees that the identified drift region converges uniformly to the real drift region as the sample size increases. Experimental evaluations on four synthetic datasets and two real-world datasets demonstrate that the method improves learning accuracy on data streams involving concept drift.

Existing System

• In data stream mining, the emergence of a new pattern or the disappearance of an existing one is called concept drift.
• Concept drift complicates the learning process because existing data and upcoming data are inconsistent.
• In this paper, we analyzed existing concept drift adaptation algorithms and recognized the necessity of tracking regional drifts.
• However, most distribution-based drift detection methods assume that a drift happens at an exact time point, and data that arrived before that time point are considered unimportant.
• By analyzing whether density increases or decreases in a local region, learning systems are able to highlight dangerous regions and take relevant actions.

Disadvantages

• This study explores some of the common sources of error present in collected raw GPS data and presents a detailed filtering process designed to correct for these issues.
• To illustrate the effectiveness of the proposed filtration process across the range of vehicle vocations, test data from both light- and medium/heavy-duty applications are examined.
• The approximately 20% addition/removal of data that occurs during the negative/duplicate time and signal gap filters suggests source data with poor continuity and significant data acquisition issues.
• Unaddressed, these errors significantly impact the reliability of source data and limit the effectiveness of traditional drive cycle analysis approaches and vehicle simulation software.

Proposed System

• In this paper, we propose a regional density inequality metric, called local drift degree (LDD), to measure the likelihood of regional drift in every suspicious region.
• By investigating the distribution of the nearest neighbors of the data, we propose LDD as a novel metric to detect regional concept drift.
• To retrieve nondrifted information from suspended historical data, the LDD measurement can continuously monitor regional density changes.
• The purpose of LDD is to quantify regional density discrepancies between two different sample sets, thereby identifying regions where density has increased, decreased, or remained stable.
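
The idea of comparing regional densities between two sample sets can be illustrated with a minimal sketch. The text above does not give the exact LDD formula, so everything here is an assumption made for illustration: the score is taken to be the ratio of the two sets' relative densities inside a k-nearest-neighbor neighborhood, minus one, with add-one smoothing to avoid division by zero. The function name `local_drift_degree` and the parameter `k` are hypothetical.

```python
import math


def local_drift_degree(set_a, set_b, k=10):
    """Return one illustrative drift score per point of set_a + set_b.

    A score near 0 suggests a stable region, a positive score suggests
    that density increased from set_a to set_b, and a negative score
    suggests that it decreased. This is an assumed formulation, not the
    definition used by the authors.
    """
    # Label each point with its origin: 0 for set_a, 1 for set_b.
    merged = [(p, 0) for p in set_a] + [(p, 1) for p in set_b]
    n_a, n_b = len(set_a), len(set_b)
    scores = []
    for center, _ in merged:
        # Neighborhood: the k points of the merged sample closest to
        # the center (the center itself is included).
        nearest = sorted(merged, key=lambda item: math.dist(center, item[0]))[:k]
        count_b = sum(label for _, label in nearest)
        count_a = len(nearest) - count_b
        # Relative density of each set in the neighborhood, with
        # add-one smoothing (an assumption) to avoid dividing by zero.
        dens_a = (count_a + 1) / (n_a + 1)
        dens_b = (count_b + 1) / (n_b + 1)
        scores.append(dens_b / dens_a - 1.0)
    return scores


if __name__ == "__main__":
    # Old sample spread over [0, 1); new sample concentrated in [0.5, 1).
    old = [(i / 50,) for i in range(50)]
    new = [(0.5 + i / 100,) for i in range(50)]
    scores = local_drift_degree(old, new, k=10)
    print(min(scores), max(scores))
```

Points with strongly negative scores lie where the new sample has thinned out, and points with strongly positive scores lie where it has become denser. A brute-force neighbor search is used for clarity; a spatial index would be the usual choice for larger samples.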

Advantages

• When used jointly with vehicle simulation software, the data are invaluable in analyzing vehicle fuel use and performance, aiding in the design of more advanced and efficient vehicle technologies.
• While the outlying speed filter removed speed points that fell outside the velocity limits, this filter looks at the derivatives of the speed data with respect to time to see whether the recorded data match expected vehicle performance levels.
• It is important to select limits for this filter that closely match the performance expected of the vehicle.
• To generate the “new” speed data, the same interpolated cubic spline curve fit used in previous filtration steps is applied over the newly generated time domain.
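
The derivative-based check described above can be sketched as follows. This is a hedged illustration only: the function name `flag_unrealistic_acceleration`, the limit parameters, and the units (seconds and metres per second) are assumptions, and the source does not state which limit values or differencing scheme are used.

```python
def flag_unrealistic_acceleration(times, speeds, max_accel, max_decel):
    """Return indices whose speed change from the previous sample
    implies an acceleration outside the given limits.

    times      -- timestamps in seconds (assumed strictly increasing)
    speeds     -- speeds in m/s, one per timestamp
    max_accel  -- largest plausible acceleration in m/s^2 (assumed limit)
    max_decel  -- largest plausible deceleration in m/s^2, as a positive number
    """
    flagged = []
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt <= 0:
            # Negative or duplicate timestamps are assumed to be
            # handled by an earlier filtering step.
            continue
        # First-order finite difference of speed with respect to time.
        accel = (speeds[i] - speeds[i - 1]) / dt
        if accel > max_accel or accel < -max_decel:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    times = [0.0, 1.0, 2.0, 3.0]
    speeds = [0.0, 2.0, 30.0, 31.0]
    print(flag_unrealistic_acceleration(times, speeds, max_accel=5.0, max_decel=8.0))
```

Flagged points would then be candidates for replacement, for example by the spline fit mentioned in the last bullet. The limits should be chosen per vehicle class, since a value suited to a light-duty vehicle would flag too little for a heavy-duty one.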
