Adaptation Strategies for Automated Machine Learning on Evolving Data

Abstract

Automated Machine Learning (AutoML) systems have been shown to efficiently build good models for new datasets. However, it is often not clear how well they can adapt when the data evolves over time. The main goal of this study is to understand the effect of concept drift on the performance of AutoML methods, and which adaptation strategies can be employed to make them more robust to changes in the underlying data. To that end, we propose six concept drift adaptation strategies and evaluate their effectiveness on a variety of AutoML approaches for building machine learning pipelines, including Bayesian optimization, genetic programming, and random search with automated stacking. These strategies are evaluated empirically on real-world and synthetic data streams with different types of concept drift. Based on this analysis, we propose ways to develop more sophisticated and robust AutoML techniques.
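To make concept drift on a synthetic data stream concrete, the following minimal Python sketch generates a hypothetical one-dimensional stream whose labelling rule inverts at a chosen point, either abruptly or gradually, and measures the windowed accuracy of a model that never adapts. The stream, the helper names `drifting_stream` and `windowed_accuracy`, and all parameters are illustrative assumptions, not the benchmark streams used in the study.

```python
import random

def drifting_stream(n, drift_at, width=0, seed=0):
    """Yield (x, y) pairs from a hypothetical 1-D stream whose concept flips.

    Old concept: y = 1 iff x > 0.5. New concept: y = 1 iff x <= 0.5.
    width == 0 gives an abrupt drift at index drift_at; width > 0 gives a
    gradual drift where the new concept is used with a probability that rises
    linearly from 0 to 1 over `width` examples.
    """
    rng = random.Random(seed)
    for i in range(n):
        x = rng.random()
        if width == 0:
            p_new = 1.0 if i >= drift_at else 0.0
        else:
            p_new = min(max((i - drift_at) / width, 0.0), 1.0)
        use_new_concept = rng.random() < p_new
        y = int(x <= 0.5) if use_new_concept else int(x > 0.5)
        yield x, y

def windowed_accuracy(stream, predict, window):
    """Accuracy of `predict` on consecutive, non-overlapping windows.

    Each example is scored before anything else happens to it (the "test"
    half of test-then-train); this static model simply skips the training step.
    """
    scores, hits, count = [], 0, 0
    for x, y in stream:
        hits += int(predict(x) == y)
        count += 1
        if count == window:
            scores.append(hits / window)
            hits, count = 0, 0
    return scores

static_model = lambda x: int(x > 0.5)  # fitted to the pre-drift concept
abrupt = windowed_accuracy(drifting_stream(1000, drift_at=500), static_model, 100)
gradual = windowed_accuracy(drifting_stream(1000, drift_at=300, width=400), static_model, 100)
print(abrupt)  # [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(gradual)
```

Under the abrupt drift the static model collapses from perfect to zero accuracy in a single window; under the gradual drift it stays at 1.0 before index 300, reaches 0.0 after index 700, and typically sits in between during the transition windows.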

Existing System

• We provide an overview of existing work in this field of research and categorize it along three dimensions: search space, search strategy, and performance estimation strategy.
• We discuss several current and future directions for research on neural architecture search (NAS).
• Most existing work has focused on NAS for image classification.
• On the one hand, this provides a challenging benchmark, since a lot of manual engineering has been devoted to finding architectures that perform well in this domain and are not easily outperformed by NAS.

Disadvantages

• The trade-off in this process is between exploitation of currently promising configurations and exploration of new regions.
• In Sequential Model-Based Optimization (SMBO), configurations are evaluated one by one, each time updating the surrogate model and using the updated model to find new configurations.
• Popular choices for the surrogate model are Gaussian Processes, which have been shown to give better results on problems with fewer dimensions and numerical hyperparameters, and Random Forest-based approaches, which are more successful in high-dimensional hyperparameter spaces of a discrete nature.
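The SMBO loop and its exploration/exploitation trade-off can be sketched in Python for a single hyperparameter in [0, 1). As a simplifying assumption, the sketch replaces the Gaussian Process or Random Forest surrogate with a toy nearest-neighbour surrogate: its mean score drives exploitation, and the distance to the closest evaluated configuration acts as an uncertainty bonus that drives exploration, combined in an upper-confidence-bound style acquisition. The names `surrogate`, `smbo`, `kappa`, and the objective are hypothetical.

```python
import random

def surrogate(history, x, k=3):
    """Toy surrogate: mean score of the k evaluated configurations nearest to x,
    plus the distance to the closest one as a crude uncertainty estimate."""
    nearest = sorted(history, key=lambda h: abs(h[0] - x))[:k]
    mean = sum(score for _, score in nearest) / len(nearest)
    uncertainty = abs(nearest[0][0] - x)
    return mean, uncertainty

def smbo(objective, n_init=3, n_iter=20, n_candidates=100, kappa=2.0, seed=0):
    """Maximise objective(x) over x in [0, 1), one configuration at a time."""
    rng = random.Random(seed)
    # Initial design: a few random configurations.
    history = [(x, objective(x)) for x in (rng.random() for _ in range(n_init))]

    def acquisition(x):
        mean, uncertainty = surrogate(history, x)
        # The mean term rewards exploitation; the uncertainty term rewards exploration.
        return mean + kappa * uncertainty

    for _ in range(n_iter):
        candidates = [rng.random() for _ in range(n_candidates)]
        x_next = max(candidates, key=acquisition)
        # Evaluate a single configuration; the surrogate sees it on the next call.
        history.append((x_next, objective(x_next)))
    return max(history, key=lambda h: h[1])

# Hypothetical objective: validation score as a function of one normalised hyperparameter.
objective = lambda x: 1 - (x - 0.7) ** 2
best_x, best_score = smbo(objective)
print(round(best_x, 2), round(best_score, 3))
```

Raising `kappa` makes the search spread out more before concentrating near promising regions; lowering it makes the search greedier.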

Proposed System

• Tuned Data Mining was proposed to tune the hyperparameters of a full machine learning pipeline using Bayesian optimization; specifically, it used a single fixed pipeline and tuned the hyperparameters of the classifier as well as the per-class classification thresholds and class weights.
• It is often desirable that equally good but faster algorithms are ranked higher, and multiple methods have been proposed to trade off accuracy and training time.
• The new set of proposed kernels is then evaluated in the next round. Under the above rules, a kernel expression may be proposed several times, but a well-implemented system keeps records and evaluates each expression only once.
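The record-keeping described above can be sketched as a search that stores every scored kernel expression in a dictionary and skips any proposal already in it. The base kernels (`SE`, `LIN`, `PER`) and the two expansion rules `S -> (S + B)` and `S -> (S * B)` are hypothetical stand-ins for the grammar the text refers to, and `kernel_search` and `toy_score` are illustrative names; a real system would fit each kernel and score it, for example by marginal likelihood.

```python
BASE_KERNELS = ["SE", "LIN", "PER"]  # hypothetical base kernels

def expand(expr):
    """Apply two illustrative grammar rules: S -> (S + B) and S -> (S * B)."""
    return [f"({expr} {op} {b})" for op in ("+", "*") for b in BASE_KERNELS]

def kernel_search(score, rounds=2):
    evaluated = {}  # record of every expression already scored
    frontier = list(BASE_KERNELS)
    best = None
    for _ in range(rounds + 1):
        # Drop duplicates within this round and anything scored in earlier rounds.
        new = [e for e in dict.fromkeys(frontier) if e not in evaluated]
        for e in new:
            evaluated[e] = score(e)  # each expression is scored at most once
        best = max(evaluated, key=evaluated.get)
        frontier = expand(best)  # proposals for the next round
    return best, evaluated

calls = []

def toy_score(expr):
    # Hypothetical score that simply prefers shorter expressions.
    calls.append(expr)
    return -len(expr)

best, evaluated = kernel_search(toy_score, rounds=2)
print(best, len(calls))
```

With this toy score the best expression stays `SE`, so the third round re-proposes exactly the six expansions of the second round; the record ensures they are not scored again.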

Advantages

• This strategy assumes that the initial pipeline configuration will remain useful and that only the models need to be updated when their performance dwindles because of concept drift.
• It tests whether keeping a memory of past data in the learner is beneficial. This can boost performance if the learner can also adapt to the new data.
• This is a generalization of global model replacement to other AutoML systems and an extension of hyperparameter optimization to full pipelines.
• This strategy also assumes that the pipelines need to be re-tuned after drift.
• Rerunning the AutoML system from scratch is more expensive, but it could yield significant performance improvements when the drift is significant.
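The adaptation strategies above can be contrasted in a small Python simulation. This is a toy under stated assumptions: the "AutoML system" merely picks between two hypothetical learners, drift is flagged by a crude accuracy-drop threshold `delta`, and the strategy labels `none`, `retrain_recent`, `retrain_all`, and `rerun` (keep the model, refit the same configuration on the newest batch, refit it on all data seen so far, rerun the search from scratch) are illustrative names, not the study's six strategies or their exact definitions.

```python
import random

def threshold_rule(t, flip):
    # Predict 1 when x > t, or the opposite when flip is True.
    return lambda x: int((x > t) != flip)

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def fit_majority(data):
    label = int(2 * sum(y for _, y in data) >= len(data))
    return lambda x: label

def fit_threshold(data):
    # Grid search over thresholds and directions; the first best rule wins ties.
    rules = [threshold_rule(i / 10, flip) for i in range(11) for flip in (False, True)]
    return max(rules, key=lambda m: accuracy(m, data))

# Hypothetical search space: the "pipeline configuration" is just the learner type.
LEARNERS = {"majority": fit_majority, "threshold": fit_threshold}

def automl_search(data):
    # Toy stand-in for an AutoML run: fit every learner, keep the most accurate.
    name = max(LEARNERS, key=lambda n: accuracy(LEARNERS[n](data), data))
    return name, LEARNERS[name](data)

def make_batch(rng, size, flipped):
    # Evenly spaced inputs in (0, 1), shuffled into stream order.
    xs = [(i + 0.5) / size for i in range(size)]
    rng.shuffle(xs)
    # Concept: y = 1 iff x > 0.5, inverted after the drift.
    return [(x, int((x > 0.5) != flipped)) for x in xs]

def run(strategy, batches, delta=0.2):
    """Test-then-train over batches; returns the test accuracy on batches[1:]."""
    name, model = automl_search(batches[0])
    seen = list(batches[0])
    prev_acc = accuracy(model, batches[0])
    scores = []
    for batch in batches[1:]:
        acc = accuracy(model, batch)  # test before the batch is used for training
        scores.append(acc)
        if prev_acc - acc > delta:  # crude detector: sudden accuracy drop
            if strategy == "retrain_recent":  # keep configuration, refit on new batch
                model = LEARNERS[name](batch)
            elif strategy == "retrain_all":  # keep configuration and learner memory
                model = LEARNERS[name](seen + batch)
            elif strategy == "rerun":  # rerun the search from scratch
                name, model = automl_search(batch)
            # Any other strategy name (e.g. "none") keeps the current model.
            acc = accuracy(model, batch)  # reference accuracy for the next check
        seen.extend(batch)
        prev_acc = acc
    return scores

rng = random.Random(0)
batches = [make_batch(rng, 200, flipped=i >= 3) for i in range(6)]  # drift at batch 3
for strategy in ("none", "retrain_recent", "retrain_all", "rerun"):
    print(strategy, run(strategy, batches))
```

In this toy run all four strategies score 1.0, 1.0, 0.0 on the first three test batches. Afterwards, `retrain_recent` and `rerun` recover to 1.0, while `none` and `retrain_all` stay at 0.0: with learner memory, the three pre-drift batches outvote the single post-drift batch, which illustrates why keeping past data only helps if the learner can still adapt to the new concept.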
