IFC-BD: An Interpretable Fuzzy Classifier for Boosting Explainable Artificial Intelligence in Big Data

Abstract

In current Data Science applications, systems are increasingly designed so that their behavior is adapted to human cognition, which has given rise to the emerging area of explainable artificial intelligence. Among the different classification paradigms, those based on fuzzy rules are well suited to emphasizing the interpretability of the overall system. However, when applied to Big Data analytics, they may produce an excessive number of rules and/or linguistic labels, which can not only degrade system performance but also harm the semantics and interpretability of the system. In this paper, we propose IFC-BD, an Interpretable Fuzzy Classifier for Big Data, which aims to broaden the horizons of explainability by learning a compact yet accurate fuzzy model. IFC-BD is developed in a cell-based distributed framework and works in three stages: initial rule learning, rule generalization, and heuristic rule selection. This procedure moves from a large number of specific rules to a smaller number of more general and confident rules. Additionally, to resolve possible rule conflicts, a new estimated rule weight is proposed specifically for Big Data problems. IFC-BD was evaluated against state-of-the-art approaches of the fuzzy classification paradigm in terms of interpretability, accuracy, and running time. The experimental findings show that the proposed algorithm improved the explainability of fuzzy rule-based classifiers as well as their predictive performance.
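To make the idea of weighted fuzzy rules more concrete, the following is a minimal sketch of a generic fuzzy rule-based classifier. It does not reproduce IFC-BD: the cell-based distributed framework, the rule generalization and heuristic selection stages, and the paper's estimated rule weight are not shown. The three-label triangular partition, the fuzzy-confidence weight, and all function names (`membership`, `matching_degree`, `learn_rules`, `classify`) are assumptions chosen only for illustration, and features are assumed to be normalized to [0, 1].

```python
from collections import defaultdict

# Assumed partition: three linguistic labels on a feature normalized to [0, 1].
LABELS = {"low": 0.0, "medium": 0.5, "high": 1.0}


def membership(x, label):
    """Triangular membership peaking at the label's center, half-width 0.5."""
    return max(0.0, 1.0 - abs(x - LABELS[label]) / 0.5)


def matching_degree(sample, antecedent):
    """Product T-norm over the membership of each feature in its label."""
    degree = 1.0
    for value, label in zip(sample, antecedent):
        degree *= membership(value, label)
    return degree


def learn_rules(samples, classes):
    """Build one candidate rule per distinct antecedent seen in the data.

    Each sample proposes the antecedent made of its best-matching labels.
    The rule weight used here is plain fuzzy confidence (an assumption,
    not the estimated rule weight proposed in the paper).
    """
    antecedents = {
        tuple(max(LABELS, key=lambda lab: membership(v, lab)) for v in s)
        for s in samples
    }
    rules = []
    for ant in sorted(antecedents):
        per_class = defaultdict(float)
        for s, c in zip(samples, classes):
            per_class[c] += matching_degree(s, ant)
        total = sum(per_class.values())
        best = max(per_class, key=per_class.get)
        # Matching mass of the winning class divided by the total mass.
        rules.append((ant, best, per_class[best] / total))
    return rules


def classify(sample, rules):
    """Single-winner inference: the largest degree * weight decides the class."""
    _, cls, _ = max(rules, key=lambda r: matching_degree(sample, r[0]) * r[2])
    return cls
```

In this sketch the rule weight is what resolves conflicts: when two rules with different consequents both match a sample, the one with the larger product of matching degree and weight wins.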

Existing System

• Organic data can be used to enhance existing statistical exercises, especially by improving coverage where it is incomplete.
• Intensified regulatory pressures have increased the number of false positives generated by existing software solutions.
• These statistics will provide a meaningful complement to existing official data, for instance to assess the impact of our Assets Purchasing Programme (APP) on market functioning and to calculate a new overnight unsecured interest rate for the euro area (ESTER).
• Moreover, big data analytics can help to enhance existing financial sector assessment processes by extending conventional methodologies and providing additional insights, e.g. in financial sentiment analysis, early warning systems, stress-test exercises and network analysis.

Disadvantages

• A dynamic rule filtering scheme focuses on the high-density areas of the problem in order to simplify the baseline fuzzy classification model.
• In this regard, some recent innovative frameworks motivate data scientists to develop new algorithms specifically for Big Data problems.
• Artificial Intelligence methods, especially ML models, are increasingly applied to solve complex computational problems of human life.
• Although these datasets are for binary classification problems, the proposed algorithm is a general solution and can be easily adapted to multi-class problems.

Proposed System

• Prompted by advances in computing power, machine learning (ML hereafter) methods have recently been proposed as alternatives to the time-series regression models typically used by central banks for forecasting key macroeconomic variables.
• The gradient boosting trees model was proposed by Friedman (1999) and has the advantage of reducing both variance and bias.
• The ROC curve of extreme gradient boosting (XGBoost) lies above those of all the considered competitors, supporting the high degree of efficacy and generalization capacity of the proposed machine learning system.
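A ROC curve that lies above its competitors also has a larger area under the curve (AUC), so the comparison in the last point is often summarized by that single number. The sketch below computes AUC from labels and scores using the pairwise-ranking definition. The function name `roc_auc` and the example scores in the comment are hypothetical, and this is not the evaluation code behind the results quoted above.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive example
    is scored higher than a randomly chosen negative one; ties count 0.5.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from two models on the same four examples:
# roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]) is lower than
# roc_auc([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.8]), so the second model ranks better.
```

The quadratic loop keeps the definition visible; for large datasets a sort-based rank computation would be the usual choice.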

Advantages

• From another perspective, to obtain a competitive predictive performance, some methods, such as CFM-BD, apply a pre-processing transformation that is not straightforwardly interpretable.
• They may pose several challenges to the system behavior, e.g. computation overheads, increased running times, loss of interpretability, or even reduced predictive performance, among others.
• Throughout this section, the capabilities of different models in terms of system interpretability and discrimination performance are analyzed.
• As depicted, IFC-BD stands out from the rest, obtaining by far the highest interpretability level with very similar accuracy.
• On the other hand, as its authors stated, this stage plays an important role in the efficiency and robustness of the model, so omitting it would probably decrease the accuracy values.
