Predictive Analytics for Default of Credit Card Clients
ABSTRACT: Predictive analytics has significant potential to support various decision processes. We aimed to compare several machine learning algorithms on a selected task: predicting the default of credit card clients from freely available data. We chose the Random Forest, AdaBoost, XGBoost, and Gradient Boosting algorithms and applied them to a prepared data sample. We experimentally evaluated the classification models using metrics such as accuracy, precision, recall, the ROC curve, and AUC. The results show very similar performance of the selected algorithms on this dataset. Gradient Boosting achieved the best AUC (0.7828), whereas the Bagging algorithm reached the best precision for target class 1 (0.72). Simple data preprocessing brought only minimal improvements in individual metrics. Our results are comparable to those of the cited studies, except for the MCC metric, where the Gradient Boosting model achieved a better value (0.4111).
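The abstract compares models by accuracy, precision, recall, AUC, and MCC. As a minimal sketch (not the authors' evaluation code), these metrics can be computed from binary labels, where 1 marks a defaulting client, and from the models' predicted scores. The toy labels, scores, and the 0.5 decision threshold in the usage lines are hypothetical, not taken from the dataset.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels where 1 = default."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for class 1, and the Matthews correlation coefficient."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "mcc": mcc}


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.65, 0.4, 0.3, 0.55, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
print(classification_metrics(y_true, y_pred))  # accuracy, precision, recall = 0.75; mcc = 0.5
print(auc(y_true, scores))  # 0.9375
```

The pairwise AUC loop is quadratic in the number of clients; it mirrors the definition of AUC rather than aiming for efficiency.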
• Credit card default is the failure to pay, when due, a calculated minimum repayment amount on an existing balance, which comprises interest and part of the principal.
• We plan to develop a logistic regression model and machine learning (ML) algorithms, such as decision trees, random forests, and artificial neural networks, for the credit default problem and to study the performance of these models.
• There are also newer machine learning approaches, such as decision trees and random forests.
• A hyperparameter is different from the model parameters, which are obtained by fitting the model to the training data; hyperparameters are set before training.
• The Default of Credit Card Clients dataset is used for the estimation, and the problem is solved as a binary classification problem.
• Credit risk evaluation is treated as a classification problem with two classes: bad creditors and good creditors.
• Support Vector Machine: a support vector machine is a supervised learning method that can be used for both classification and regression problems.
• Therefore, whether the estimated probability of default produced by data mining methods can represent the "real" probability of default is an important problem.
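The highlights above separate hyperparameters from model parameters and frame default as binary classification that outputs an estimated probability of default. A minimal logistic-regression sketch trained by batch gradient descent makes the distinction concrete: `learning_rate` and `epochs` are hyperparameters fixed before training, while the weights `w` and bias `b` are parameters learned from the data. The single feature (hypothetically, the number of months with a delayed payment) and its values are invented for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=1000):
    """Batch gradient descent on the mean log-loss.

    learning_rate and epochs are hyperparameters (chosen before training);
    the returned weights and bias are model parameters (learned from X, y).
    """
    n_features = len(X[0])
    m = len(X)
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one client: p(default) - actual label.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - learning_rate * gj / m for wj, gj in zip(w, grad_w)]
        b -= learning_rate * grad_b / m
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of default (class 1) for one client."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical toy data: one feature, labels 1 = default.
X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y, learning_rate=0.5, epochs=2000)
print(predict_proba(w, b, [1]), predict_proba(w, b, [4]))
```

Plain Python keeps the sketch dependency-free; a library implementation with regularization would be the usual choice on real data.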
• The proposed hybrid approach converges much faster than the conventional neural network model.
• Moreover, the proposed methodology increases credit scoring accuracy, and the hybrid approach outperforms traditional discriminant analysis and logistic regression.
• To estimate the real probability of default, this study proposed a novel approach called the Sorting Smoothing Method (SSM).
• The major purpose of risk prediction is to use financial information, such as business financial statements and customer transaction and repayment records, to predict business performance or individual customers' credit risk and to reduce damage and uncertainty.
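The Sorting Smoothing Method estimates a "real" probability of default that can be compared with a model's predicted probability. The sketch below assumes the commonly described form: sort clients by predicted probability, then set each client's smoothed value to the mean of the actual 0/1 default outcomes in a window of `n` neighbours on each side, P_i = (A_{i-n} + ... + A_i + ... + A_{i+n}) / (2n + 1). Truncating the window at both ends, and fitting a simple linear regression of smoothed on predicted values (slope near 1, intercept near 0, and R^2 near 1 would suggest well-calibrated predictions), are assumptions of this sketch rather than details stated above. The toy data are hypothetical.

```python
def sorting_smoothing(predicted, actual, n=50):
    """Return (sorted predicted probabilities, smoothed 'real' probabilities).

    Each smoothed value is the mean of the actual 0/1 outcomes within n
    neighbours on each side, in order of predicted probability. The window
    is truncated at the ends (an assumption of this sketch).
    """
    pairs = sorted(zip(predicted, actual), key=lambda pa: pa[0])
    preds = [p for p, _ in pairs]
    outcomes = [a for _, a in pairs]
    smoothed = []
    for i in range(len(outcomes)):
        lo = max(0, i - n)
        hi = min(len(outcomes), i + n + 1)
        window = outcomes[lo:hi]
        smoothed.append(sum(window) / len(window))
    return preds, smoothed


def simple_linear_regression(x, y):
    """Least-squares fit y = intercept + slope * x; also returns R^2."""
    m = len(x)
    mx = sum(x) / m
    my = sum(y) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1 - ss_res / ss_tot if ss_tot else 1.0
    return intercept, slope, r2


# Hypothetical toy example with a small window (n=1).
predicted = [0.6, 0.1, 0.4, 0.2, 0.5, 0.3]
actual = [1, 0, 0, 0, 1, 1]
preds, real = sorting_smoothing(predicted, actual, n=1)
print(preds, real)
print(simple_linear_regression(preds, real))
```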
• The WEKA tool is used; it reports the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), which help to identify the best-performing MLP and kNN configurations for the respective values of k and numbers of neurons in the hidden layer.
• Precision and recall are two important performance measures used for evaluation.
• The feature selection algorithms used in this case contribute significantly to improving the performance of the classifiers.
• Apart from the kernel functions, the performance of the classifiers is also evaluated using a confusion matrix.
• Performance can also be optimized by using genetic algorithms, which can help in the selection of parameters.
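The highlights above choose k for kNN by comparing MAE and RMSE. A minimal sketch, assuming the errors are computed on each test client's predicted class-1 probability (a simplification that need not match WEKA's reported figures exactly), scores several values of k on a held-out split and keeps the k with the lowest RMSE. Using RMSE as the selection criterion, the Euclidean distance, and the toy data are all assumptions for illustration.

```python
import math


def knn_proba(train_X, train_y, x, k):
    """Fraction of class-1 labels among the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y))
    nearest = dists[:k]
    return sum(yi for _, yi in nearest) / k


def mae_rmse(y_true, probs):
    """Mean absolute error and root mean squared error of predicted probabilities."""
    errors = [p - t for t, p in zip(y_true, probs)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


def select_k(train_X, train_y, test_X, test_y, candidate_ks):
    """Return (best k, {k: (MAE, RMSE)}); best k minimizes RMSE on the test split."""
    results = {}
    for k in candidate_ks:
        probs = [knn_proba(train_X, train_y, x, k) for x in test_X]
        results[k] = mae_rmse(test_y, probs)
    best_k = min(results, key=lambda k: results[k][1])
    return best_k, results


# Hypothetical toy data: one feature, labels 1 = default.
train_X = [[0], [1], [2], [10], [11], [12]]
train_y = [0, 0, 0, 1, 1, 1]
best_k, results = select_k(train_X, train_y, [[0.5], [11.5]], [0, 1], [1, 3, 5])
print(best_k, results)
```

On a dataset this small, several values of k can tie; `min` then keeps the first candidate listed.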