Credit Risk Evaluation using Credit Card data
A credit lending agency has approached us, as a client, for help in evaluating the credit risk of its customers, and has provided its customer database. Let’s evaluate.
Table of contents
- 00. Project Overview
- 01. Data Overview
- 02. Modelling Overview
- 03. Logistic Regression
- 04. Random Forest
- 05. XGBoost Classifier
- 06. Modelling Summary
- 07. Predicting Missing Loyalty Scores
- 08. Growth & Next Steps
Project Overview
Context
The client, a credit lending agency, has asked for help in evaluating the credit risk within its customer database. The dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients.
The aim of this work is to predict whether a customer will default on their next payment.
To achieve this, a predictive model is built that relates age, education, marital status, and delays in previous payments to the likelihood of default.
Actions
First, the necessary data was compiled from tables in the database, gathering key customer metrics that may help predict whether a customer will default.
To predict the outcome, three modelling approaches were considered, namely:
- Logistic Regression
- Random Forest
- XGBoost
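As a rough illustration of how the three approaches could be trained side by side, here is a minimal sketch. It is not the project's actual pipeline: the data is a synthetic stand-in generated with scikit-learn's `make_classification` (the client data is not reproduced here), the hyperparameters are arbitrary, and if the `xgboost` package is not installed the sketch falls back to scikit-learn's `GradientBoostingClassifier` as a substitute.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Use XGBoost when available; otherwise fall back to a scikit-learn
# gradient boosting model (a stand-in, not the same algorithm).
try:
    from xgboost import XGBClassifier
    boosted_name = "XGBoost"
    boosted_model = XGBClassifier(n_estimators=100, random_state=42)
except ImportError:
    boosted_name = "Gradient Boosting (scikit-learn stand-in)"
    boosted_model = GradientBoostingClassifier(random_state=42)

# Synthetic, imbalanced data standing in for the customer database
# (assumption: roughly 20% of customers default).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)

# Stratify so the train and test sets keep the same default rate.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Logistic regression benefits from scaled inputs, hence the pipeline.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    boosted_name: boosted_model,
}

# Fit each model and collect its predictions on the held-out test set.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Holding out a stratified test set means all three models are compared on the same unseen customers, with the same proportion of defaulters as the training data.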
Results
Testing found that XGBoost had the highest predictive accuracy of the three models.
Metric 1: Precision
- Logistic Regression =
- Random Forest =
- XGBoost =
Metric 2: Recall
- Logistic Regression =
- Random Forest =
- XGBoost =
Metric 3: F1 score
- Logistic Regression =
- Random Forest =
- XGBoost =
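For reference, the three metrics above can be computed from counts of true positives, false positives, and false negatives. The sketch below uses plain Python and toy labels invented for illustration (1 = default, 0 = no default); the function name `classification_metrics` is hypothetical and the numbers are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, f1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of the customers flagged as defaulters, how many defaulted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who defaulted, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Toy example: 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
precision, recall, f1 = classification_metrics(y_true, y_pred)
print(precision, recall, f1)  # 0.75 0.75 0.75
```

These metrics are often more informative than raw accuracy when defaulters are a minority of customers, because a model that never predicts a default can still score high accuracy while having zero recall.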
As the
Growth & Next Steps
While predictive accuracy was relatively high, other modelling approaches could be tested, especially those similar to Random Forest and XGBoost (for example, LightGBM), to see whether further accuracy could be gained.
From a data point of view, further variables could be collected and further feature engineering undertaken, so that as much useful information as possible is available for predicting whether a customer will default.