Credit Risk Evaluation Using Credit Card Data

A credit lending agency has approached us as a client for help evaluating the credit risk of its customers, and has provided its customer database. Let’s evaluate.

Project Overview

Context

The client, a credit lending agency, has asked for help evaluating the credit risk in its customer database. The dataset contains information on default payments, demographic factors, credit data, payment history, and bill statements of credit card clients.

The aim of this work is to predict whether a customer will default on their next payment.

To achieve this, a predictive model is built that relates the likelihood of default to age, education, marital status, and delays in previous payments.

Actions

First, the necessary data had to be compiled from the tables in the database, gathering key customer metrics that may help predict whether a customer will default.

To predict the outcome, three modelling approaches are considered, namely:

  • Logistic Regression
  • Random Forest
  • XGBoost
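As an illustration of how such a comparison could be set up, here is a minimal sketch using scikit-learn. It is not the project's actual code: the data is synthetic (generated with `make_classification`) rather than the client's database, the hyperparameters are arbitrary, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost so that the sketch depends on a single library. XGBoost's `XGBClassifier` follows the same estimator interface and could be used in its place.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic, imbalanced binary data standing in for the
# client's customer table (1 = default, 0 = no default).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.8, 0.2], random_state=42
)

# Hold out a stratified test set so both classes appear in each split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    # Scaling helps the logistic regression solver converge.
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    # Stand-in for XGBoost (assumption, see the note above).
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Scoring every model on the same held-out split keeps the comparison like for like. The numbers this prints describe the synthetic data only and say nothing about the client dataset.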

Results

Testing found that XGBoost has the highest predictive accuracy.


Metric 1: Precision

  • Logistic Regression =
  • Random Forest =
  • XGBoost =


Metric 2: Recall

  • Logistic Regression =
  • Random Forest =
  • XGBoost =


Metric 3: F1 score

  • Logistic Regression =
  • Random Forest =
  • XGBoost =
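For reference, the three metrics above can be computed from true and predicted labels as in the sketch below. The function name `classification_metrics` and the toy labels are illustrative assumptions, not project code or project results; "default" is treated as the positive class (label 1).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: of the customers flagged as defaulters, how many defaulted.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the customers who defaulted, how many were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical): 3 true positives, 1 false positive, 1 false negative.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # precision, recall and f1 are all 0.75 for these toy labels
```

When defaults are rare, these metrics are usually more informative than raw accuracy, because a model that never predicts a default can still score a high accuracy.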

As the

Growth/Next Steps

While predictive accuracy was relatively high, other modelling approaches could be tested, especially those similar to Random Forest and XGBoost (for example, LightGBM), to see whether even more accuracy could be gained.

From a data point of view, further variables could be collected and further feature engineering undertaken, so that as much useful information as possible is available for predicting whether a customer may default.