Credit Default Payment Prediction Algorithm with Machine Learning
Executive Summary
This project aims to build a state-of-the-art credit scoring and default prediction system for financial service providers using a stack of machine learning algorithms. Because of limits on the available data sources, the project works through a progression of machine learning techniques: logistic regression, decision tree, random forest and gradient boosting machine. After training, the XGBoost model performed best, with an accuracy of 93.89%, and placed in the top 21% of the Kaggle leaderboard.
Table of Contents
I. Introduction
II. Machine Learning Modelling
- 2.1 Model Selection
- 2.2 Logistic Regression Model
- 2.3 Decision Tree
- 2.4 Random Forest
- 2.5 Gradient Boosting Machines with XGBoost
III. Hyperparameter Tuning
- 3.1 Introduction to Hyperparameters
- 3.2 Manually Tuning Hyperparameters
- 3.3 Automating Tuning Hyperparameters
- 3.4 Section Summary
IV. Submitting to Kaggle
V. Conclusion
- 5.1 The Purpose
- 5.2 The Aim
- 5.3 The Process
- 5.4 The Models
- 5.5 The Results
- 5.6 The Limitations
- 5.7 Next Steps
VI. Acknowledgements
Reference
Ⅰ. Introduction
Since the financial crisis, organizations have realized the significance of risk management with the latest technology. To date, machine learning algorithms have been applied both in financial research and in the financial services industry. In particular, credit scoring algorithms help financial service providers estimate the creditworthiness of borrowers, which reduces labour costs and supports the sustainable development of the financial system.
Previously, researchers have employed models such as linear regression, logistic regression, extreme gradient boosting (XGBoost) and deep neural networks to estimate company ratings from financial behaviour (Provenzano et al., 2020). However, Addo et al. (2018) concluded that tree-based models are more stable than those based on multilayer artificial neural networks.
In this project, the dataset comes from Kaggle (Link) and consists primarily of individuals' credit card usage behaviour. We focus on logistic regression, decision tree, random forest and gradient boosting machine models trained on this data.
# Install the required packages with pip
!pip install jovian --upgrade --quiet
!pip install opendatasets scikit-learn plotly --upgrade --quiet
!pip install pandas numpy matplotlib seaborn --quiet
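With the packages installed, the four model families named in the introduction can be compared side by side. The following is only a minimal sketch under stated assumptions, not the project's actual pipeline: the data is synthetic (generated with scikit-learn's `make_classification`), the sample size, feature count, class balance and hyperparameters are illustrative choices, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost so that the example needs no extra dependency.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the credit data (assumption): 1,000 borrowers,
# 10 numeric features, roughly 20% of them labelled as defaults.
X, y = make_classification(
    n_samples=1000,
    n_features=10,
    n_informative=5,
    weights=[0.8, 0.2],
    random_state=42,
)

# Hold out 25% of the rows for validation, keeping the default rate
# the same in both splits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# One untuned representative per model family; the hyperparameters here
# are illustrative, not the values used in the project.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Fit each model on the training split and record validation accuracy.
val_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    val_accuracy[name] = model.score(X_val, y_val)

for name, acc in val_accuracy.items():
    print(f"{name}: validation accuracy = {acc:.4f}")
```

Keeping the models in a dictionary makes it easy to add or swap a candidate (for example an XGBoost classifier) without touching the training loop. Accuracy is used here only because it is the metric quoted in the summary; on imbalanced default data it is worth reading alongside other metrics.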