Financial Distress Prediction
Banks play a crucial role in market economies. They decide who can get finance and on what terms, and they can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.
Lenders approve loans based on several factors, including your credit score.
What are credit scores?
Traditional credit scores try to predict how likely you are to repay a loan, and they use historical data about your borrowing behavior to do so. To generate a credit score, a computer program reads through data in your credit reports looking for information like:
- Whether you have borrowed money in the past, and how long you’ve been borrowing
- Whether you repaid your loans as agreed
- Whether you’ve missed payments on your loans in the past
- How you’re currently using debt, including how much you’re borrowing, and what types of debt you use
- Whether any public records about you exist, like bankruptcy or legal judgments against you from a creditor
- Whether you’ve recently applied for loans
Banks use credit scoring algorithms, which estimate the probability of default, to decide whether or not a loan should be granted.
The aim of this project is to improve on the state of the art in credit scoring by predicting the probability that somebody will experience financial distress in the next two years. Such a model can help borrowers make better financial decisions. To accomplish this task, we will use a dataset that contains about 150,000 rows and 12 columns. The target column of our dataset takes two values, 0 and 1:
- 0 means no, the borrower will not experience financial distress in the next two years, and
- 1 means yes, the borrower will experience financial distress in the next two years
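To make the link between a predicted probability and the 0/1 target concrete, here is a minimal sketch. The function name `label_from_probability`, the 0.5 threshold, and the example probabilities are assumptions made for illustration; they are not taken from the dataset or from the models built later.

```python
def label_from_probability(probability: float, threshold: float = 0.5) -> int:
    """Map a predicted probability of distress to the 0/1 target encoding.

    The 0.5 default threshold is an assumption; a real project would
    choose it based on the evaluation metric.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return 1 if probability >= threshold else 0


# Hypothetical model outputs for three borrowers (not real data).
predicted = [0.03, 0.48, 0.91]
labels = [label_from_probability(p) for p in predicted]
print(labels)  # [0, 0, 1]
```

A lower threshold flags more borrowers as likely to experience distress, at the cost of more false alarms.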
Source: The dataset used for this project was obtained from Kaggle: Link
Here's the outline we will follow to complete this project:
- Download the data
- Perform exploratory analysis and visualization on the dataset
- Preprocess and clean the data using the pandas library
- Set up evaluation metrics
- Build models
- Tune hyperparameters of our best performing model
- Make predictions on new input data
Before we begin, let's go through the variable descriptions to better understand the dataset we will be working with.
SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse. Type: Y/N
RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits. Type: percentage
age: Age of borrower in years. Type: integer
NumberOfTime30-59DaysPastDueNotWorse: Number of times borrower has been 30-59 days past due but no worse in the last 2 years. Type: integer
DebtRatio: Monthly debt payments, alimony, and living costs divided by monthly gross income. Type: percentage
MonthlyIncome: Monthly income. Type: real
NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards). Type: integer
NumberOfTimes90DaysLate: Number of times borrower has been 90 days or more past due. Type: integer
NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home equity lines of credit. Type: integer
NumberOfTime60-89DaysPastDueNotWorse: Number of times borrower has been 60-89 days past due but no worse in the last 2 years. Type: integer
NumberOfDependents: Number of dependents in the family, excluding the borrower (spouse, children, etc.). Type: integer
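The two ratio variables above are easier to read with a worked example. The sketch below follows the definitions given for RevolvingUtilizationOfUnsecuredLines and DebtRatio; the function names, the borrower's figures, and the choice to raise an error on a non-positive denominator are assumptions for illustration and say nothing about how the dataset itself was built.

```python
def revolving_utilization(total_balance: float, total_credit_limit: float) -> float:
    """Balance on unsecured revolving credit divided by the sum of credit limits."""
    if total_credit_limit <= 0:
        # Assumption: treat a missing or zero limit as invalid input.
        raise ValueError("total_credit_limit must be positive")
    return total_balance / total_credit_limit


def debt_ratio(monthly_debt_payments: float, monthly_gross_income: float) -> float:
    """Monthly debt payments, alimony, and living costs divided by gross income."""
    if monthly_gross_income <= 0:
        # Assumption: treat a missing or zero income as invalid input.
        raise ValueError("monthly_gross_income must be positive")
    return monthly_debt_payments / monthly_gross_income


# Hypothetical borrower: 2,500 owed against 10,000 of total credit limits,
# and 1,200 of monthly obligations against 4,000 of gross monthly income.
utilization = revolving_utilization(2_500, 10_000)
ratio = debt_ratio(1_200, 4_000)
print(f"Revolving utilization: {utilization:.2%}")  # Revolving utilization: 25.00%
print(f"Debt ratio: {ratio:.2%}")  # Debt ratio: 30.00%
```

Both functions return fractions (0.25 rather than 25), which the format strings convert to percentages for display.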
To begin, let's install and import useful libraries.
!pip install missingno plotly opendatasets scikit-learn xgboost lightgbm --upgrade --quiet
# Importing libraries
import opendatasets as od
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import missingno as msno
import matplotlib
%matplotlib inline

# Plot styling
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

# Preprocessing, evaluation, and models
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier