Financial Distress Prediction
Introduction
Banks play a crucial role in market economies. They decide who can get finance and on what terms, and they can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.
Lenders approve loans based on several factors, including your credit score.
What are credit scores?
Traditional credit scores try to predict how likely you are to repay a loan, and they use historical data about your borrowing behavior to do so. To generate a credit score, a computer program reads through data in your credit reports looking for information like:
- Whether you have borrowed money in the past, and how long you’ve been borrowing
- Whether you repaid your loans as agreed
- Whether you’ve missed payments on your loans in the past
- How you’re currently using debt, including how much you’re borrowing, and what types of debt you use
- Whether any public records about you exist, like bankruptcy or legal judgments against you from a creditor
- Whether you’ve recently applied for loans
Credit scoring algorithms, which make a guess at the probability of default, are the method banks use to determine whether or not a loan should be granted.
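As a rough illustration of how a scoring algorithm can turn borrower information into a probability of default, the sketch below passes a weighted sum of features through a logistic function. The feature names, weights, and intercept are invented for illustration; they are not taken from any real scoring model or from this project's dataset.

```python
import math

# Hypothetical weights for illustration only; a real model learns these from data.
WEIGHTS = {
    "missed_payments": 0.8,      # more missed payments -> higher risk
    "credit_utilization": 1.5,   # fraction of available credit in use
    "years_of_history": -0.1,    # longer borrowing history -> lower risk
}
INTERCEPT = -2.0


def default_probability(borrower: dict) -> float:
    """Return a probability in (0, 1) from a weighted sum of features."""
    z = INTERCEPT + sum(WEIGHTS[name] * borrower[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


low_risk = {"missed_payments": 0, "credit_utilization": 0.1, "years_of_history": 10}
high_risk = {"missed_payments": 3, "credit_utilization": 0.9, "years_of_history": 1}

print(f"low-risk borrower:  {default_probability(low_risk):.3f}")
print(f"high-risk borrower: {default_probability(high_risk):.3f}")
```

A lender would then compare this probability with a threshold of its choosing to decide whether to grant the loan.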
Project Objective
The aim of this project is to improve on the state of the art in credit scoring by predicting the probability that a borrower will experience financial distress in the next two years. Such a model can help borrowers make better financial decisions. To accomplish this, we will use a dataset of about 150,000 rows and 12 columns. The target column takes two values, 0 and 1:
- 0: No, the borrower will not experience financial distress in the next two years
- 1: Yes, the borrower will experience financial distress in the next two years
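A quick way to see how the two target values are distributed is to count them. The sketch below uses a small made-up label column rather than the real dataset; only the column name `SeriousDlqin2yrs` comes from the data.

```python
import pandas as pd

# Toy labels for illustration only; this is not the real dataset.
toy = pd.DataFrame({"SeriousDlqin2yrs": [0, 0, 0, 1, 0, 0, 1, 0, 0, 0]})

# Share of each class: 0 = no distress, 1 = distress within two years.
class_share = toy["SeriousDlqin2yrs"].value_counts(normalize=True).sort_index()
print(class_share)
```

If one class turns out to be much rarer than the other, plain accuracy can be misleading, which is one reason to pick the evaluation metric with care.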
Source: The dataset used for this project was obtained from Kaggle: Link
Project Outline
Here's the outline we will follow to complete this project:
- Download the data
- Perform exploratory analysis and visualization on the dataset
- Preprocess and clean the data using the pandas library
- Set up evaluation metrics
- Build models
- Tune the hyperparameters of our best-performing model
- Make predictions on new input data
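The modelling steps in this outline can be sketched end to end on synthetic data. The snippet below is a minimal sketch, not the project's actual pipeline: the data comes from scikit-learn's `make_classification`, and the split size, class weights, and `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption): 1,000 rows, 10 features, about 10% positives.
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.9, 0.1], random_state=42
)

# Hold out a validation set, keeping the class ratio the same in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply it to both parts.
scaler = MinMaxScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_val_scaled = scaler.transform(X_val)

# Baseline model and evaluation metric.
model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)
val_f1 = f1_score(y_val, model.predict(X_val_scaled))
print(f"Validation F1: {val_f1:.3f}")
```

Fitting the scaler on the training split alone keeps information from the validation rows out of preprocessing, so the validation score stays an honest estimate.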
Variable Description
Before we begin, let's describe each variable to better understand the dataset we will be working with.
- SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse (Y/N)
- RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits (percentage)
- age: Age of borrower in years (integer)
- NumberOfTime3059DaysPastDueNotWorse: Number of times borrower has been 30-59 days past due but no worse in the last 2 years (integer)
- DebtRatio: Monthly debt payments, alimony, and living costs divided by monthly gross income (percentage)
- MonthlyIncome: Monthly income (real)
- NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) (integer)
- NumberOfTimes90DaysLate: Number of times borrower has been 90 days or more past due (integer)
- NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home equity lines of credit (integer)
- NumberOfTime60-89DaysPastDueNotWorse: Number of times borrower has been 60-89 days past due but no worse in the last 2 years (integer)
- NumberOfDependents: Number of dependents in the family, excluding the borrower (spouse, children, etc.) (integer)
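Two of the variables above are ratios, and a small worked example may make them concrete. The figures below are invented for a hypothetical borrower; the formulas follow the variable descriptions.

```python
# Hypothetical borrower figures, for illustration only.
card_and_credit_line_balance = 2_500.0   # excludes real estate and installment debt
total_credit_limit = 10_000.0
monthly_debt_and_living_costs = 1_800.0  # debt payments + alimony + living costs
monthly_gross_income = 6_000.0

# RevolvingUtilizationOfUnsecuredLines: balance divided by the sum of credit limits.
revolving_utilization = card_and_credit_line_balance / total_credit_limit

# DebtRatio: monthly outgoings divided by monthly gross income.
debt_ratio = monthly_debt_and_living_costs / monthly_gross_income

print(f"RevolvingUtilizationOfUnsecuredLines = {revolving_utilization:.2f}")  # 0.25
print(f"DebtRatio = {debt_ratio:.2f}")  # 0.30
```

In this example both values are stored as fractions (0.25 means 25% of the available credit is in use). Whether the dataset follows the same convention is worth checking during exploratory analysis.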
Importing Libraries
To begin, let's install and import useful libraries.
!pip install missingno plotly opendatasets scikit-learn xgboost lightgbm --upgrade --quiet
# Importing libraries
import opendatasets as od
import os
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import missingno as msno
import matplotlib
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
import warnings
warnings.filterwarnings('ignore')
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier