
Financial Distress Prediction


Banks play a crucial role in market economies. They decide who can get financing and on what terms, and those decisions can make or break investments. For markets and society to function, individuals and companies need access to credit.

Lenders approve loans based on several factors, including your credit score.

What Are Credit Scores?

Traditional credit scores try to predict how likely you are to repay a loan, and they use historical data about your borrowing behavior to do so. To generate a credit score, a computer program reads through data in your credit reports looking for information like:

  • Whether you have borrowed money in the past, and how long you’ve been borrowing
  • Whether you repaid your loans as agreed
  • Whether you’ve missed payments on your loans in the past
  • How you’re currently using debt, including how much you’re borrowing, and what types of debt you use
  • Whether any public records about you exist, like bankruptcy or legal judgments against you from a creditor
  • Whether you’ve recently applied for loans

Credit scoring algorithms, which estimate the probability that a borrower will default, are a key tool banks use to decide whether or not to grant a loan.
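To make the idea concrete, here is a toy scorecard in plain Python. It assumes a logistic model, which is one common choice for credit scoring, and the feature names, weights, intercept and 0.5 cut-off are all hypothetical values picked for illustration, not fitted to any data.

```python
import math

# Hypothetical weights for illustration only; a real model learns these from data.
WEIGHTS = {
    "times_90_days_late": 0.9,  # each serious delinquency raises the estimated risk
    "debt_ratio": 1.2,          # a heavier debt load raises the estimated risk
    "age_decades": -0.3,        # assumed sign for this toy example only
}
INTERCEPT = -2.5


def probability_of_default(borrower: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of w_i * x_i)))."""
    z = INTERCEPT + sum(w * borrower[name] for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def approve(borrower: dict, threshold: float = 0.5) -> bool:
    """Grant the loan only if the estimated default probability is below the threshold."""
    return probability_of_default(borrower) < threshold


# Hypothetical applicant: no serious delinquencies, debt ratio 0.35, aged 45
applicant = {"times_90_days_late": 0, "debt_ratio": 0.35, "age_decades": 4.5}
print(round(probability_of_default(applicant), 3), approve(applicant))
```

Lowering the threshold makes the lender more conservative; where to set it is a business decision, separate from how well the probabilities are estimated.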

Project Objective

The aim of this project is to improve on the state of the art in credit scoring by predicting the probability that a borrower will experience financial distress in the next two years. Such a model can help borrowers make better financial decisions. To accomplish this, we will use a dataset of about 150,000 rows and 12 columns. The target column takes two values, 0 and 1:

  • 0: No, the borrower will not experience financial distress in the next two years
  • 1: Yes, the borrower will experience financial distress in the next two years
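Because the target is binary, a useful first check is how often each class occurs. The sketch below uses plain Python on a small made-up list of labels; with the real data, pandas' `value_counts(normalize=True)` on the `SeriousDlqin2yrs` column should give the same kind of proportions.

```python
from collections import Counter


def class_shares(labels):
    """Return the fraction of rows in each class, keyed by label."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in sorted(counts.items())}


# Made-up target values for illustration (0 = no distress, 1 = distress)
sample_target = [0, 0, 0, 1, 0, 0, 1, 0]
shares = class_shares(sample_target)
print(shares)  # {0: 0.75, 1: 0.25}

# A model that always predicts the majority class is already "right" on this
# share of rows, which is why accuracy alone can mislead when one class is rare.
print("majority-class baseline accuracy:", max(shares.values()))
```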

Source: The dataset used for this project was obtained from Kaggle: Link

Project Outline

Here's the outline we will follow to complete this project:

  • Download the data
  • Perform exploratory analysis and visualization on the dataset
  • Preprocess and clean the data using the pandas library
  • Set up evaluation metrics
  • Build models
  • Tune the hyperparameters of the best-performing model
  • Make predictions on new input data
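For the "Set up evaluation metrics" step, the notebook relies on scikit-learn's `f1_score`. As a sketch of what that metric computes for the positive class (label 1, financial distress), here is a plain-Python version; the labels are made up for illustration.

```python
def f1_for_positive_class(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # correctly flagged
    fp = sum(t != positive and p == positive for t, p in pairs)  # false alarms
    fn = sum(t == positive and p != positive for t, p in pairs)  # missed cases
    if tp == 0:
        return 0.0  # no true positives: precision and recall are zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
# 2 true positives, 1 false positive, 1 false negative -> precision = recall = 2/3
print(f1_for_positive_class(y_true, y_pred))
```

With its default settings (binary average, positive label 1), `sklearn.metrics.f1_score(y_true, y_pred)` should return the same value for these lists. F1 is a sensible choice here because it ignores true negatives, so a model cannot score well simply by predicting "no distress" for everyone.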

Variable Description

Before we begin, let's describe each variable to better understand the dataset we will be working with.

  • SeriousDlqin2yrs: Person experienced 90 days past due delinquency or worse (target, Y/N)
  • RevolvingUtilizationOfUnsecuredLines: Total balance on credit cards and personal lines of credit, excluding real estate and installment debt such as car loans, divided by the sum of credit limits (percentage)
  • age: Age of the borrower in years (integer)
  • NumberOfTime3059DaysPastDueNotWorse: Number of times the borrower has been 30-59 days past due, but no worse, in the last 2 years (integer)
  • DebtRatio: Monthly debt payments, alimony and living costs divided by monthly gross income (percentage)
  • MonthlyIncome: Monthly income (real)
  • NumberOfOpenCreditLinesAndLoans: Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) (integer)
  • NumberOfTimes90DaysLate: Number of times the borrower has been 90 days or more past due (integer)
  • NumberRealEstateLoansOrLines: Number of mortgage and real estate loans, including home equity lines of credit (integer)
  • NumberOfTime60-89DaysPastDueNotWorse: Number of times the borrower has been 60-89 days past due, but no worse, in the last 2 years (integer)
  • NumberOfDependents: Number of dependents in the family, excluding the borrower (spouse, children, etc.) (integer)
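Two of these columns are ratios rather than raw amounts. The hypothetical helpers below simply restate the definitions of RevolvingUtilizationOfUnsecuredLines and DebtRatio as arithmetic; the dataset already contains both ratios, and the dollar figures here are made up, so this only shows how to read the values.

```python
def revolving_utilization(card_and_credit_line_balance: float, total_credit_limits: float) -> float:
    """Balance on credit cards and personal credit lines divided by their combined limits."""
    return card_and_credit_line_balance / total_credit_limits


def debt_ratio(monthly_debt_payments: float, monthly_gross_income: float) -> float:
    """Monthly debt payments, alimony and living costs divided by monthly gross income."""
    return monthly_debt_payments / monthly_gross_income


# Hypothetical borrower: $2,500 owed against $10,000 of limits,
# $1,800 of monthly obligations on a $6,000 gross monthly income.
print(revolving_utilization(2_500, 10_000))  # 0.25
print(debt_ratio(1_800, 6_000))              # 0.3
```

A utilization above 1 means the balance exceeds the combined limits, and the debt ratio is undefined when monthly income is zero, so values in those ranges deserve a closer look during exploratory analysis.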

Importing Libraries

To begin, let's install and import useful libraries.

!pip install missingno plotly opendatasets scikit-learn xgboost lightgbm --upgrade --quiet
# Importing libraries

# Data download and handling
import opendatasets as od
import os
import pandas as pd
import numpy as np

# Visualization
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import missingno as msno
%matplotlib inline

# Default plot styling: larger font, wider figures, transparent background
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

# Hide library warnings in the notebook output
import warnings
warnings.filterwarnings('ignore')

# Preprocessing, modeling and evaluation
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import f1_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
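Among the imports is `MinMaxScaler`, which rescales each feature column to the [0, 1] range using (x - min) / (max - min). Here is a plain-Python sketch of that formula applied to a single column; the income values are made up, and unlike the scikit-learn class, this helper does not remember the training minimum and maximum for later reuse.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: map everything to 0
    return [(v - lo) / (hi - lo) for v in values]


monthly_income = [2_000, 5_000, 8_000, 14_000]  # made-up values
print(min_max_scale(monthly_income))  # [0.0, 0.25, 0.5, 1.0]
```

In the notebook itself, the usual pattern is to call `fit` on `MinMaxScaler()` with the training columns only and then `transform` both the training and validation data, so the validation set is scaled with the training minimum and maximum.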
Chiamaka Anuebunwa, 6 months ago