Lending Club ML Project
Lending Club Default Prediction using Python
Dataset Link: https://www.kaggle.com/datasets/gabrielsantello/lending-club-loan-preprocessed-dataset
We'll train a machine learning model to predict whether someone will default on their loan.
The dataset comes from Kaggle. It contains over 55 million rows of training data. We'll attempt to achieve a respectable score in the competition using just a fraction of the data. Along the way, we'll also look at some practical tips for machine learning. Most of the ideas & techniques covered in this notebook are derived from other public notebooks & blog posts.
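One way to work with just a fraction of a large CSV is to skip rows at random while reading it, so the full file never has to sit in memory. The snippet below is a minimal sketch of that idea, not necessarily the approach used later in this notebook. The helper name `load_sample`, the seed, and the toy column names are assumptions made for illustration.

```python
import io
import random

import pandas as pd


def load_sample(csv_source, fraction, seed=42):
    """Read roughly `fraction` of the data rows from a CSV (hypothetical helper)."""
    rng = random.Random(seed)

    def skip_row(row_index):
        # Row 0 is the header, so always keep it.
        # Every other row is skipped with probability 1 - fraction.
        return row_index > 0 and rng.random() > fraction

    # pandas calls `skip_row` with each row index and skips the row when it returns True.
    return pd.read_csv(csv_source, skiprows=skip_row)


# Toy in-memory CSV standing in for the real file (column names are placeholders).
toy_csv = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

sample_df = load_sample(io.StringIO(toy_csv), fraction=0.1)
print(len(sample_df), "rows sampled out of 1000")
```

Because each row is kept independently, the sample size is only approximately `fraction` of the total. Fixing the seed makes the same rows come back on every run, which helps when comparing models.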
To run this notebook, select "Run" > "Run on Colab" and connect your Google Drive account with Jovian. Make sure to use the GPU runtime if you plan on using a GPU.
TIP #1: Create an outline for your notebook & for each section before you start coding
Here's an outline of the project:
- Download the dataset
- Explore & analyze the dataset
- Prepare the dataset for ML training
- Train hardcoded & baseline models
- Make predictions & submit to Kaggle
- Perform feature engineering
- Train & evaluate different models
- Tune hyperparameters for the best models
- Train on a GPU with the entire dataset
- Document & publish the project online
1. Download the Dataset
Steps:
- Install required libraries
- Download data from Kaggle
- View dataset files
- Load training set with Pandas
- Load test set with Pandas
Install Required Libraries
!pip install numpy pandas jovian plotly opendatasets scikit-learn xgboost --quiet
Preparing metadata (setup.py) ... done
Building wheel for uuid (setup.py) ... done
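Since the install runs with --quiet, it can be useful to confirm that each library is actually available before moving on. The snippet below is a small sketch of such a check using the standard library's importlib.metadata (Python 3.8+). The helper name `installed_versions` is an assumption for illustration; the package list mirrors the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# Same packages as in the pip install command.
required = ["numpy", "pandas", "jovian", "plotly", "opendatasets", "scikit-learn", "xgboost"]

for name, ver in installed_versions(required).items():
    print(f"{name}: {ver or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED can be reinstalled on its own without --quiet to see the full error output.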