Lending Club Ml Project

Lending Club Default Prediction using Python


Dataset Link:

We'll train a machine learning model to predict whether someone will default on their loan.

This dataset is taken from Kaggle. It contains over 55 million rows of training data. We'll attempt to achieve a respectable score in the competition using just a fraction of the data. Along the way, we'll also look at some practical tips for machine learning. Most of the ideas & techniques covered in this notebook are derived from other public notebooks & blog posts.

To run this notebook, select "Run" > "Run on Colab" and connect your Google Drive account with Jovian. Make sure to use the GPU runtime if you plan on using a GPU.

TIP #1: Create an outline for your notebook & for each section before you start coding

Here's an outline of the project:

  1. Download the dataset
  2. Explore & analyze the dataset
  3. Prepare the dataset for ML training
  4. Train hardcoded & baseline models
  5. Make predictions & submit to Kaggle
  6. Perform feature engineering
  7. Train & evaluate different models
  8. Tune hyperparameters for the best models
  9. Train on a GPU with the entire dataset
  10. Document & publish the project online

1. Download the Dataset


  • Install required libraries
  • Download data from Kaggle
  • View dataset files
  • Load training set with Pandas
  • Load test set with Pandas
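Since the plan is to work with only a fraction of the training rows, the loading step could be sketched roughly as below. This is a minimal illustration, not the notebook's actual code: the helper name `load_sample`, the column names, and the in-memory demo data are all hypothetical stand-ins for the real Kaggle CSV. It relies on the `skiprows` callable supported by `pandas.read_csv` to keep the header and a random subset of data rows.

```python
import io
import random

import pandas as pd


def load_sample(source, sample_fraction=0.01, seed=42, **read_csv_kwargs):
    """Read roughly `sample_fraction` of the data rows of a CSV (hypothetical helper)."""
    rng = random.Random(seed)

    def skip_row(row_index):
        # Row 0 is the header, so it is always kept.
        # Every other row is skipped with probability (1 - sample_fraction).
        return row_index > 0 and rng.random() > sample_fraction

    return pd.read_csv(source, skiprows=skip_row, **read_csv_kwargs)


# Tiny in-memory stand-in for a large CSV file (column names are made up).
demo_csv = "loan_amnt,defaulted\n" + "\n".join(
    f"{1000 + i},{i % 2}" for i in range(1000)
)

sample_df = load_sample(io.StringIO(demo_csv), sample_fraction=0.1)
print(sample_df.shape)
```

With a real file, a path would be passed in place of the `io.StringIO` object, and extra `read_csv` options such as `usecols` or `dtype` can be forwarded as keyword arguments to cut memory use further. Fixing the seed keeps the sample reproducible between runs.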

Install Required Libraries

!pip install numpy pandas jovian plotly opendatasets scikit-learn xgboost --quiet
Sam Jeffcoat, 4 months ago