Tabular Series Sep Final

Predicting the Insurance Claim Possibility


About the dataset

The dataset used here is synthetic, but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting whether a claim will be made on an insurance policy. Although the features are anonymized, they have properties relating to real-world features.

In this notebook, we try to predict whether a customer made a claim on an insurance policy. Using the Pandas, NumPy, SciPy, Matplotlib and Seaborn libraries, we analyze and visualize the data to gather insights. We then train Decision Tree, CatBoost and LightGBM models to predict the probability that a claim is made.


  1. Install and import the required libraries
  2. Analyze and clean the dataset using Pandas and Numpy Libraries
  3. Exploratory data analysis and visualization
  4. Data Preparation (Selection, Imputing, Scaling and Encoding)
  5. Training Hardcoded and baseline models
  6. Feature Engineering
  7. Training & Evaluating Different Models
  8. Hyperparameter tuning for select models
  9. Employing the Voting Ensemble technique with CatBoost and LightGBM models
  10. Saving the Model parameters
  11. Making Predictions
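
As a rough illustration of how steps 4, 5, 7 and 9 fit together, here is a minimal sketch. It makes several assumptions that are not part of the notebook: the data is a synthetic stand-in generated with `make_classification` (not the insurance dataset), the "hardcoded" model is approximated with scikit-learn's `DummyClassifier`, and a `RandomForestClassifier` stands in for CatBoost and LightGBM inside the voting ensemble so the example needs only scikit-learn. The helper `make_model` and all hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic stand-in data, NOT the insurance claim dataset.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5, random_state=42
)

# Blank out roughly 5% of the values so that imputation has work to do.
rng = np.random.default_rng(42)
X[rng.random(X.shape) < 0.05] = np.nan

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)


def make_model(estimator):
    """Hypothetical helper: impute, scale to [0, 1], then fit the estimator."""
    return make_pipeline(SimpleImputer(strategy="mean"), MinMaxScaler(), estimator)


# Soft voting averages the predicted probabilities of the member models.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=42)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=42)),
    ],
    voting="soft",
)

models = {
    # Always predicts the class prior, so it carries no ranking information.
    "hardcoded": make_model(DummyClassifier(strategy="prior")),
    "decision_tree": make_model(DecisionTreeClassifier(max_depth=5, random_state=42)),
    "voting": make_model(ensemble),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Probability of the positive class ("claim made") on the validation split.
    proba = model.predict_proba(X_val)[:, 1]
    scores[name] = roc_auc_score(y_val, proba)
    print(f"{name}: validation ROC AUC = {scores[name]:.3f}")
```

The constant-probability baseline gives a floor that any trained model should beat. Scoring on predicted probabilities rather than hard labels matches the stated goal of predicting the probability of a claim.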

Installing and importing the required libraries

!pip install jovian opendatasets catboost -q --upgrade
# OS and Data libraries
import os
import sys
import time
import opendatasets as od

# Data Analysis Libraries
import pandas as pd
import numpy as np
from scipy import stats

# Visualization Libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly
import plotly.express as px

# Data Preprocessing Libraries  
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score

# Importing ML Models
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from catboost import CatBoostClassifier
import lightgbm as ltb

# Cell display settings
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)

from warnings import filterwarnings
Pavan Sai, 6 months ago