Tabular Series Sep Final
jovian.commit()
Committed successfully! https://jovian.ai/pavansb/tabular-series-sep-final
Predicting the Insurance Claim Possibility
About the dataset
The dataset used here is synthetic, but based on a real dataset and generated using a CTGAN. The original dataset deals with predicting whether a claim will be made on an insurance policy. Although the features are anonymized, they have properties relating to real-world features.
In this notebook, we will try to predict whether a customer made a claim on an insurance policy. Using the Pandas, NumPy, SciPy, Matplotlib and Seaborn libraries, we will analyze and visualize the data to gather insights. We then train Decision Tree, CatBoost and LightGBM models to predict the probability that a claim is made.
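Because the target is the probability of a claim rather than a hard yes/no label, models are compared on their predicted probabilities. The sketch below is a minimal illustration on synthetic data from `make_classification` (a stand-in, not the competition dataset). It compares a hardcoded baseline that assigns every row the training claim rate against a `DecisionTreeClassifier`, whose `predict_proba` column 1 gives P(claim). Scoring with `roc_auc_score` (which this notebook imports) is an assumption about the evaluation metric, and `max_depth=5` is an arbitrary illustrative choice.

```python
# Minimal sketch on synthetic stand-in data (NOT the insurance claim dataset).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for the claim data: 20 numeric features, binary target.
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Hardcoded baseline: the same probability (the training claim rate) for every row.
constant_prob = np.full(len(y_val), y_train.mean())
baseline_auc = roc_auc_score(y_val, constant_prob)  # a constant score gives AUC 0.5

# Decision tree: predict_proba returns [P(no claim), P(claim)]; keep column 1.
tree = DecisionTreeClassifier(max_depth=5, random_state=42)
tree.fit(X_train, y_train)
claim_prob = tree.predict_proba(X_val)[:, 1]
tree_auc = roc_auc_score(y_val, claim_prob)

print(f"Baseline AUC: {baseline_auc:.3f}, decision tree AUC: {tree_auc:.3f}")
```

A constant prediction always scores an AUC of exactly 0.5, so any model worth keeping should clear that bar.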
Outline
- Install and import the required libraries
- Analyze and clean the dataset using the Pandas and NumPy libraries
- Exploratory data analysis and visualization
- Data Preparation (Selection, Imputing, Scaling and Encoding)
- Training hardcoded and baseline models
- Feature Engineering
- Training & Evaluating Different Models
- Hyperparameter tuning for select models
- Employing the Voting Ensemble technique with CatBoost and LightGBM models
- Saving the Model parameters
- Making Predictions
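The preparation, ensembling and saving steps in the outline can be sketched as follows. This is a minimal illustration under several assumptions: the data is synthetic with roughly 5% of values blanked out, the per-row missing-value count is a hypothetical engineered feature, and median imputation with `MinMaxScaler` stands in for whatever choices the notebook makes later. Encoding is skipped because every stand-in feature is numeric. `DecisionTreeClassifier` and `RandomForestClassifier` replace CatBoost and LightGBM only so that the snippet runs with scikit-learn alone. `joblib` (installed alongside scikit-learn) is one common way to persist fitted objects, and the file name `claim_ensemble.joblib` is hypothetical.

```python
# Minimal sketch on synthetic data; estimators, parameters and file name are assumptions.
import os
import tempfile

import joblib
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
df = pd.DataFrame(X, columns=[f"f{i}" for i in range(10)])
df = df.mask(rng.random(df.shape) < 0.05)  # blank out ~5% of values at random

# Hypothetical engineered feature: missing values per row (computed BEFORE imputing).
df["n_missing"] = df.isna().sum(axis=1)

X_train, X_val, y_train, y_val = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# Fit the imputer and scaler on the training split only, then apply to both splits.
imputer = SimpleImputer(strategy="median")
scaler = MinMaxScaler()
X_train_prep = scaler.fit_transform(imputer.fit_transform(X_train))
X_val_prep = scaler.transform(imputer.transform(X_val))

# Soft voting averages the members' predict_proba outputs.
# Stand-ins: in the notebook the members would be CatBoost and LightGBM models.
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=6, random_state=0)),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ],
    voting="soft",
)
ensemble.fit(X_train_prep, y_train)
val_prob = ensemble.predict_proba(X_val_prep)[:, 1]
print(f"Ensemble validation AUC: {roc_auc_score(y_val, val_prob):.3f}")

# Persist the fitted preprocessing steps and the model together, then reload them.
path = os.path.join(tempfile.gettempdir(), "claim_ensemble.joblib")
joblib.dump({"imputer": imputer, "scaler": scaler, "model": ensemble}, path)
restored = joblib.load(path)
```

Fitting the imputer and scaler on the training split only keeps validation statistics out of the preprocessing. With `voting="soft"`, the ensemble's P(claim) is the plain average of its members' probabilities, which is why every member must implement `predict_proba`.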
Installing and importing the required libraries
!pip install jovian opendatasets catboost -q --upgrade
# OS and Data libraries
import os
import sys
import time
import opendatasets as od
# Data Analysis Libraries
import pandas as pd
import numpy as np
from scipy import stats
# Visualization Libraries
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import folium
import plotly
import plotly.express as px
# Data Preprocessing Libraries
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split, KFold, StratifiedKFold
from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score
# Importing ML Models
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from catboost import CatBoostClassifier
import lightgbm as ltb
# Cell display settings
pd.set_option('display.max_columns', 200)
pd.set_option('display.max_rows', 200)
# Suppress library warnings in the cell output
from warnings import filterwarnings
filterwarnings('ignore')
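The `StratifiedKFold` and `roc_auc_score` imports above point to cross-validated evaluation. The following is a minimal sketch on synthetic stand-in data with an assumed 70/30 class split. Each of the 5 folds keeps the class proportions of the full data, every row receives exactly one out-of-fold probability, and the per-fold scores plus the overall out-of-fold AUC summarize the model. The decision tree and its parameters are illustrative assumptions, not the notebook's final models.

```python
# Minimal sketch of stratified K-fold evaluation on synthetic stand-in data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Assumed 70/30 class split to show why stratification matters.
X, y = make_classification(n_samples=1500, n_features=15, weights=[0.7, 0.3], random_state=1)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
oof_prob = np.zeros(len(y))  # out-of-fold P(claim) for every row
fold_aucs = []

for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    model = DecisionTreeClassifier(max_depth=5, random_state=1)
    model.fit(X[train_idx], y[train_idx])
    oof_prob[val_idx] = model.predict_proba(X[val_idx])[:, 1]
    fold_aucs.append(roc_auc_score(y[val_idx], oof_prob[val_idx]))
    print(f"Fold {fold}: AUC = {fold_aucs[-1]:.3f}")

print(f"Mean fold AUC: {np.mean(fold_aucs):.3f}, overall OOF AUC: {roc_auc_score(y, oof_prob):.3f}")
```

Plain `KFold` splits without looking at the labels, so on an imbalanced target one fold can end up with a noticeably different claim rate. `StratifiedKFold` avoids that by sampling each class separately.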