Swedish Auto Insurance Dataset

Updated 4 years ago

1. Introduction

The Swedish Auto Insurance Dataset involves predicting the total payment for all claims, in thousands of Swedish Kronor, given the total number of claims.
It is a regression problem. The dataset comprises 63 observations with one input variable and one output variable. The variable names are as follows:

Number of claims.

Total payment for all claims in thousands of Swedish Kronor.

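## Loading the dataset from the GitHub repo

import warnings
import pandas as pd

warnings.filterwarnings("ignore")

url = 'https://raw.githubusercontent.com/hargurjeet/MachineLearning/Swedish-Auto-Insurance-Dataset/insurance.csv'

# The literal string 'delimiter' is not expected to occur in the file,
# so each line of the file is read into a single column (named 0)
df_raw = pd.read_csv(url, sep='delimiter', header=None, engine='python')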

## Dropping the initial junk rows, resetting the index, and renaming the column
df = df_raw.drop([0, 1, 2, 3], axis=0).reset_index(drop=True).rename(columns={0: 'No_Of_Claims'})
# Splitting each line on the comma into the two named columns
df = df.No_Of_Claims.str.split(',', expand=True).rename(columns={0: 'No_Of_Claims', 1: 'Total_Payment'})
df.head()
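
The loading and splitting steps above can be tried offline on a small in-memory stand-in. This is only a sketch: `raw_text`, `demo_raw`, and `demo` are hypothetical names, and the four junk lines and three data rows are illustrative placeholders, not the actual contents of the remote file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the remote file: four junk lines, then "claims,payment" rows
raw_text = "\n".join([
    "junk line 1",
    "junk line 2",
    "junk line 3",
    "junk line 4",
    "108,392.5",
    "19,46.2",
    "13,15.7",
])

# 'delimiter' never occurs in the text, so every line lands in one column (named 0)
demo_raw = pd.read_csv(io.StringIO(raw_text), sep="delimiter", header=None, engine="python")

# Drop the four junk rows, then split the remaining lines on the comma
demo = demo_raw.drop([0, 1, 2, 3], axis=0).reset_index(drop=True)
demo = demo[0].str.split(",", expand=True).rename(columns={0: "No_Of_Claims", 1: "Total_Payment"})
print(demo)
```

Both resulting columns hold strings at this point, because `str.split` returns text pieces rather than numbers.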

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63 entries, 0 to 62
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   No_Of_Claims   63 non-null     object
 1   Total_Payment  63 non-null     object
dtypes: object(2)
memory usage: 1.1+ KB
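
The `object` dtypes reported above indicate that both columns still hold strings, so they would likely need converting before any regression can be fitted. One possible way to do that is sketched below with `pd.to_numeric`. The frame `df_demo` and its values are illustrative placeholders standing in for `df`, and this is not necessarily the conversion the original notebook goes on to use.

```python
import pandas as pd

# Illustrative string-typed frame mirroring the dtypes shown above (placeholder values)
df_demo = pd.DataFrame({
    "No_Of_Claims": ["108", "19", "13"],
    "Total_Payment": ["392.5", "46.2", "15.7"],
})

# Convert each column to a numeric dtype; unparseable entries would become NaN
df_num = df_demo.apply(pd.to_numeric, errors="coerce")

print(df_num.dtypes)
```

Using `errors="coerce"` is a design choice that turns malformed entries into `NaN` instead of raising an error, which makes them easy to spot with `df_num.isna().sum()`.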