Swedish Auto Insurance Dataset
1. Introduction
The Swedish Auto Insurance Dataset involves predicting the total payment for all claims in thousands of Swedish Kronor, given the total number of claims.
It is a regression problem. The dataset comprises 63 observations with one input variable and one output variable. The variable names are as follows:
- Number of claims.
- Total payment for all claims in thousands of Swedish Kronor.
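With one input and one output, the task can be framed as simple linear regression. The following is a minimal sketch, assuming an ordinary least-squares fit. The model choice, the function name `fit_simple_linear_regression`, and the toy values are illustrative assumptions, not taken from the dataset or the original notebook.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Toy values (hypothetical, NOT from the dataset): y = 1 + 2x exactly
toy_claims = [1, 2, 3, 4]
toy_payment = [3, 5, 7, 9]

b0, b1 = fit_simple_linear_regression(toy_claims, toy_payment)
print(f"intercept={b0:.2f}, slope={b1:.2f}")
```

Here the number of claims plays the role of `x` and the total payment the role of `y`.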
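## Loading the dataset from the GitHub repo
import warnings

import pandas as pd

# Suppress library warnings to keep the notebook output readable
warnings.filterwarnings("ignore")

url = 'https://raw.githubusercontent.com/hargurjeet/MachineLearning/Swedish-Auto-Insurance-Dataset/insurance.csv'
# 'delimiter' is a placeholder separator that is not expected to match anything in the file,
# so each line is read as one string column and split on commas in a later step
df_raw = pd.read_csv(url, sep='delimiter', header=None, engine='python')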
## Dropping the initial junk rows, renaming the column, and resetting the index
df = df_raw.drop([0, 1, 2, 3], axis=0).reset_index(drop=True).rename(columns={0: 'No_Of_Claims'})
## Splitting each row on the comma into the two named columns
df = df['No_Of_Claims'].str.split(',', expand=True).rename(columns={0: 'No_Of_Claims', 1: 'Total_Payment'})
df.head()
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 63 entries, 0 to 62
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   No_Of_Claims   63 non-null     object
 1   Total_Payment  63 non-null     object
dtypes: object(2)
memory usage: 1.1+ KB
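
Both columns are reported as `object`, which means the values are still strings after the split. Numeric work such as regression generally needs them converted first. Below is a minimal sketch using `pd.to_numeric`, assuming every value is a plain number. The small frame `df_demo` and its values are hypothetical stand-ins so the snippet runs without the download.

```python
import pandas as pd

# Hypothetical stand-in for the split frame: string values, as after str.split
df_demo = pd.DataFrame({
    'No_Of_Claims': ['10', '25', '3'],
    'Total_Payment': ['50.5', '120.0', '12.3'],
})

# Convert each column from strings to a numeric dtype;
# errors='raise' (the default) stops on any value that is not a number
df_num = df_demo.apply(pd.to_numeric, errors='raise')

print(df_num.dtypes)
```

On the real data the same call would presumably be applied to `df` in place of `df_demo`.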