
Insurance cost prediction using linear regression

In this assignment, we're going to use information such as a person's age, sex, BMI, number of children, and smoking habits to predict their yearly medical charges. A model like this can help insurance companies set a person's yearly insurance premium. The dataset for this problem is taken from: https://www.kaggle.com/mirichoi0218/insurance
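Concretely, a linear regression model predicts the charges as a weighted sum of the inputs plus a bias. Assuming the categorical inputs (sex, smoker) have first been encoded as numbers, the prediction for one person can be written as below. Training then searches for the weights $w_1,\dots,w_5$ and bias $b$ that minimize a loss over the $N$ training examples. The mean squared error shown here is one common choice and an assumption on our part; the assignment may use a different loss.

```latex
\[
\hat{y} = w_1\,x_{\text{age}} + w_2\,x_{\text{sex}} + w_3\,x_{\text{bmi}} + w_4\,x_{\text{children}} + w_5\,x_{\text{smoker}} + b
\]
\[
\mathcal{L}_{\text{MSE}} = \frac{1}{N} \sum_{i=1}^{N} \left( \hat{y}_i - y_i \right)^2
\]
```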

We will create a model with the following steps:

  1. Download and explore the dataset
  2. Prepare the dataset for training
  3. Create a linear regression model
  4. Train the model to fit the data
  5. Make predictions using the trained model
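Steps 3 to 5 can be sketched in a few lines of PyTorch. This is a minimal example on synthetic data (not the insurance dataset) that uses full-batch gradient descent with `nn.Linear`, `F.mse_loss`, and `torch.optim.SGD`. The helper name `fit`, the choice of loss, and the hyperparameter values are assumptions for illustration, not the assignment's prescribed solution.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Synthetic stand-in data (NOT the insurance dataset): 5 input features,
# targets generated from known weights and bias plus a little noise.
num_samples, num_features = 200, 5
inputs = torch.randn(num_samples, num_features)
true_w = torch.tensor([[2.0], [-1.0], [0.5], [3.0], [1.5]])
targets = inputs @ true_w + 4.0 + 0.1 * torch.randn(num_samples, 1)

# Step 3: a linear regression model is a single fully connected layer.
model = nn.Linear(num_features, 1)

# Step 4: fit the model with gradient descent on the mean squared error.
# `fit` is a hypothetical helper; lr and epochs are the hyperparameters to tune.
def fit(model, inputs, targets, epochs=200, lr=0.1):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    history = []
    for _ in range(epochs):
        loss = F.mse_loss(model(inputs), targets)
        loss.backward()         # compute gradients of the loss
        optimizer.step()        # update weights and bias
        optimizer.zero_grad()   # reset gradients for the next epoch
        history.append(loss.item())
    return history

history = fit(model, inputs, targets)

# Step 5: predict for a new (synthetic) input row without tracking gradients.
with torch.no_grad():
    prediction = model(torch.randn(1, num_features))

print(f"first loss {history[0]:.3f}, last loss {history[-1]:.4f}")
```

On the real insurance data, whose columns span very different ranges (ages in the tens, charges in the thousands), a learning rate like 0.1 would likely make the loss diverge; a much smaller rate, or normalized inputs, is a more reasonable starting point when tuning.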

This assignment builds upon the concepts from the first 2 lectures. It will help to review these Jupyter notebooks:

As you go through this notebook, you will find ??? in certain places. Your job is to replace each ??? with appropriate code or values so that the notebook runs properly end-to-end. In some cases, you'll need to choose hyperparameters (learning rate, batch size, etc.). Experiment with the hyperparameters to get the lowest loss you can.

# Uncomment and run the commands below if imports fail
# !conda install numpy pytorch torchvision cpuonly -c pytorch -y
# !pip install matplotlib --upgrade --quiet
# !pip install --upgrade pip
# !pip install jovian --upgrade --quiet
# !pip install pandas --upgrade --quiet
# !pip install seaborn --upgrade --quiet
import torch
import jovian
import torchvision
import torch.nn as nn
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import torch.nn.functional as F
from torchvision.datasets.utils import download_url
from torch.utils.data import DataLoader, TensorDataset, random_split
project_name='02-insurance-linear-regression' # will be used by jovian.commit
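The last three imports (`TensorDataset`, `random_split`, `DataLoader`) are what step 2 relies on. Below is a minimal sketch of that preparation. It assumes the CSV has been loaded into a pandas DataFrame with columns named `age`, `sex`, `bmi`, `children`, `smoker`, and `charges` (names assumed here). The sketch converts the categorical columns to integer codes, wraps the tensors in a `TensorDataset`, holds out a validation split with `random_split`, and batches with `DataLoader`. The helper `dataframe_to_tensors`, the toy rows, the 20% split, and the batch size are all hypothetical choices.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def dataframe_to_tensors(df, input_cols, categorical_cols, output_cols):
    """Hypothetical helper: return float32 (inputs, targets) tensors,
    replacing each categorical column by its integer category codes."""
    df = df.copy()
    for col in categorical_cols:
        # Categories are sorted, e.g. "female" -> 0, "male" -> 1.
        df[col] = df[col].astype("category").cat.codes
    inputs = torch.tensor(df[input_cols].to_numpy(), dtype=torch.float32)
    targets = torch.tensor(df[output_cols].to_numpy(), dtype=torch.float32)
    return inputs, targets

# Toy rows for illustration only (NOT taken from the real dataset).
toy = pd.DataFrame({
    "age": [20, 30, 40, 50] * 5,
    "sex": ["female", "male"] * 10,
    "bmi": [25.0] * 20,
    "children": [0, 1, 2, 3, 4] * 4,
    "smoker": ["no", "no", "no", "yes"] * 5,
    "charges": [float(1000 * i) for i in range(20)],
})

inputs, targets = dataframe_to_tensors(
    toy,
    input_cols=["age", "sex", "bmi", "children", "smoker"],
    categorical_cols=["sex", "smoker"],
    output_cols=["charges"],
)

torch.manual_seed(42)  # make the random split and shuffling reproducible
dataset = TensorDataset(inputs, targets)
val_size = len(dataset) // 5  # hold out 20% for validation
train_ds, val_ds = random_split(dataset, [len(dataset) - val_size, val_size])

train_loader = DataLoader(train_ds, batch_size=8, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=8)

xb, yb = next(iter(train_loader))  # one mini-batch of inputs and targets
```

Keeping the validation set separate lets you compare hyperparameter choices on data the model was not trained on.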

Step 1: Download and explore the data

Let us begin by downloading the data. We'll use the download_url function from torchvision (imported above from torchvision.datasets.utils) to get the data as a CSV (comma-separated values) file.