
Exercise 2 - Simple Linear Regression

We want to know how to make our chocolate-bar customers happier. To do this, we need to know which chocolate bar features predict customer happiness. For example, customers may be happier when chocolate bars are bigger, or when they contain more cocoa.

We have data on customer happiness when eating chocolate bars with different features. Let's look at the relationship between happiness and bar size.

Step 1

First, let's have a look at our data.

In the cell below, replace the text <printDataHere> with print(dataset.head()) and then press Run in the toolbar above (or press Shift+Enter).

import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import matplotlib.pyplot as graph
import statsmodels.formula.api as smf
from scipy import stats

dataset = pd.read_csv('Data/chocolate data.txt', index_col=False, sep="\t",header=0)
    
print(dataset.head())
   weight  cocoa_percent  sugar_percent  milk_percent  customer_happiness
0     185             65             11            24                  47
1     247             44             34            22                  55
2     133             33             21            47                  35
3     145             30             38            32                  34
4     110             22             70             7                  40

The data represents 100 different variations of chocolate bars and the measured customer happiness for each one.
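To see what the read_csv arguments in Step 1 do, here is a minimal sketch that feeds the five rows printed above through the same call. An in-memory tab-separated string stands in for 'Data/chocolate data.txt'; the real file is assumed to hold all 100 rows in the same layout.

```python
import io

import pandas as pd

# The five rows shown by dataset.head(), written as tab-separated text.
# This string is a stand-in for 'Data/chocolate data.txt'.
raw = (
    "weight\tcocoa_percent\tsugar_percent\tmilk_percent\tcustomer_happiness\n"
    "185\t65\t11\t24\t47\n"
    "247\t44\t34\t22\t55\n"
    "133\t33\t21\t47\t35\n"
    "145\t30\t38\t32\t34\n"
    "110\t22\t70\t7\t40\n"
)

# Same arguments as Step 1: tab separator, the first line is the header,
# and no column is used as the row index.
sample = pd.read_csv(io.StringIO(raw), index_col=False, sep="\t", header=0)

print(sample.shape)          # (rows, columns)
print(list(sample.columns))  # the five column names
print(sample.describe())     # count, mean, std, min, quartiles, max per column
```

On the full dataset, dataset.shape should report 100 rows if the file matches the description above.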

Step 2

We want to know which chocolate bar features make customers happy.

The example below shows a linear regression between cocoa percentage and happiness. You can read through the comments to understand what is happening.
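Under the hood, an ordinary least-squares fit with one feature picks the slope and intercept that minimize the sum of squared vertical distances between the points and the line. With a single feature this has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below applies it by hand to the cocoa_percent and customer_happiness values from the five rows printed in Step 1; it illustrates the arithmetic on that small sample only, and the helper name fit_line is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance of x and y)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sum of squared deviations of x (proportional to the variance of x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# cocoa_percent and customer_happiness from the five rows printed by dataset.head()
cocoa = [65, 44, 33, 30, 22]
happiness = [47, 55, 35, 34, 40]

intercept, slope = fit_line(cocoa, happiness)
print(f"happiness ~= {intercept:.2f} + {slope:.3f} * cocoa_percent")
```

These are the same two quantities that lm.params holds after smf.ols(...).fit(): the intercept under 'Intercept' and the slope under the feature name.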

Run the code to see the output visualized.
# Run this cell!

# DO NOT EDIT ANY OF THIS CODE

# Define a function to perform a linear regression
def PerformLinearRegression(formula):

    # This performs linear regression
    lm = smf.ols(formula = formula, data = dataset).fit()

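    # The feature name is the last space-separated token of the formula,
    # e.g. 'cocoa_percent' in 'customer_happiness ~ cocoa_percent'
    featureName=formula.split(" ")[-1]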
    
    # get the data for the x parameter (our feature)
    train_X=dataset[featureName]
    
    # This makes and shows a graph
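    # params is a Series indexed by term name ('Intercept', then the feature);
    # .iloc selects by position, since positional [0] lookup is deprecated in recent pandas
    intercept=lm.params.iloc[0]
    slope=lm.params.iloc[1]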
    line = slope * train_X + intercept
    graph.plot(train_X, line, '-', c = 'red')
    graph.scatter(train_X, dataset.customer_happiness)
    graph.ylabel('customer_happiness')
    graph.xlabel(featureName)
    graph.show()

# This performs the linear regression steps listed above
# The string argument is the formula for our regression: response ~ feature
PerformLinearRegression('customer_happiness ~ cocoa_percent')
[Figure: scatter plot of customer_happiness against cocoa_percent, with the fitted regression line in red]
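The introduction asks about bar size as well as cocoa. One way to compare several features is to fit one simple regression per feature and look at R², the share of the variance in happiness that each line explains. On the real data, the same question for bar size can be asked with PerformLinearRegression('customer_happiness ~ weight'). The minimal sketch below does the comparison by hand on the five printed rows only, so its ranking is illustrative; the helper name simple_regression is hypothetical.

```python
# The five rows printed by dataset.head(); the real notebook has 100.
features = {
    "weight": [185, 247, 133, 145, 110],
    "cocoa_percent": [65, 44, 33, 30, 22],
    "sugar_percent": [11, 34, 21, 38, 70],
    "milk_percent": [24, 22, 47, 32, 7],
}
happiness = [47, 55, 35, 34, 40]


def simple_regression(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (intercept, slope, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Fit one line per feature, then list the features from best to worst fit.
results = {name: simple_regression(xs, happiness) for name, xs in features.items()}
ranked = sorted(results.items(), key=lambda item: item[1][2], reverse=True)
for name, (intercept, slope, r2) in ranked:
    print(f"{name:15s} slope={slope:+.3f} intercept={intercept:7.2f} R^2={r2:.3f}")
```

A single-feature R² only measures how well a straight line fits that one feature, so it is a first screen for which features are worth a closer look rather than a final answer.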