
Exercise 5 - Logistic Regression

Logistic regression predicts binary (yes/no) events. For example, we may want to predict whether someone will arrive at work on time, or whether a shopper will buy a product.

This exercise will demonstrate simple logistic regression: predicting an outcome from only one feature.
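
To make the idea concrete, here is a minimal sketch of the logistic (sigmoid) function that such a model uses to turn one feature into a probability. The function name `logistic` and the `intercept` and `slope` values are assumptions chosen purely for illustration; they are not fitted to the football data used later in this exercise.

```python
import math

def logistic(x, intercept, slope):
    """Return the probability of a 'yes' outcome for feature value x."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients, chosen only for illustration (not fitted values)
intercept, slope = -3.0, 2.0

for goals in [0.5, 1.5, 2.5]:
    p = logistic(goals, intercept, slope)
    print(f"average goals {goals}: P(win) = {p:.2f}")
```

With these made-up coefficients the probability is exactly 0.5 at 1.5 goals per match, lower below that value and higher above it. Fitting a logistic regression model amounts to choosing the intercept and slope that best match the observed outcomes.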

Step 1

We want to place a bet on the outcome of the next football (soccer) match. It is the final of a competition, so there will not be a draw. We have historical data about our favourite team playing in matches such as this. Complete the exercise below to preview our data.

In the cell below replace:

1. <addFilePath> with 'Data/football data.txt' (including the quotation marks)

2. <printDataHere> with print(dataset.head())

and then run the code.

# This part sets up the graphing configuration
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as graph
%matplotlib inline
graph.rcParams['figure.figsize'] = (15,5)
graph.rcParams["font.family"] = 'DejaVu Sans'
graph.rcParams["font.size"] = '12'
graph.rcParams['image.cmap'] = 'rainbow'
import pandas as pd


###
# REPLACE <addFilePath> BELOW WITH 'Data/football data.txt' (INCLUDING THE QUOTES) TO LOAD THE DATA FROM THAT FILE
###
dataset = pd.read_csv('Data/football data.txt', index_col = False, sep = '\t', header = 0)
###

###
# REPLACE <printDataHere> BELOW WITH print(dataset.head()) TO PREVIEW OUR DATASET
###
print(dataset.head())
###
   average_goals_per_match  won_competition
0                 2.422870                1
1                 2.824478                1
2                 0.571688                0
3                 1.055028                0
4                 0.394192                0

The left column shows our team's average goals per match for each season. The right column contains a 1 if our team won the competition that season, or a 0 if they did not.

Step 2

Let's graph the data so we have a better idea of what's going on here. Complete the exercise below to make an x-y scatter plot.

In the cell below replace:

1. <addWonCompetition> with 'won_competition'

2. <addAverageGoals> with 'average_goals_per_match'

and then run the code.

###
# REPLACE <addWonCompetition> BELOW WITH 'won_competition' (INCLUDING THE QUOTES)
###
train_Y = dataset['won_competition']
###

###
# REPLACE <addAverageGoals> BELOW WITH 'average_goals_per_match' (INCLUDING THE QUOTES)
###
train_X = dataset['average_goals_per_match']
###

# The 'won_competition' will be displayed on the vertical axis (y axis)
# The 'average_goals_per_match' will be displayed on the horizontal axis (x axis)

graph.scatter(train_X, train_Y, c = train_Y, marker = 'D')

graph.yticks([0, 1], ['No', 'Yes'])
graph.ylabel("Competition Win")
graph.ylim([-0.5, 1.5])
graph.xlabel("Average number of goals scored per match")

graph.show()

[Figure: scatter plot of competition win (No/Yes) against average number of goals scored per match]

We can see from this graph that, in general, when our team has a high average number of goals per match, they tend to win the competition.
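
The same trend can be checked numerically. The sketch below is only an illustration: it uses the five preview rows printed in Step 1 rather than the full dataset, and the variable names `preview` and `mean_goals` are assumptions introduced here. In the notebook itself, the same `groupby` call could be applied to `dataset`.

```python
import pandas as pd

# The five preview rows printed in Step 1 (not the full dataset)
preview = pd.DataFrame({
    'average_goals_per_match': [2.422870, 2.824478, 0.571688, 1.055028, 0.394192],
    'won_competition': [1, 1, 0, 0, 0],
})

# Mean goal average for seasons that ended without a win (0) and with a win (1)
mean_goals = preview.groupby('won_competition')['average_goals_per_match'].mean()
print(mean_goals)
```

In these five rows, the seasons that ended in a win have a clearly higher mean goal average than the seasons that did not, which matches what the scatter plot suggests.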

Step 3

How can we predict whether the team will win this season? Let's apply AI to this problem by building a logistic regression model from this data and then graphing it. The model will tell us whether we are likely to win this season.

In the cell below, replace <buildLinearRegression> with linear_model.LogisticRegression() and then run the code.