Exercise 5 - Logistic Regression
Logistic regression predicts binary (yes/no) events. For example, we may want to predict whether someone will arrive at work on time, or whether a shopper will buy a product.
This exercise will demonstrate simple logistic regression: predicting an outcome from only one feature.
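Simple logistic regression takes a straight line of the single feature, b0 + b1 * x, and passes it through the logistic (sigmoid) function, which squashes any number into a probability between 0 and 1. Below is a minimal sketch in plain Python. The coefficients `b0` and `b1` are hypothetical values picked only to show the shape of the curve; they are not fitted to this exercise's data.

```python
import math


def logistic(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(x, b0, b1):
    """Probability of a 'yes' outcome for a single feature value x."""
    return logistic(b0 + b1 * x)


# Hypothetical coefficients, chosen for illustration only.
b0, b1 = -4.0, 2.5
for goals in [0.5, 1.6, 3.0]:
    print(goals, round(predict_probability(goals, b0, b1), 3))
```

Where b0 + b1 * x equals 0 (here at x = 1.6) the probability is exactly 0.5, which is the usual cut-off between predicting "no" and "yes".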
Step 1
We want to place a bet on the outcome of the next football (soccer) match. It is the final of a competition, so there will not be a draw. We have historical data about our favourite team playing in matches such as this. Complete the exercise below to preview our data.
In the cell below replace:
1. <addFilePath> with 'Data/football data.txt' (including the quotation marks)
2. <printDataHere> with print(dataset.head())
and then run the code.
# This part sets up the graphing configuration
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as graph
%matplotlib inline
graph.rcParams['figure.figsize'] = (15,5)
graph.rcParams["font.family"] = 'DejaVu Sans'
graph.rcParams["font.size"] = '12'
graph.rcParams['image.cmap'] = 'rainbow'
import pandas as pd
###
# REPLACE <addFilePath> BELOW WITH 'Data/football data.txt' (INCLUDING THE QUOTES) TO LOAD THE DATA FROM THAT FILE
###
dataset = pd.read_csv('Data/football data.txt', index_col = False, sep = '\t', header = 0)
###
###
# REPLACE <printDataHere> BELOW WITH print(dataset.head()) TO PREVIEW OUR DATASET
###
print(dataset.head())
###
   average_goals_per_match  won_competition
0                 2.422870                1
1                 2.824478                1
2                 0.571688                0
3                 1.055028                0
4                 0.394192                0
Each row is one season. The left column, average_goals_per_match, is our team's average number of goals per match that season. The right column, won_competition, is 1 if our team won the competition that season and 0 if it did not.
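To see how the read_csv arguments in the cell above interpret a tab-separated file, here is a minimal sketch that parses the five preview rows from an in-memory string instead of 'Data/football data.txt'. The string is only a stand-in for the real file, which has more rows. The sketch also checks that won_competition holds only the labels 0 and 1, which is what logistic regression expects as its outcome.

```python
import io

import pandas as pd

# Stand-in for 'Data/football data.txt': the header plus the five rows shown
# in the preview above, with columns separated by tab characters.
raw_text = (
    "average_goals_per_match\twon_competition\n"
    "2.422870\t1\n"
    "2.824478\t1\n"
    "0.571688\t0\n"
    "1.055028\t0\n"
    "0.394192\t0\n"
)

# Same arguments as the exercise: tab separator, first line is the header,
# and index_col=False so no data column is used as the row index.
sample = pd.read_csv(io.StringIO(raw_text), index_col=False, sep='\t', header=0)

print(sample.dtypes)
# Logistic regression needs a binary label, so confirm only 0 and 1 appear.
print(sample['won_competition'].isin([0, 1]).all())
```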
Step 2
Let's graph the data so we have a better idea of what's going on here. Complete the exercise below to make an x-y scatter plot.
In the cell below replace:
1. <addWonCompetition> with 'won_competition'
2. <addAverageGoals> with 'average_goals_per_match'
then run the code.
###
# REPLACE <addWonCompetition> BELOW WITH 'won_competition' (INCLUDING THE QUOTES)
###
train_Y = dataset['won_competition']
###
###
# REPLACE <addAverageGoals> BELOW WITH 'average_goals_per_match' (INCLUDING THE QUOTES)
###
train_X = dataset['average_goals_per_match']
###
# The 'won_competition' will be displayed on the vertical axis (y axis)
# The 'average_goals_per_match' will be displayed on the horizontal axis (x axis)
graph.scatter(train_X, train_Y, c = train_Y, marker = 'D')
graph.yticks([0, 1], ['No', 'Yes'])
graph.ylabel("Competition Win")
graph.ylim([-0.5, 1.5])
graph.xlabel("Average number of goals scored per match")
graph.show()
The graph shows that, generally, our team tends to win the competition in seasons when it scores more goals per match on average.
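One way to back up what the graph shows with numbers is to compare the average goals per match of winning and non-winning seasons. The sketch below uses groupby on only the five rows from the Step 1 preview, so treat the result as illustrative; with the full data the same expression could be run on dataset.

```python
import pandas as pd

# The five rows shown in the Step 1 preview; a stand-in for the full dataset.
sample = pd.DataFrame({
    'average_goals_per_match': [2.422870, 2.824478, 0.571688, 1.055028, 0.394192],
    'won_competition': [1, 1, 0, 0, 0],
})

# Mean goals per match for seasons without a win (0) and with a win (1).
mean_goals = sample.groupby('won_competition')['average_goals_per_match'].mean()
print(mean_goals)
```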
Step 3
How can we predict whether the team will win this season? Let's apply AI to this problem by making a logistic regression model from this data and then graphing it. The model will tell us how likely we are to win this season.
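To show what "making a logistic regression model" means, here is a from-scratch sketch that fits the two coefficients by batch gradient descent on the mean log-loss. It uses only the five rows from the Step 1 preview and is an illustration, not necessarily how this exercise builds its model. The learning rate, the iteration count and the 2.0 goals-per-match season are hypothetical choices. A predicted probability above 0.5 is read as "likely to win".

```python
import math

# The five rows shown in the Step 1 preview (a small stand-in for the full data).
goals = [2.422870, 2.824478, 0.571688, 1.055028, 0.394192]
won = [1, 1, 0, 0, 0]


def logistic(z):
    """Numerically stable sigmoid: maps any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, learning_rate=0.5, iterations=5000):
    """Fit p(win) = logistic(b0 + b1 * x) by batch gradient descent on mean log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Gradient of the mean log-loss: average of (prediction - label) * feature.
        errors = [logistic(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        grad_b0 = sum(errors) / n
        grad_b1 = sum(err * x for err, x in zip(errors, xs)) / n
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1


b0, b1 = fit_logistic(goals, won)

# Hypothetical season with an average of 2.0 goals per match.
season_goals = 2.0
p_win = logistic(b0 + b1 * season_goals)
print(f"P(win) = {p_win:.2f}; prediction: {'win' if p_win > 0.5 else 'no win'}")
```

Because these five points are perfectly separated (every winning season scored more than every losing one), the coefficients keep growing the longer gradient descent runs, so the probabilities become overconfident. That is one reason to fit on all the data with a well-tested library rather than on a handful of rows.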