Applications of Classification

In this lab you will perform two-class classification using logistic regression. A classifier is a machine learning model that assigns each case to one of a set of categories, or classes. In other words, classification models are supervised machine learning models that predict a categorical label.

The German Credit bank customer data is used to determine if a particular person is a good or bad credit risk. Thus, the credit risk of the customer is the class you must predict. In this case, the cost to the bank of issuing a loan to a bad risk customer is five times that of denying a loan to a good customer. This fact will become important when evaluating the performance of the model.
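To make the cost asymmetry concrete, here is a minimal sketch of how the 5:1 ratio changes the comparison between two classifiers. The function name, the cost units, and the error counts are hypothetical and chosen only for illustration; only the ratio comes from the text above.

```python
# Illustrative cost units: only the 5:1 ratio comes from the lab text.
COST_BAD_APPROVED = 5   # loan issued to a bad-risk customer
COST_GOOD_DENIED = 1    # loan denied to a good-risk customer

def misclassification_cost(n_bad_approved, n_good_denied):
    """Total cost of the errors, in units of one denied good customer."""
    return COST_BAD_APPROVED * n_bad_approved + COST_GOOD_DENIED * n_good_denied

# Two hypothetical classifiers that each make 30 errors in total
cost_a = misclassification_cost(n_bad_approved=20, n_good_denied=10)
cost_b = misclassification_cost(n_bad_approved=10, n_good_denied=20)
print(cost_a, cost_b)
```

Both hypothetical classifiers make the same number of errors, yet the one that approves fewer bad-risk customers is cheaper for the bank. This is why accuracy alone is a poor guide for this problem.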

Logistic regression is a linear model but with a nonlinear response. The response is binary, $\{0, 1\}$, or positive and negative. The response is the prediction of the category.

In this lab you will learn the following:

How to prepare data for classification models using scikit-learn.
How to construct a classification model using scikit-learn.
How to evaluate the performance of the classification model.
How to use techniques such as reweighting the labels and changing the decision threshold to change the trade-off between false positive and false negative error rates.
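The last point, changing the decision threshold, can be sketched in a few lines of plain Python. The probabilities and labels below are made-up values, not taken from the German Credit data, and `count_errors` is a hypothetical helper, not a scikit-learn function.

```python
def count_errors(probs, labels, threshold):
    """Return (false_positives, false_negatives) at the given threshold."""
    fp = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0   # predict positive at or above the threshold
        if pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return fp, fn

# Made-up predicted probabilities of the positive class and true labels
probs = [0.10, 0.35, 0.40, 0.55, 0.65, 0.80, 0.90]
labels = [0, 0, 1, 0, 1, 1, 1]

errors = {t: count_errors(probs, labels, t) for t in (0.3, 0.5, 0.7)}
for t, (fp, fn) in errors.items():
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

Lowering the threshold trades false negatives for false positives, and raising it does the reverse. Which direction is preferable depends on the relative cost of the two error types.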

Basics of logistic regression

In this section some basic properties of the logistic regression model are presented.

First, execute the code in the cell below to load the packages required to run this notebook.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import numpy.random as nr
import math
from sklearn import preprocessing
import sklearn.model_selection as ms
from sklearn import linear_model
import sklearn.metrics as sklm

%matplotlib inline

Logistic regression is widely used as a classification model. Logistic regression is a linear model with a binary response, {False, True} or {0, 1}. You can think of this response as having a Binomial distribution. For linear regression the response is just, well, linear. Logistic regression is a linear regression model with a nonlinear output. The response of the linear model is transformed or 'squashed' to values between 0 and 1 using a sigmoidal function, also known as the logistic function. The result of this transformation is a response that can be interpreted as the probability of the positive class; the untransformed linear response is the corresponding log odds.

The sigmoidal or logistic function can be expressed as follows:

$$f(x) = \frac{1}{1 + e^{-\kappa x}} \\ \kappa = \text{steepness}$$

Execute the code in the cell below to compute and plot an example of the logistic function with $\kappa = 1$.

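# Grid of values standing in for the output of the linear model
xseq = np.arange(-7, 7, 0.1)

# exp(v)/(1 + exp(v)) is the logistic function with steepness 1
logistic = [math.exp(v)/(1 + math.exp(v)) for v in xseq]

plt.plot(xseq, logistic, color = 'red')
# Reference lines: response of 0.5 and linear output of 0
plt.plot([-7,7], [0.5,0.5], color = 'blue')
plt.plot([0,0], [0,1], color = 'blue')
plt.title('Logistic function for two-class classification')
plt.ylabel('Probability of positive class')
plt.xlabel('Value of output from linear regression')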
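The plot above uses a steepness of 1. As a rough sketch of what the steepness parameter does, the code below assumes the logistic function has the form 1/(1 + exp(-kappa * x)), centered at zero. The names `logistic_fn` and `logit` are hypothetical helpers introduced here, not part of the lab code.

```python
import math

def logistic_fn(x, kappa=1.0):
    """Logistic function with steepness kappa, assumed centered at zero."""
    return 1.0 / (1.0 + math.exp(-kappa * x))

def logit(p):
    """Inverse of the logistic function for kappa = 1: the log odds of p."""
    return math.log(p / (1.0 - p))

# A larger kappa pushes the same input closer to 1 (or to 0 for negative inputs)
for kappa in (0.5, 1.0, 4.0):
    print(kappa, round(logistic_fn(1.0, kappa), 3))

# Applying the logit to the squashed value recovers the linear output
print(logit(logistic_fn(2.0)))
```

Whatever the steepness, an input of 0 maps to 0.5, which is why the two blue reference lines in the plot cross on the curve.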