Vanishing Gradients: Ungraded Lecture Notebook

In this notebook you'll take another look at vanishing gradients, from an intuitive standpoint.

Background

Adding layers to a neural network introduces multiplicative effects in both forward and backward propagation. Backpropagation in particular presents a problem because the gradients of activation functions can be very small. Multiplied together across many layers, their product can be vanishingly small! This results in the weights of the front layers barely being updated, so training stops progressing.



Gradients of the sigmoid function, for example, are in the range 0 to 0.25. To calculate gradients for the front layers of a neural network, the chain rule is used. This means these tiny values are multiplied together, starting at the last layer and working backwards to the first, so the gradients shrink exponentially at each step.
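To see how fast this compounds, here is a small sketch (not part of the original notebook) that raises the sigmoid gradient's best-case value of 0.25 to the power of the layer count, which is what the chain rule does when every layer happens to sit at its steepest point:

import numpy as np

# Best case: every layer contributes the sigmoid gradient's maximum, 0.25
max_grad = 0.25

# The chain rule multiplies one such factor per layer
for n_layers in [1, 5, 10, 20]:
    print(f"{n_layers:>2} layers -> gradient factor {max_grad ** n_layers:.2e}")

Even in this best case, twenty layers scale the gradient by roughly 1e-13; in practice most activations sit away from the steepest point, so the shrinkage is even faster.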

Imports

import numpy as np
import matplotlib.pyplot as plt

%matplotlib inline

Data, Activation & Gradient

Data

I'll start by creating some data; nothing special is going on here, just some values spread across the interval -5 to 5.

  • Try changing the range of values in the data to see how it impacts the plots that follow.

Activation

The example activation here is sigmoid(), which squashes the data x into the interval 0 to 1.

Gradient

This is the derivative of the sigmoid() activation function. Since sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)) and sigmoid(0) = 0.5, the derivative has a maximum of 0.5 * 0.5 = 0.25 at x = 0, the steepest point on the sigmoid plot.

  • Try changing the x value for finding the tangent line in the plot.
# Data
# Interval [-5, 5]
### START CODE HERE ###
x = np.linspace(-5, 5, 100)  # try changing the range of values in the data. eg: (-100,100,1000)
### END CODE HERE ###
# Activation
# Interval [0, 1]
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

activations = sigmoid(x)

# Gradient
# Interval [0, 0.25]
def sigmoid_gradient(x):
    # Note: x here is expected to be a sigmoid activation,
    # since sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))
    return x * (1 - x)

gradients = sigmoid_gradient(activations)

# Plot sigmoid with tangent line
plt.plot(x, activations)
plt.title("Sigmoid Steepest Point")
plt.xlabel("x input data")
plt.ylabel("sigmoid(x)")

# Add the tangent line
### START CODE HERE ###
x_tan = 0   # x value to find the tangent. try different values within x declared above. eg: 2  
### END CODE HERE ###
y_tan = sigmoid(x_tan)  # y value
span = 1.7              # line span along x axis
data_tan = np.linspace(x_tan - span, x_tan + span)  # x values to plot
gradient_tan = sigmoid_gradient(sigmoid(x_tan))     # gradient of the tangent
tan = y_tan + gradient_tan * (data_tan - x_tan)     # y values to plot
plt.plot(x_tan, y_tan, marker="o", color="orange")             # tangent point marker
plt.plot(data_tan, tan, linestyle="--", color="orange")         # line
plt.show()
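As a quick sanity check (a small addition, assuming the cell above has run), the largest numerically computed gradient should be close to the theoretical maximum of 0.25:

print(gradients.max())  # close to 0.25; x = 0 is not exactly in the sampled grid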

Plots

Sub Plots

The data values run along the x-axis of each plot, on the interval chosen for x above (-5 to 5). Subplots:

  • x vs x
  • sigmoid of x
  • gradient of sigmoid

Notice how the y-axis keeps compressing from the left plot to the right: its range shrinks from 10 to 1 to 0.25. How did this happen? As |x| grows, the sigmoid approaches its asymptotes at 0 and 1, and its gradient shrinks towards 0. A minimal sketch that reproduces these subplots is given after the tip below.

  • Try changing the range of values in the code block above to see how it impacts the plots.
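Since the notebook cell that produced the subplots is not shown above, here is a minimal sketch that reproduces them, reusing x, activations, and gradients from the earlier cell; the figure size and exact titles are my own guesses:

# Three side-by-side views of the same x values
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

axes[0].plot(x, x)
axes[0].set_title("x vs x")

axes[1].plot(x, activations)
axes[1].set_title("sigmoid of x")

axes[2].plot(x, gradients)
axes[2].set_title("gradient of sigmoid")

for ax in axes:
    ax.set_xlabel("x input data")
plt.show()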