Hidden State Activation: Ungraded Lecture Notebook

In this notebook you'll take another look at the hidden state activation function. It can be written in two different ways.

I'll show you, step by step, how to implement each of them and then how to verify that they produce the same result.

Background

Vanilla RNN

This is the hidden state activation function for a vanilla RNN.

$h^{<t>}=g(W_{h}[h^{<t-1>},x^{<t>}] + b_h)$

It can equivalently be written as follows, where $\oplus$ denotes element-wise addition:

$h^{<t>}=g(W_{hh}h^{<t-1>} \oplus W_{hx}x^{<t>} + b_h)$

Where

$W_{h}$ in the first formula denotes the horizontal concatenation of $W_{hh}$ and $W_{hx}$ from the second formula.
$W_{h}$ in the first formula is then multiplied by $[h^{<t-1>},x^{<t>}]$, another concatenation of terms from the second formula (the previous hidden state and the current input), but this time in a different direction, i.e. vertical!
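
The equivalence of the two formulas follows from block matrix multiplication. As a short derivation, reading $\oplus$ as ordinary element-wise addition and using the same symbols as above:

$W_{h}[h^{<t-1>},x^{<t>}] = \left [ W_{hh} \ | \ W_{hx} \right ] \left[ \frac{h^{<t-1>}}{x^{<t>}} \right] = W_{hh}h^{<t-1>} + W_{hx}x^{<t>}$

Each row of $W_h$ multiplies its first columns by the entries of $h^{<t-1>}$ and its remaining columns by the entries of $x^{<t>}$, so the single product splits into the sum of the two smaller products. Adding $b_h$ and applying $g$ to both sides gives the two forms of the activation.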

Let us see what this means computationally.

Imports

import numpy as np

Joining (Concatenation)

Weights

A join along the vertical boundary is called a horizontal concatenation or horizontal stack.

Visually, it looks like this: $W_h = \left [ W_{hh} \ | \ W_{hx} \right ]$

I'll show you two different ways to achieve this using numpy.

Note: The values used to populate the arrays below have been chosen purely to aid visual illustration. They are NOT what you'd expect to use when building a model, where the weights would typically be randomly initialized instead.

Try using random initializations for the weight arrays.

# Create some dummy data

w_hh = np.full((3, 2), 1)  # illustration purposes only, returns an array of size 3x2 filled with all 1s
w_hx = np.full((3, 3), 9)  # illustration purposes only, returns an array of size 3x3 filled with all 9s


### START CODE HERE ###
# Try using some random initializations, though they will obscure the join, e.g. uncomment these lines
# w_hh = np.random.standard_normal((3,2))
# w_hx = np.random.standard_normal((3,3))
### END CODE HERE ###

print("-- Data --\n")
print("w_hh :")
print(w_hh)
print("w_hh shape :", w_hh.shape, "\n")
print("w_hx :")
print(w_hx)
print("w_hx shape :", w_hx.shape, "\n")

# Joining the arrays
print("-- Joining --\n")
# Option 1: concatenate - horizontal
w_h1 = np.concatenate((w_hh, w_hx), axis=1)
print("option 1 : concatenate\n")
print("w_h :")
print(w_h1)
print("w_h shape :", w_h1.shape, "\n")

# Option 2: hstack
w_h2 = np.hstack((w_hh, w_hx))
print("option 2 : hstack\n")
print("w_h :")
print(w_h2)
print("w_h shape :", w_h2.shape)

-- Data --

w_hh :
[[1 1]
 [1 1]
 [1 1]]
w_hh shape : (3, 2) 

w_hx :
[[9 9 9]
 [9 9 9]
 [9 9 9]]
w_hx shape : (3, 3) 

-- Joining --

option 1 : concatenate

w_h :
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape : (3, 5) 

option 2 : hstack

w_h :
[[1 1 9 9 9]
 [1 1 9 9 9]
 [1 1 9 9 9]]
w_h shape : (3, 5)
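
The two printed arrays look identical, but a programmatic check is more reliable than comparing by eye, especially once random values are used. The following is a minimal sketch that rebuilds the same illustrative arrays; the names `same`, `n_h`, `left_block`, and `right_block` are introduced here for illustration only.

```python
import numpy as np

# Same illustrative values as in the cell above
w_hh = np.full((3, 2), 1)
w_hx = np.full((3, 3), 9)

w_h1 = np.concatenate((w_hh, w_hx), axis=1)  # option 1
w_h2 = np.hstack((w_hh, w_hx))               # option 2

# Both options should give element-for-element identical arrays
same = np.array_equal(w_h1, w_h2)
print("options match :", same)

# Slicing the joined array by column recovers the original blocks
n_h = w_hh.shape[1]          # number of columns contributed by w_hh
left_block = w_h1[:, :n_h]   # columns that came from w_hh
right_block = w_h1[:, n_h:]  # columns that came from w_hx
print("left block is w_hh  :", np.array_equal(left_block, w_hh))
print("right block is w_hx :", np.array_equal(right_block, w_hx))
```

The slicing step shows that a horizontal join loses no information: the rows stay aligned and the columns of each block are simply placed side by side.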

Hidden State & Inputs

Joining along a horizontal boundary is called a vertical concatenation or vertical stack. Visually it looks like this:

$[h^{<t-1>},x^{<t>}] = \left[ \frac{h^{<t-1>}}{x^{<t>}} \right]$

I'll show you two different ways to achieve this using numpy.

Try using random initializations for the hidden state and input matrices.
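
The direction of this join is what makes the shapes line up with $W_h$: the stacked column must have as many rows as $W_h$ has columns. The sketch below checks only the shapes. The names `h_t_prev`, `x_t`, `stacked`, and `product` are assumptions made for illustration, as are the sizes (2 rows for the previous hidden state and 3 rows for the input, chosen to match the 3x2 and 3x3 weight blocks used earlier).

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the run is repeatable

# Assumed illustrative sizes: column vectors with 2 and 3 rows
h_t_prev = rng.standard_normal((2, 1))  # previous hidden state
x_t = rng.standard_normal((3, 1))       # current input
w_h = rng.standard_normal((3, 5))       # stands in for [w_hh | w_hx]

# Vertical join: rows add up (2 + 3), the column count is unchanged
stacked = np.vstack((h_t_prev, x_t))
print("stacked shape :", stacked.shape)

# The 5 columns of w_h match the 5 rows of the stacked column
product = w_h @ stacked
print("product shape :", product.shape)
```

Had the two vectors been joined horizontally instead, NumPy would raise an error, because their row counts (2 and 3) differ.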