Neural Machine Translation

Welcome to your first programming assignment for this week!

  • You will build a Neural Machine Translation (NMT) model to translate human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25").
  • You will do this using an attention model, one of the most sophisticated sequence-to-sequence models.

This notebook was produced together with NVIDIA's Deep Learning Institute.

Updates

If you were working on the notebook before this update...
  • The current notebook is version "4a".
  • You can find your original work saved in the notebook with the previous version name ("v4").
  • To view the file directory, go to the menu "File->Open"; this will open a new tab showing the file directory.
List of updates
  • Clarified variable names to be consistent with the lectures and within the assignment:

    • pre-attention bi-directional LSTM: the first LSTM that processes the input data.
      • 'a': the hidden state of the pre-attention LSTM.
    • post-attention LSTM: the LSTM that outputs the translation.
      • 's': the hidden state of the post-attention LSTM.
    • energies "e": the output of the dense function that takes "a" and "s" as inputs.
    • All references to "output activation" are updated to "hidden state".
    • "post-activation" sequence model is updated to "post-attention sequence model".
    • 3.1: "Getting the activations from the Network" renamed to "Getting the attention weights from the network."
    • Appropriate mentions of "activation" are replaced with "attention weights."
    • Sequence of alphas corrected to be a sequence of "a" hidden states.
  • one_step_attention:

    • Provides sample code for each Keras layer, to show how to call the functions (a sketch of these calls appears after this list of updates).
    • Reminds students to provide the list of hidden states in a specific order, in order to pass the autograder.
  • model

    • Provides sample code for each Keras layer, to show how to call the functions.
    • Added a troubleshooting note about handling errors.
    • Fixed typo: outputs should be of length 10 and not 11.
  • define optimizer and compile model

    • Provides sample code showing how to define the optimizer and compile the model (a minimal stand-in sketch appears after the imports below).
  • Spelling, grammar and wording corrections.
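
To see how these renamed pieces fit together, here is a minimal, self-contained sketch of a one-step attention computation in Keras. The sizes Tx, n_a and n_s, and the two-layer dense scoring function, are illustrative assumptions in the spirit of the assignment, not the graded implementation.

from keras.layers import RepeatVector, Concatenate, Dense, Activation, Dot, Input
import keras.backend as K

Tx, n_a, n_s = 30, 32, 64  # assumed sizes: input length, pre-/post-attention units

def softmax_over_time(x):
    # softmax over the Tx axis (axis=1), so the weights sum to 1 across input steps
    e = K.exp(x - K.max(x, axis=1, keepdims=True))
    return e / K.sum(e, axis=1, keepdims=True)

# Shared layers, defined once so the same weights are reused at every output step
repeator = RepeatVector(Tx)
concatenator = Concatenate(axis=-1)
densor1 = Dense(10, activation='tanh')  # assumed hidden size of the scoring function
densor2 = Dense(1, activation='relu')   # one scalar energy "e" per input time step
activator = Activation(softmax_over_time, name='attention_weights')
dotor = Dot(axes=1)

def one_step_attention(a, s_prev):
    # a:      (m, Tx, 2*n_a) hidden states of the pre-attention bi-directional LSTM
    # s_prev: (m, n_s)       previous hidden state "s" of the post-attention LSTM
    s_prev = repeator(s_prev)           # copy "s" across all Tx time steps
    concat = concatenator([a, s_prev])  # pair each a<t'> with s<t-1>
    e = densor1(concat)                 # intermediate energies
    energies = densor2(e)               # energies "e", shape (m, Tx, 1)
    alphas = activator(energies)        # attention weights, summing to 1 over Tx
    context = dotor([alphas, a])        # weighted sum of the a's: (m, 1, 2*n_a)
    return context

# Example wiring, to check shapes only
a = Input(shape=(Tx, 2 * n_a))
s_prev = Input(shape=(n_s,))
context = one_step_attention(a, s_prev)
print(K.int_shape(context))  # (None, 1, 64)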

Let's load all the packages you will need for this assignment.

from keras.layers import Bidirectional, Concatenate, Permute, Dot, Input, LSTM, Multiply
from keras.layers import RepeatVector, Dense, Activation, Lambda
from keras.optimizers import Adam
from keras.utils import to_categorical
from keras.models import load_model, Model
import keras.backend as K
import numpy as np

from faker import Faker
import random
from tqdm import tqdm
from babel.dates import format_date
from nmt_utils import *
import matplotlib.pyplot as plt
%matplotlib inline
Using TensorFlow backend.
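
As a preview of the "define optimizer and compile model" step mentioned in the updates, here is a minimal sketch on a tiny stand-in model. The hyperparameter values are placeholders for illustration, not the assignment's graded settings.

from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import Adam

# Tiny stand-in model; the actual NMT model is built later in this notebook
inputs = Input(shape=(4,))
outputs = Dense(2, activation='softmax')(inputs)
model = Model(inputs=inputs, outputs=outputs)

# Placeholder hyperparameters, chosen for illustration only
opt = Adam(lr=0.005, beta_1=0.9, beta_2=0.999, decay=0.01)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])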

1 - Translating human-readable dates into machine-readable dates

  • The model you will build here could be used to translate from one language to another, such as translating from English to Hindi.
  • However, language translation requires massive datasets and usually takes days of training on GPUs.
  • To give you a place to experiment with these models without using massive datasets, we will perform a simpler "date translation" task.
  • The network will input a date written in a variety of possible formats (e.g. "the 29th of August 1958", "03/30/1968", "24 JUNE 1987").
  • The network will translate them into standardized, machine readable dates (e.g. "1958-08-29", "1968-03-30", "1987-06-24").
  • We will have the network learn to output dates in the common machine-readable format YYYY-MM-DD; the sketch below shows how such date pairs can be generated.
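
To make the task concrete, the snippet below sketches how such (human-readable, machine-readable) pairs can be generated with faker and babel, in the spirit of the nmt_utils helpers loaded above. The list of babel formats here is an assumption; the assignment's dataset generator draws from its own, longer list.

import random
from faker import Faker
from babel.dates import format_date

fake = Faker()

# A few standard babel date formats; nmt_utils uses its own, longer list
FORMATS = ['short', 'medium', 'long', 'full']

for _ in range(3):
    dt = fake.date_object()  # a random datetime.date
    human = format_date(dt, format=random.choice(FORMATS), locale='en_US')
    machine = dt.isoformat()  # target format: YYYY-MM-DD
    print(human, '->', machine)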