
Assignment 2: Deep N-grams

Welcome to the second assignment of Course 3. In this assignment you will explore Recurrent Neural Networks (RNNs).

  • You will use the fundamentals of Google's trax package to implement deep learning models.

By completing this assignment, you will learn how to implement models from scratch:

  • How to convert a line of text into a tensor
  • Create an iterator to feed data to the model
  • Define a GRU model using trax
  • Train the model using trax
  • Evaluate your model using perplexity
  • Predict using your own model
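Perplexity, one of the metrics above, can be sketched in a few lines of plain NumPy. The helper name and the natural-log convention here are illustrative, not the assignment's implementation:

```python
import numpy as np

def perplexity(log_probs):
    """Perplexity from the log-probabilities (natural log) the model
    assigned to each target character. Lower is better."""
    # perplexity = exp(-average log-probability per character)
    return float(np.exp(-np.mean(log_probs)))

# A model that assigns probability 0.25 to every target character
# behaves like a uniform 4-way guess, so its perplexity is 4.
print(perplexity(np.log([0.25, 0.25, 0.25])))
```

In the assignment you would typically compute this from the model's per-character log-probability outputs rather than from a hand-built list.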

Overview

Your task will be to predict the next set of characters using the previous characters.

  • Although this task sounds simple, it is pretty useful.
  • You will start by converting a line of text into a tensor
  • Then you will create a generator to feed data into the model
  • You will train a neural network to predict a new set of characters of a defined length.
  • You will use embeddings for each character and feed them as inputs to your model.
    • Many natural language tasks rely on using embeddings for predictions.
  • Your model will convert each character to its embedding, run the embeddings through a Gated Recurrent Unit (GRU), and pass the output through a linear layer to predict the next set of characters.
[Figure: overview of the model you will implement]

The figure above gives you a summary of what you are about to implement.

  • You will get the embeddings;
  • Stack the embeddings on top of each other;
  • Run them through two layers with a ReLU activation in the middle;
  • Finally, you will compute the softmax.
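As a rough illustration, the four steps above can be sketched in plain NumPy with random stand-in weights. The shapes and variable names are illustrative only; the assignment itself builds this with trax layers:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_emb, d_hid = 256, 8, 16

# Parameters (random stand-ins for trained weights)
E  = rng.normal(size=(vocab, d_emb))       # embedding table
W1 = rng.normal(size=(d_emb, d_hid))       # first layer
W2 = rng.normal(size=(d_hid, vocab))       # second layer

ids = np.array([ord(c) for c in "hello"])  # characters -> integer IDs
x   = E[ids]                               # get and stack the embeddings: (5, d_emb)
h   = np.maximum(0, x @ W1)                # first layer + ReLU
logits = h @ W2                            # second layer: a score per vocabulary entry

# softmax over the vocabulary at each position
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
print(probs.shape)  # one distribution over 256 IDs per input position
```

Each row of `probs` sums to 1 and can be read as the model's belief about the next character at that position.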

To predict the next character:

  • Use the softmax output and identify the character with the highest probability.
  • The character with the highest probability is the prediction for the next character.
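A minimal sketch of that prediction step, assuming character IDs are Unicode code points as produced by ord (the helper name is hypothetical):

```python
import numpy as np

def next_char(probs):
    """Pick the character whose ID has the highest probability.
    Assumes IDs are Unicode code points, as produced by ord()."""
    return chr(int(np.argmax(probs)))

# Toy distribution over 256 character IDs, peaked at ord('e')
probs = np.full(256, 1e-4)
probs[ord('e')] = 0.9
print(next_char(probs))  # → 'e'
```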
import os
import pickle
import random as rnd

import numpy
import trax
import trax.fastmath.numpy as np  # trax's accelerated NumPy API
from trax import fastmath
from trax import layers as tl

# Set random seeds for reproducibility
trax.supervised.trainer_lib.init_random_number_generators(32)
rnd.seed(32)

Part 1: Importing the Data

1.1 Loading in the data


Now import the dataset and do some processing.

  • The dataset has one sentence per line.
  • You will be doing character generation, so you have to process each sentence by converting each character (and not word) to a number.
  • You will use the ord function to convert each character to a unique integer ID.
  • Store each line in a list.
  • Create a data generator that takes in the batch_size and the max_length.
    • The max_length corresponds to the maximum length of the sentence.
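The two pieces just described, the line-to-tensor conversion and the batching generator, can be sketched as below. This assumes padding uses ID 0 and that lines longer than max_length are skipped; the function names and those details are illustrative, and the assignment's own specification takes precedence:

```python
import random

def line_to_tensor(line, max_length, pad_id=0):
    """Convert a line of text into a list of integer IDs via ord(),
    padded with pad_id up to max_length."""
    tensor = [ord(c) for c in line]
    return tensor + [pad_id] * (max_length - len(tensor))

def data_generator(lines, batch_size, max_length, shuffle=True):
    """Yield batches of padded ID tensors, cycling over the data forever."""
    lines = [l for l in lines if len(l) <= max_length]  # drop over-long lines
    while True:
        if shuffle:
            random.shuffle(lines)
        for i in range(0, len(lines) - batch_size + 1, batch_size):
            batch = lines[i:i + batch_size]
            yield [line_to_tensor(l, max_length) for l in batch]

gen = data_generator(["abc", "hi"], batch_size=2, max_length=4, shuffle=False)
print(next(gen))  # [[97, 98, 99, 0], [104, 105, 0, 0]]
```

Because the generator loops forever, training code can call next() on it as many times as there are training steps.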