Assignment 4: Word Embeddings

Welcome to the fourth (and last) programming assignment of Course 2!

In this assignment, you will practice how to compute word embeddings and use them for sentiment analysis.

  • To implement sentiment analysis, you can go beyond counting the number of positive and negative words.
  • You can instead represent each word numerically, as a vector.
  • The vector could then capture syntactic (i.e. part-of-speech) and semantic (i.e. meaning) structure.

In this assignment, you will explore a classic way of generating word embeddings or representations.

  • You will implement a famous model called the continuous bag of words (CBOW) model.

By completing this assignment you will:

  • Train word vectors from scratch.
  • Learn how to create batches of data.
  • Understand how backpropagation works.
  • Plot and visualize your learned word vectors.

Knowing how to train these models will give you a better understanding of word vectors, which are building blocks to many applications in natural language processing.

1. The Continuous bag of words model

Let's take a look at the following sentence:

'I am happy because I am learning'.

  • In continuous bag of words (CBOW) modeling, we try to predict the center word given a few context words (the words around the center word).
  • For example, if you were to choose a context half-size of, say, $C = 2$, then you would try to predict the word happy given the 2 words before and the 2 words after the center word (see the code sketch after this list):

C words before: [I, am]

C words after: [because, I]

  • In other words:

context = [I, am, because, I]
target = happy
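
Here is a minimal sketch (not one of the assignment's graded functions) of how such (context, center word) pairs could be extracted from a tokenized sentence with a context half-size of C = 2; the function name get_context_target_pairs is made up for illustration.

# Hypothetical helper, for illustration only: slide a window of half-size C
# over the tokens and pair each center word with its surrounding context words.
def get_context_target_pairs(words, C=2):
    for i in range(C, len(words) - C):
        context = words[i - C:i] + words[i + 1:i + C + 1]  # C words before + C words after
        yield context, words[i]

tokens = ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']
for context, center in get_context_target_pairs(tokens, C=2):
    print(context, '->', center)
# ['i', 'am', 'because', 'i'] -> happy
# ['am', 'happy', 'i', 'am'] -> because
# ['happy', 'because', 'am', 'learning'] -> i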

The structure of your model will look like this:

Figure 1

Where $\bar x$ is the average of all the one-hot vectors of the context words.

Figure 2

Once you have encoded all the context words, you can use $\bar x$ as the input to your model.
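
As a quick illustration, the averaged input $\bar x$ for the context above could be computed as follows; the word-to-index mapping word2Ind below is a toy example made up for this sketch (in the assignment it comes from get_dict).

import numpy as np

# Toy vocabulary for the example sentence; indices assigned alphabetically.
word2Ind = {'am': 0, 'because': 1, 'happy': 2, 'i': 3, 'learning': 4}
V = len(word2Ind)

def word_to_one_hot(word, word2Ind, V):
    # One-hot vector of length V with a 1 at the word's index
    vec = np.zeros(V)
    vec[word2Ind[word]] = 1
    return vec

context = ['i', 'am', 'because', 'i']
# x_bar is the element-wise average of the context words' one-hot vectors
x_bar = np.mean([word_to_one_hot(w, word2Ind, V) for w in context], axis=0)
print(x_bar)  # [0.25 0.25 0.   0.5  0.  ]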

The architecture you will be implementing is as follows:

\begin{align}
h &= W_1 \ X + b_1 \tag{1} \\
a &= ReLU(h) \tag{2} \\
z &= W_2 \ a + b_2 \tag{3} \\
\hat y &= softmax(z) \tag{4}
\end{align}
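
To make the flow of equations (1)-(4) concrete, here is a minimal forward-pass sketch; the dimensions, the random initialization, and the relu/softmax helpers below are assumptions for illustration, not the assignment's graded implementation.

import numpy as np

def relu(h):
    # Equation (2): element-wise max(0, h)
    return np.maximum(0, h)

def softmax(z):
    # Equation (4): subtract the column max for numerical stability
    e_z = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e_z / e_z.sum(axis=0, keepdims=True)

V, N = 5, 3                                    # toy vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((N, V)), np.zeros((N, 1))
W2, b2 = rng.standard_normal((V, N)), np.zeros((V, 1))

x = np.array([[0.25, 0.25, 0.0, 0.5, 0.0]]).T  # averaged one-hot context vector, shape (V, 1)
h = W1 @ x + b1                                # equation (1)
a = relu(h)                                    # equation (2)
z = W2 @ a + b2                                # equation (3)
y_hat = softmax(z)                             # equation (4): probability of each word being the center word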

# Import Python libraries and helper functions (in utils2)
import nltk
from nltk.tokenize import word_tokenize
import numpy as np
from collections import Counter
from utils2 import sigmoid, get_batches, compute_pca, get_dict

# Add the local download directory to the NLTK data path so the sentence tokenizer can be found
nltk.data.path.append('.')
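
As a quick check that the setup works, you could tokenize the example sentence from above; this assumes the 'punkt' tokenizer data is available in the directory added to the NLTK path.

corpus = 'I am happy because I am learning'
tokens = word_tokenize(corpus.lower())
print(tokens)  # ['i', 'am', 'happy', 'because', 'i', 'am', 'learning']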