
Manipulating word embeddings

In this week's assignment, you are going to use a pre-trained word embedding to find word analogies and equivalences. This exercise can serve as an intrinsic evaluation of a word embedding's performance. In this notebook, you will apply linear algebra operations with NumPy to find analogies between words manually, which will help you prepare for this week's assignment.

import pandas as pd # Library for DataFrames
import numpy as np # Library for numerical arrays and math functions
import pickle # Python object serialization library (not secure against untrusted data)

word_embeddings = pickle.load( open( "word_embeddings_subset.p", "rb" ) )
len(word_embeddings) # there should be 243 words that will be used in this assignment
243
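Because word_embeddings is an ordinary Python dictionary, you can inspect which words it contains. As a quick sketch (the exact words shown depend on the subset file):

print(list(word_embeddings.keys())[:10]) # Peek at the first ten words in the subset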

Now that the model is loaded, we can take a look at the word representations. First, note that word_embeddings is a dictionary: each word is a key, and its value is the corresponding vector representation. Remember that square brackets allow access to an entry as long as the key exists.

countryVector = word_embeddings['country'] # Get the vector representation for the word 'country'
print(type(countryVector)) # Print the type of the vector. Note it is a numpy array
print(countryVector) # Print the values of the vector.  
<class 'numpy.ndarray'>
[-0.08007812  0.13378906  0.14355469  0.09472656 -0.04736328 -0.02355957
 -0.00854492 -0.18652344  0.04589844 -0.08154297 -0.03442383 -0.11621094
 ...
 -0.04418945  0.09716797  0.06738281]
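Keep in mind that indexing with a key that is not in the dictionary raises a KeyError, so it can be safer to test membership first. A minimal sketch, using a hypothetical query word:

word = 'dog' # Hypothetical query word; it may or may not be in the 243-word subset
if word in word_embeddings:
    print(word_embeddings[word][:5]) # Print the first five components of its vector
else:
    print(f"'{word}' is not in the vocabulary subset")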

It is important to note that each vector is stored as a NumPy array, which allows us to apply linear algebra operations to it.
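For instance, you can subtract word vectors element-wise and measure their length with np.linalg.norm. A small sketch, assuming 'city' is also in the subset:

cityVector = word_embeddings['city'] # Assumes 'city' is one of the 243 words
diff = countryVector - cityVector # Element-wise difference of the two vectors
print(np.linalg.norm(countryVector)) # Euclidean norm (length) of the 'country' vector
print(np.linalg.norm(diff)) # Distance between 'country' and 'city' in the embedding space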

The vectors have 300 dimensions each, while the vocabulary of the full Google News model is around 3 million words!
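To tie this together, here is a sketch of the kind of analogy computation you will do in the assignment: take the difference between a capital and its country, add it to another capital, and compare the result to a candidate country with cosine similarity. It assumes 'France', 'Paris', 'Madrid', and 'Spain' are all in the subset, and the cosine_similarity helper is written here just for illustration:

print(countryVector.shape) # Each vector has 300 components: (300,)

def cosine_similarity(u, v):
    # Cosine of the angle between vectors u and v
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Analogy: Paris is to France as Madrid is to ...?
capital = word_embeddings['France'] - word_embeddings['Paris']
guess = word_embeddings['Madrid'] + capital
print(cosine_similarity(guess, word_embeddings['Spain'])) # A high value suggests 'Spain' completes the analogy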