NLP Course 2 Week 1 Lesson: Building the Model - Lecture Exercise 01

Estimated Time: 10 minutes

Vocabulary Creation

Create a tiny vocabulary from a tiny corpus


It's time to start small!

Imports and Data

# imports
import re # regular expression library; for tokenization of words
from collections import Counter # collections library; counter: dict subclass for counting hashable objects
import matplotlib.pyplot as plt # for data visualization

# the tiny corpus of text
text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK' # 🌈
print(text)
print('string length : ',len(text))
red pink pink blue blue yellow ORANGE BLUE BLUE PINK
string length :  52

Preprocessing

# convert all letters to lower case
text_lowercase = text.lower()
print(text_lowercase)
print('string length : ',len(text_lowercase))
red pink pink blue blue yellow orange blue blue pink
string length :  52
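
Lowercasing does not change the string length, but it does change how many distinct words the corpus contains. The sketch below illustrates this by counting words before and after lowercasing. It assumes plain whitespace splitting as a stand-in tokenizer, and the names `counts_raw` and `counts_lower` are illustrative only, not part of the exercise.

```python
from collections import Counter

text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK'

# Assumption: whitespace splitting stands in for a real tokenizer here.
counts_raw = Counter(text.split())            # 'blue' and 'BLUE' are counted separately
counts_lower = Counter(text.lower().split())  # case variants are merged into one entry

print('distinct words before lowercasing :', len(counts_raw))    # 7
print('distinct words after lowercasing  :', len(counts_lower))  # 5
print('count of "blue" after lowercasing :', counts_lower['blue'])  # 4
```

Without lowercasing, 'blue' and 'BLUE' would become two separate vocabulary entries, each with only part of the total count.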