Created 4 years ago
NLP Course 2 Week 1 Lesson: Building the Model - Lecture Exercise 01
Estimated Time: 10 minutes
Vocabulary Creation
Create a tiny vocabulary from a tiny corpus
It's time to start small!
Imports and Data
# imports
import re # regular expression library; used to tokenize the text into words
from collections import Counter # Counter: a dict subclass for counting hashable objects
import matplotlib.pyplot as plt # for data visualization
# the tiny corpus of text!
text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK' # 🌈
print(text)
print('string length : ',len(text))
red pink pink blue blue yellow ORANGE BLUE BLUE PINK
string length : 52
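Note that `len(text)` counts characters (including spaces), not words. As a minimal sketch, the two can be compared by splitting on whitespace; the names `n_chars` and `n_words` are illustrative and not part of the original exercise.

```python
# the same tiny corpus as above
text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK'

n_chars = len(text)           # number of characters, spaces included
n_words = len(text.split())   # number of whitespace-separated words

print('characters :', n_chars)
print('words      :', n_words)
```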
Preprocessing
# convert all letters to lower case
text_lowercase = text.lower()
print(text_lowercase)
print('string length : ',len(text_lowercase))
red pink pink blue blue yellow orange blue blue pink
string length : 52
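Lowercasing changes no characters other than their case, so the length stays at 52, but it does change which words count as the same vocabulary entry. The sketch below illustrates this under an assumption: it tokenizes with the simple pattern `\w+` and counts with `Counter`, which may differ from the tokenization the exercise uses later. The names `raw_counts` and `lower_counts` are illustrative.

```python
import re
from collections import Counter

text = 'red pink pink blue blue yellow ORANGE BLUE BLUE PINK'

# assumption: a word is any run of word characters
raw_counts = Counter(re.findall(r'\w+', text))
lower_counts = Counter(re.findall(r'\w+', text.lower()))

# without lowercasing, 'blue' and 'BLUE' are separate entries
print('distinct words before lowercasing :', len(raw_counts))
print('distinct words after lowercasing  :', len(lower_counts))
print(lower_counts)
```

With this corpus, the raw text yields 7 distinct tokens, while the lowercased text yields 5, with `blue` counted 4 times and `pink` 3 times.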