Assignment 1: Logistic Regression
Welcome to week one of this specialization. You will learn about logistic regression. Concretely, you will be implementing logistic regression for sentiment analysis on tweets. Given a tweet, you will decide if it has a positive sentiment or a negative one. Specifically you will:
- Learn how to extract features for logistic regression given some text
- Implement logistic regression from scratch
- Apply logistic regression on a natural language processing task
- Test your logistic regression model
- Perform error analysis
We will be using a data set of tweets. Hopefully you will get more than 99% accuracy.
Run the cell below to load in the packages.
Import functions and data
# run this cell to import nltk
import nltk
from os import getcwd
Imported functions
Download the data needed for this assignment. Check out the documentation for the twitter_samples dataset.
- twitter_samples: if you're running this notebook on your local computer, you will need to download it using:
nltk.download('twitter_samples')
- stopwords: if you're running this notebook on your local computer, you will need to download it using:
nltk.download('stopwords')
Import some helper functions that we provided in the utils.py file:
- process_tweet(): cleans the text, tokenizes it into separate words, removes stopwords, and converts words to stems.
- build_freqs(): counts how often a word in the 'corpus' (the entire set of tweets) was associated with a positive label '1' or a negative label '0', then builds the freqs dictionary, where each key is a (word, label) tuple, and the value is the count of its frequency within the corpus of tweets.
import numpy as np
import pandas as pd
from nltk.corpus import twitter_samples
from utils import process_tweet, build_freqs
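To make the shape of the freqs dictionary described above concrete, here is a minimal sketch of how such a (word, label) count table could be built. It is not the code from utils.py: the function name build_freqs_sketch and the toy data are hypothetical, and it assumes the tweets have already been tokenized and stemmed (the job process_tweet() is described as doing).

```python
from collections import defaultdict


def build_freqs_sketch(tokenized_tweets, labels):
    """Count how often each word appears under each label.

    Hypothetical stand-in for build_freqs(); assumes every tweet is
    already a list of processed tokens and every label is 0 or 1.
    """
    freqs = defaultdict(int)
    for tokens, label in zip(tokenized_tweets, labels):
        for word in tokens:
            # Key is a (word, label) tuple, value is the running count.
            freqs[(word, int(label))] += 1
    return dict(freqs)


# Toy, made-up data: three already-processed tweets and their labels.
toy_tweets = [["happi", "day"], ["sad", "day"], ["happi", "happi"]]
toy_labels = [1, 0, 1]

toy_freqs = build_freqs_sketch(toy_tweets, toy_labels)
print(toy_freqs)
```

A pair that never occurs has no key, so look it up with toy_freqs.get((word, label), 0) rather than indexing directly.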