Assignment 1: Logistic Regression

Welcome to week one of this specialization. This week you will learn about logistic regression by implementing it for sentiment analysis on tweets: given a tweet, you will decide whether it has a positive or a negative sentiment. Specifically, you will:

  • Learn how to extract features for logistic regression given some text
  • Implement logistic regression from scratch
  • Apply logistic regression on a natural language processing task
  • Test your logistic regression model
  • Perform error analysis

We will be using a data set of tweets. Hopefully you will get more than 99% accuracy.
Run the cell below to load in the packages.

Import functions and data

# run this cell to import nltk
import nltk
from os import getcwd

Imported functions

Download the data needed for this assignment. Check out the documentation for the twitter_samples dataset.

  • twitter_samples: if you're running this notebook on your local computer, you will need to download it using:
      nltk.download('twitter_samples')
  • stopwords: if you're running this notebook on your local computer, you will need to download it using:
      nltk.download('stopwords')

Import some helper functions that we provided in the utils.py file:
  • process_tweet(): cleans the text, tokenizes it into separate words, removes stopwords, and converts words to stems.
  • build_freqs(): counts how often each word in the corpus (the entire set of tweets) appears with a positive label (1) or a negative label (0), then builds the freqs dictionary, where each key is a (word, label) tuple and each value is the number of times that pair occurs in the corpus.

import numpy as np
import pandas as pd
from nltk.corpus import twitter_samples

from utils import process_tweet, build_freqs
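
To make the freqs dictionary described above concrete, here is a minimal sketch of the counting idea. It is not the code in utils.py: build_freqs_sketch and simple_tokenize are hypothetical names introduced only for this illustration, and simple_tokenize is a deliberately crude stand-in for process_tweet() that lowercases and splits on whitespace, with no stopword removal or stemming.

```python
from collections import defaultdict


def simple_tokenize(tweet):
    """Hypothetical stand-in for process_tweet(): lowercase and split on whitespace."""
    return tweet.lower().split()


def build_freqs_sketch(tweets, labels):
    """Count how often each (word, label) pair occurs across all tweets.

    tweets: list of strings
    labels: list of 0/1 sentiment labels, one per tweet
    Returns a dict mapping (word, label) tuples to counts.
    """
    freqs = defaultdict(int)
    for tweet, label in zip(tweets, labels):
        for word in simple_tokenize(tweet):
            # The key pairs the word with the label of the tweet it appeared in.
            freqs[(word, int(label))] += 1
    return dict(freqs)


# Toy corpus: one positive tweet (label 1) and one negative tweet (label 0).
tweets = ["happy happy day", "sad day"]
labels = [1, 0]
freqs = build_freqs_sketch(tweets, labels)
```

A word that appears in both positive and negative tweets gets two separate entries, one per label. Here "day" is counted once under label 1 and once under label 0.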