Visualizing tweets and the Logistic Regression model
Objectives: Visualize and interpret the logistic regression model
Steps:
- Plot tweets in a scatter plot using their positive and negative sums.
- Plot the output of the logistic regression model in the same plot as a solid line.
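The solid line in the second step can be understood as the set of points where the model is undecided, that is, where the sigmoid output equals 0.5 and the linear term is zero. The following is a minimal sketch of that idea, assuming each tweet is described by a bias term, a positive sum, and a negative sum. The function name `neg` and the weight values in `theta` are hypothetical and chosen only for illustration; they are not the weights of a trained model.

```python
def neg(theta, pos):
    """Return the negative-sum value that lies on the decision line
    theta[0] + theta[1] * pos + theta[2] * neg = 0 for a given positive sum."""
    return (-theta[0] - pos * theta[1]) / theta[2]

# Hypothetical weights [bias, w_pos, w_neg]; illustrative only, not trained values
theta = [0.0, 0.5, -0.5]

# Evaluate the line at a few positive-sum values
pos_values = [0.0, 10.0, 20.0]
boundary = [neg(theta, p) for p in pos_values]
print(boundary)
```

With these symmetric weights the line is simply neg = pos. Points below it (larger positive sum than negative sum) would be scored as positive by such a model.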
Import the required libraries
We will be using NLTK, an open-source NLP library, for collecting, handling, and processing Twitter data. In this lab, we will use the example dataset that ships with NLTK. This dataset has been manually annotated and serves to establish baselines for models quickly.
So, to start, let's import the required libraries.
import nltk # NLP toolbox
from os import getcwd
import pandas as pd # Library for Dataframes
from nltk.corpus import twitter_samples
import matplotlib.pyplot as plt # Library for visualization
import numpy as np # Library for math functions
from utils import process_tweet, build_freqs # Our functions for NLP
Load the NLTK sample dataset
To complete this lab, you need the sample dataset from the previous lab. Here, we assume the files are already available, and we only need to load them into Python lists.
# select the set of positive and negative tweets
all_positive_tweets = twitter_samples.strings('positive_tweets.json')
all_negative_tweets = twitter_samples.strings('negative_tweets.json')
tweets = all_positive_tweets + all_negative_tweets # concatenate the lists
labels = np.append(np.ones((len(all_positive_tweets),1)), np.zeros((len(all_negative_tweets),1)), axis = 0)
# take the first 4000 tweets of each class as the training set; the remaining tweets are held out for testing (validation set)
train_pos = all_positive_tweets[:4000]
train_neg = all_negative_tweets[:4000]
train_x = train_pos + train_neg
print("Number of tweets: ", len(train_x))
Number of tweets: 8000
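The cell above builds only the training half of the split. As a rough sketch of how the held-out half and the matching label arrays could be produced with the same slicing pattern, the example below uses small toy lists in place of the NLTK tweets. The names `toy_pos`, `toy_neg`, `split`, `test_x`, `train_y`, and `test_y` are assumptions introduced here for illustration and do not appear in the lab.

```python
import numpy as np

# Toy stand-ins for the two tweet lists (hypothetical data, not from NLTK)
toy_pos = ["pos tweet %d" % i for i in range(5)]
toy_neg = ["neg tweet %d" % i for i in range(5)]
split = 4  # plays the role of the 4000 cut-off used above

# Training set: first `split` items of each class; test set: the remainder
train_x = toy_pos[:split] + toy_neg[:split]
test_x = toy_pos[split:] + toy_neg[split:]

# Labels follow the same order: ones for positives, then zeros for negatives
train_y = np.append(np.ones((split, 1)), np.zeros((split, 1)), axis=0)
test_y = np.append(np.ones((len(toy_pos) - split, 1)),
                   np.zeros((len(toy_neg) - split, 1)), axis=0)

print(len(train_x), len(test_x))
print(train_y.shape, test_y.shape)
```

Keeping the positives first and the negatives second in both the text lists and the label arrays is what keeps each tweet aligned with its label.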