Learn practical skills, build real-world projects, and advance your career

Investigating Fandango Movie Ratings.

In October 2015, a data journalist named Walt Hickey analyzed movie ratings data and found strong evidence to suggest that Fandango's rating system was biased and dishonest (Fandango is an online movie ratings aggregator). He published his analysis in this article — a great piece of data journalism that's totally worth reading.

Fandango displays a 5-star rating system on their website, where the minimum rating is 0 stars and the maximum is 5 stars. Hickey found that there's a significant discrepancy between the number of stars displayed to users and the actual rating, which he was able to find in the HTML of the page. He was able to find that:

  • The actual rating was almost always rounded up to the nearest half-star. For instance, a 4.1 movie would be rounded off to 4.5 stars, not to 4 stars, as you may expect.
  • In the case of 8% of the ratings analyzed, the rounding up was done to the nearest whole star. For instance, a 4.5 rating would be rounded off to 5 stars.
  • For one movie rating, the rounding off was completely bizarre: from a rating of 4 in the HTML of the page to a displayed rating of 5 stars.

In this analysis, we want to find out if there has been a change in the Fandango rating system by comparing the data used in the article by Walt Hickey(2015) and data from the Fandango website after(2016-2017).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline


fandango_1 = pd.read_csv('fandango_score_comparison.csv') # the walt hicky data
fandango_2 = pd.read_csv('movie_ratings_16_17.csv') # the 2016-2017 data
fandango_1.head()
fandango_2.head()
# creating copies of the data
# so we do not lose the original dataframe

fandango_prev = fandango_1.copy() 
fandango_curr = fandango_2.copy()