Scraping IMDb for Top 250 TV Shows using Python

Web Scraping is process of collecting information from a website in an automated manner using code. This information is collected and then exported into a format that is more useful for the user. I will be using the libraries Requests and BeautifulSoup in Python to scrape data from the website.

IMDb is the world's most popular website that contains information about TV Shows, movies, video games, and streaming content online – including cast, production crew, plot summaries, trivia, ratings, fan and critical reviews and a lot more. It is designed to help fans explore the world of movies and shows and decide what to watch.

The page https://www.imdb.com/chart/toptv/ provides a list of the top 250 TV Shows as rated by the IMDb users. In this project I will retrive information from this page using web scraping. This is how the IMDb site looks: alt

Here is the outline of the steps that I am planning to follow:

Import the required libraries and download the website using requests
Parse the HTML source code using BeautifulSoup
Extract the top 250 TV Shows rank, title, release year, IMDb ratings, genre, runtime, and plot summary from the website
Complie extracted information into Python dictionaries and Pandas DataFrame
Save the extracted information to a CSV file.

By the end of the project I will create a CSV file in the following format:

Rank, Title, Release Year, IMDb Ratings, Genre, Runtime, Plot Summary
1, Planet Earth II, 2016, 9.5, Documentary, 4h 58min, Wildlife documentary series with David Attenborough, beginning with a look at the remote islands which offer sanctuary to some of the planet's rarest creatures, to the beauty of cities, which are home to humans, and animals..
2, Planet Earth, 2006, 9.4, Documentary, 8h 58min, Emmy Award-winning, 11 episodes, five years in the making, the most expensive nature documentary series ever commissioned by the BBC, and the first to be filmed in high definition.	
....

Download the webpage using `requests`

These are the libraries that I will be using:

Requests Library to use built-in methods for making HTTP requests
BeautifulSoup Library to pull data out of HTML File
Pandas Library for Data Manipulation

The libraries can be installed using pip.

# Installing the libraries
!pip install jovian requests beautifulsoup4 pandas --upgrade --quiet

# Importing the libraries
import jovian 
import requests 
from bs4 import BeautifulSoup
import pandas as pd

Scraping IMDb for Top 250 TV Shows using Python

Download the webpage using requests

Download the webpage using `requests`