Web Scraping Project
Web Scraping Popular Movies using BeautifulSoup
A web scraping tutorial in Python for beginners.
The project idea is to use web scraping to curate a list of popular movies that I can watch. Check out the TMdb website here: https://www.themoviedb.org/movie
Web scraping is the process of gathering useful information from the web and deriving meaningful insights from it. In a way, web scraping automates the process of data collection.
Note: Web scraping code depends on the structure of the web page, so if the structure changes, your code needs to be updated too!
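To make the note above concrete, here is a minimal sketch of how a layout change can break an extraction, and one way to fail gracefully. The two HTML snippets and their class names are made up for illustration; they are not TMdb's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page. The tag and class names
# are illustrative assumptions, not TMdb's actual HTML.
OLD_LAYOUT = '<div class="card"><h2 class="title">Nobody</h2></div>'
NEW_LAYOUT = '<div class="card"><h3 class="movie-name">Nobody</h3></div>'


def extract_title(html):
    """Return the movie title, or None if the expected tag is missing."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("h2", class_="title")  # find() returns None when nothing matches
    if tag is None:
        return None
    return tag.get_text(strip=True)


print(extract_title(OLD_LAYOUT))  # the selector still matches
print(extract_title(NEW_LAYOUT))  # the layout changed, so nothing is found
```

Checking for None before reading a tag's text means a changed page shows up as missing data rather than as a crash halfway through a scrape.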
Python offers a variety of libraries for scraping the web, such as BeautifulSoup, Requests, Scrapy, and Selenium. If you are just starting out with web scraping, BeautifulSoup is an easy option to begin with.
We’ll be using the packages:
- Requests — for downloading the HTML code from the TMdb URL
- BeautifulSoup4 — for extracting data from the HTML string
- Pandas — to gather my data into a dataframe for further processing
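One detail that often trips up beginners is that the name used to install a package is not always the name used to import it: BeautifulSoup4 is installed as beautifulsoup4 but imported as bs4. The sketch below uses only the standard library to check which of the three packages can be imported; the helper name missing_packages is hypothetical.

```python
import importlib.util

# Maps the name used with pip to the name used in an import statement.
PIP_TO_MODULE = {
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
}


def missing_packages(pip_to_module):
    """Return the pip names of packages that cannot be imported here."""
    return [
        pip_name
        for pip_name, module in pip_to_module.items()
        if importlib.util.find_spec(module) is None
    ]


print("Missing packages:", missing_packages(PIP_TO_MODULE))
```

An empty list means all three packages are available in the current environment.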
Let's see an outline of the steps we'll follow:
- Load the TMdb movie web page https://www.themoviedb.org/movie using Requests.
- Parse the HTML web page using BeautifulSoup.
- Extract the list of movies from the landing page. For each movie, we'll get the movie name, user rating, and movie page URL.
- Again, for each movie, we'll grab the release date, genres, duration, and directors.
- Compile the extracted movie details into Python lists and dictionaries.
- Extend the above logic to scrape multiple pages.
- Finally, save all the movie information to a CSV file.
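The first few steps of this outline can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the final implementation: SAMPLE_HTML is a made-up snippet, and its tag names, class names, and data-percent attribute are hypothetical stand-ins for whatever the real page uses. The download_page helper shows the Requests step but is not called here, so the example runs without network access.

```python
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://www.themoviedb.org"

# Hypothetical HTML standing in for the landing page. The structure and
# attribute names are assumptions made for this sketch only.
SAMPLE_HTML = """
<div class="card">
  <h2><a href="/movie/615457">Nobody</a></h2>
  <div class="user_score" data-percent="85.0"></div>
</div>
<div class="card">
  <h2><a href="/movie/399566">Godzilla vs. Kong</a></h2>
  <div class="user_score" data-percent="82.0"></div>
</div>
"""


def download_page(url):
    """Download a page and return its HTML, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_movies(html):
    """Turn listing-page HTML into a list of dictionaries, one per movie."""
    soup = BeautifulSoup(html, "html.parser")
    movies = []
    for card in soup.find_all("div", class_="card"):
        link = card.find("a")
        score = card.find("div", class_="user_score")
        movies.append({
            "name": link.get_text(strip=True),
            "rating": float(score["data-percent"]),
            "url": BASE_URL + link["href"],
        })
    return movies


# Parse the sample instead of a live page so the sketch is reproducible.
movies = parse_movies(SAMPLE_HTML)
print(movies)
```

Keeping the download and parse steps in separate functions makes it possible to test the parsing logic against saved HTML before pointing it at the live site.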
The CSV file will have the following format:
Name,rating,genre,release_date,runtime,director,url
Mortal Kombat,80,"Fantasy, Action, Adventure, Science Fiction, Thriller",04/23/2021,1h 50m,Lewis Tan,https://www.themoviedb.org/movie/460465
Godzilla vs. Kong,82.0,"Science Fiction, Action",03/31/2021,1h 53m,Alexander Skarsgård,https://www.themoviedb.org/movie/399566
Nobody,85.0,"Action, Thriller, Crime",03/26/2021,1h 32m,Bob Odenkirk,https://www.themoviedb.org/movie/615457
Zack Snyder's Justice League,85.0,"Action, Adventure, Fantasy, Science Fiction",03/18/2021,4h 2m,Ben Affleck,https://www.themoviedb.org/movie/791373
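As a rough sketch of how a list of dictionaries could end up in this format, the example below builds a pandas DataFrame from one of the sample rows above and serializes it to CSV text. It assumes that runtime and director are separate columns; the COLUMNS list is only there to pin the column order.

```python
import pandas as pd

# Column order taken from the sample header, assuming "runtime" and
# "director" are two separate columns.
COLUMNS = ["Name", "rating", "genre", "release_date", "runtime", "director", "url"]

# One row copied from the sample data above.
movies = [
    {
        "Name": "Nobody",
        "rating": 85.0,
        "genre": "Action, Thriller, Crime",
        "release_date": "03/26/2021",
        "runtime": "1h 32m",
        "director": "Bob Odenkirk",
        "url": "https://www.themoviedb.org/movie/615457",
    },
]

df = pd.DataFrame(movies, columns=COLUMNS)

# to_csv returns the CSV as a string when no file path is given;
# index=False leaves out the DataFrame's row numbers.
csv_text = df.to_csv(index=False)
print(csv_text)
```

Because the genre value contains commas, pandas wraps it in double quotes by default, which is why the genre field appears quoted in the sample rows. Passing a file name such as "movies.csv" (a hypothetical name) to to_csv would write the same content to disk.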
How to Run the Code
You can execute the code by clicking the "Run" button or by selecting the "Run on Binder" option.
Installing the Libraries
Let’s start by installing the required packages.