Scrape Popular TV Shows on TMDB
Scraping popular TV shows on TMDB using Python.
- TMDB (The Movie Database) is a massive index of movie and television information, built entirely by its community. It also provides a free API for researchers who want access to movie data.
The page https://www.themoviedb.org/tv lists the popular TV shows on TMDB. In this project, we'll retrieve information from this page using web scraping: the process of extracting information from a website in an automated fashion using code. We'll use the Python libraries Requests and Beautiful Soup to scrape data from this page.
- After opening the website, navigate to the TV Shows tab at the top left and click Popular to reach the page of popular TV shows.
Here are the steps we'll follow:
- We're going to scrape https://www.themoviedb.org/tv
- First, we'll download the webpage using Requests.
- Next, we'll parse the HTML source code using Beautiful Soup.
- On the page that lists the TV shows, we'll extract each show's title, user score, individual page URL, and premiere date.
- From each show's individual page, we'll extract further details: Current_season, Current_season_Episodes, Tagline, Genre, and Cast.
- We'll compile the extracted information into Python lists and dictionaries.
- We'll extract and combine data from multiple pages.
- Finally, we'll save the extracted information to a CSV file.
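The listing-page step above can be sketched with Beautiful Soup as follows. This is a minimal illustration, not the project's actual code: the HTML snippet, its tag and class names (`card`, `user_score_chart`, the `data-percent` attribute), and the `/tv/placeholder-…` paths are hypothetical stand-ins, so the real page should be inspected in the browser before relying on any selector. The helper name `parse_show_cards` is also illustrative.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.themoviedb.org"

# Hypothetical, simplified markup for two show cards. The real TMDB page
# may use different tags and class names; the URL paths are placeholders.
SAMPLE_HTML = """
<div class="card style_1">
  <div class="user_score_chart" data-percent="81.0"></div>
  <h2><a href="/tv/placeholder-1">The Snitch Cartel: Origins</a></h2>
  <p>Jul 28, 2021</p>
</div>
<div class="card style_1">
  <h2><a href="/tv/placeholder-2">Noovo Le Fil Québec</a></h2>
  <p>Mar 29, 2021</p>
</div>
"""


def parse_show_cards(html):
    """Return one dict per card with title, user score, page URL and premiere date."""
    soup = BeautifulSoup(html, "html.parser")
    shows = []
    # class_="card" matches any div whose class list contains "card".
    for card in soup.find_all("div", class_="card"):
        link = card.find("h2").find("a")
        score_div = card.find("div", class_="user_score_chart")
        # Assumed convention: a card without a score element is unrated.
        if score_div is not None and score_div.get("data-percent"):
            user_rating = score_div["data-percent"]
        else:
            user_rating = "Not rated yet"
        shows.append({
            "Title": link.get_text(strip=True),
            "User_rating": user_rating,
            "Release_date": card.find("p").get_text(strip=True),
            # Turn the relative href into an absolute URL for the detail page.
            "URL": urljoin(BASE_URL, link["href"]),
        })
    return shows


for show in parse_show_cards(SAMPLE_HTML):
    print(show)
```

Each returned `URL` is what the later step would download to collect the season, episode, tagline, genre, and cast details.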
Here is how the data will look in tabular form after extraction:

| Title | User_rating | Release_date | Current_season | Current_season_Episodes | Tagline | Genre | Cast |
|---|---|---|---|---|---|---|---|
| The Snitch Cartel: Origins | 81.0 | Jul 28, 2021 | Season 1 | 60 Episodes | No Tagline | ['Crime', 'Soap'] | ['Juan Pablo Urrego'] |
| Noovo Le Fil Québec | Not rated yet | Mar 29, 2021 | Season 1 | 110 Episodes | No Tagline | ['News'] | ['Lisa-Marie Blais'] |
The CSV file will have the following format, with one row per TV show:
- Title, User_rating, Release_date, Current_season, Current_season_Episodes, Tagline, Genre, Cast
- The Snitch Cartel: Origins, 81.0, "Jul 28, 2021", Season 1, 60 Episodes, No Tagline, "['Crime', 'Soap']", ['Juan Pablo Urrego']
- Noovo Le Fil Québec, Not rated yet, "Mar 29, 2021", Season 1, 110 Episodes, No Tagline, ['News'], ['Lisa-Marie Blais']
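One way to produce that format is Python's built-in `csv.DictWriter`, sketched below under the assumption that each show has already been compiled into a dictionary keyed by the column names above. The function name `save_shows_to_csv` and the file name `tv_shows.csv` are illustrative choices, not taken from the original project.

```python
import csv

# Column order taken from the CSV header above.
FIELDNAMES = [
    "Title", "User_rating", "Release_date", "Current_season",
    "Current_season_Episodes", "Tagline", "Genre", "Cast",
]


def save_shows_to_csv(shows, path):
    """Write a header row, then one row per show dictionary."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        for show in shows:
            # Non-string values such as the Genre and Cast lists are written
            # via str(); fields containing commas are quoted automatically.
            writer.writerow(show)


shows = [
    {"Title": "The Snitch Cartel: Origins", "User_rating": "81.0",
     "Release_date": "Jul 28, 2021", "Current_season": "Season 1",
     "Current_season_Episodes": "60 Episodes", "Tagline": "No Tagline",
     "Genre": ["Crime", "Soap"], "Cast": ["Juan Pablo Urrego"]},
    {"Title": "Noovo Le Fil Québec", "User_rating": "Not rated yet",
     "Release_date": "Mar 29, 2021", "Current_season": "Season 1",
     "Current_season_Episodes": "110 Episodes", "Tagline": "No Tagline",
     "Genre": ["News"], "Cast": ["Lisa-Marie Blais"]},
]
save_shows_to_csv(shows, "tv_shows.csv")
```

Because `DictWriter` uses minimal quoting, `Jul 28, 2021` and `['Crime', 'Soap']` end up in double quotes while `['News']` does not, matching the sample rows.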
How to run the code
You can execute the code using the "Run" button at the top of this page and selecting "Run on Binder". You can make changes and save your own version of the notebook to Jovian by executing the following cells:
Download the webpage using Requests
Let's visit the website first and examine the information we need. Here are the steps we'll take to get the information and put it into a proper format.
We'll use the Requests library to download the web page. The library can be installed using pip:
!pip install requests --upgrade --quiet
To download a page, we can use the get function from requests, which returns a response object.
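A minimal sketch of the download step is shown below. The browser-like `User-Agent` header and the 10-second timeout are assumptions added for robustness, not settings confirmed by the original notebook, and `download_page` is a hypothetical helper name. `raise_for_status()` makes a failed request (for example a 404 or 503) raise an error instead of silently returning an error page.

```python
import requests

TV_URL = "https://www.themoviedb.org/tv"


def download_page(url):
    """Fetch a page and return its HTML text, raising on HTTP errors."""
    # Assumption: some sites reject the default python-requests User-Agent,
    # so a browser-like one is sent here.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text
```

Calling `download_page(TV_URL)` should return the page's HTML as a string, ready to be parsed with Beautiful Soup.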