TV SHOWS WEB SCRAPER
Web scraping, also called web harvesting or web data extraction, is the automated extraction of data from websites. It involves downloading a web page, selecting the data of interest from it, and passing that selection along to another process.
The idea is to go through a website of interest and, using Python libraries built for the purpose, extract relevant data that can be presented as a DataFrame.
This web scraping project explores the 200 most popular TV shows on themoviedb.org, in descending order of popularity. One challenge is that the 200 shows are not all on the same web page, so we will have to parse several pages to extract this information.
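As a rough sketch of the pagination problem, the listing URLs can be generated up front. The `?page=N` query parameter and the `/tv` path are assumptions about the site's URL scheme, and the figure of 20 shows per page is inferred from 200 shows being spread over 10 pages; check all three against the live site. The function name `build_page_urls` is purely illustrative.

```python
# Assumptions (not confirmed by the site): the popular-TV listing lives at
# /tv, is paginated with ?page=N, and shows 20 titles per page.
BASE_URL = "https://www.themoviedb.org/tv"
SHOWS_PER_PAGE = 20
TOTAL_SHOWS = 200


def build_page_urls(base_url=BASE_URL, total=TOTAL_SHOWS, per_page=SHOWS_PER_PAGE):
    """Return one listing URL per page needed to cover `total` shows."""
    n_pages = -(-total // per_page)  # ceiling division
    return [f"{base_url}?page={page}" for page in range(1, n_pages + 1)]


page_urls = build_page_urls()
print(len(page_urls))  # number of listing pages to download
print(page_urls[0])
```

Computing the page count from the totals keeps the sketch usable if the number of shows or the page size changes.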
Below are the steps I will be taking:
- Download the web page using the requests library.
- Parse the HTML source code using BeautifulSoup.
- Extract each show's name, release date and web link.
- Get the links and information from the 9 other pages to complete the 200 TV shows.
- Create a DataFrame and save the information as a CSV file.
- Get information about a TV series using its web link.
- Create a DataFrame containing some of a TV show's details.
- Scrape all shows and create CSV files containing some of their information.
- Create a folder to store all the created CSV files.
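The first five steps above might look roughly like the sketch below. The CSS selectors (`div.card`, `h2 a`, `p`), the `User-Agent` header and the helper names `fetch_html` and `parse_listing` are all assumptions made for illustration, not the site's confirmed markup; inspect the real page and adjust them. The parsing is run on a small hand-written HTML sample so it can be checked without network access.

```python
from urllib.parse import urljoin

import pandas as pd
import requests
from bs4 import BeautifulSoup

SITE = "https://www.themoviedb.org"


def fetch_html(url):
    """Download a page and return its HTML text (raises on HTTP errors)."""
    # Assumption: a browser-like User-Agent may be needed; some sites
    # reject the default one sent by requests.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_listing(html):
    """Extract name, release date and link for each show card in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    shows = []
    # Hypothetical selectors: adapt them to the site's actual markup.
    for card in soup.select("div.card"):
        link = card.select_one("h2 a")
        date = card.select_one("p")
        if link is None:
            continue  # skip cards that do not describe a show
        shows.append({
            "name": link.get_text(strip=True),
            "release_date": date.get_text(strip=True) if date else None,
            "link": urljoin(SITE, link["href"]),
        })
    return shows


# Hand-written sample standing in for one downloaded listing page.
SAMPLE_HTML = """
<div class="card"><h2><a href="/tv/1">Show A</a></h2><p>Jan 01, 2020</p></div>
<div class="card"><h2><a href="/tv/2">Show B</a></h2><p>Feb 02, 2021</p></div>
"""

shows_df = pd.DataFrame(parse_listing(SAMPLE_HTML))
# With no path, to_csv returns the CSV as a string; pass a file name
# such as "tv_shows.csv" to write it to disk instead.
csv_text = shows_df.to_csv(index=False)
print(csv_text)
```

To cover all 200 shows, one would call `fetch_html` on each listing page, pass the result to `parse_listing`, and concatenate the returned lists before building the DataFrame.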
I will be extracting the following information for each TV show:
- Viewer age suitability
- Top cast
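A minimal sketch of pulling these two fields from a single show page and saving them to a CSV file inside a dedicated folder follows. The selectors (`span.certification`, `ol.people li p a`), the helper names and the sample HTML are hypothetical stand-ins for the real page structure, and the file name `show_a` is only an example.

```python
import os
import tempfile

import pandas as pd
from bs4 import BeautifulSoup


def parse_show_details(html):
    """Return the age rating and top-cast names found in a show page."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical selectors: adapt them to the site's actual markup.
    rating = soup.select_one("span.certification")
    cast = [a.get_text(strip=True) for a in soup.select("ol.people li p a")]
    return {
        "age_rating": rating.get_text(strip=True) if rating else None,
        "top_cast": cast,
    }


def save_show_csv(details, file_stem, folder):
    """Write one show's details to <folder>/<file_stem>.csv and return the path."""
    os.makedirs(folder, exist_ok=True)  # create the folder if it is missing
    row = {
        "age_rating": details["age_rating"],
        "top_cast": ", ".join(details["top_cast"]),
    }
    path = os.path.join(folder, f"{file_stem}.csv")
    pd.DataFrame([row]).to_csv(path, index=False)
    return path


# Hand-written sample standing in for one downloaded show page.
SAMPLE_SHOW_HTML = """
<span class="certification">TV-14</span>
<ol class="people">
  <li><p><a href="/person/1">Actor One</a></p><p class="character">Role A</p></li>
  <li><p><a href="/person/2">Actor Two</a></p><p class="character">Role B</p></li>
</ol>
"""

details = parse_show_details(SAMPLE_SHOW_HTML)
# A temporary directory keeps the demo from cluttering the working folder;
# a real run would use a fixed folder name instead.
output_folder = os.path.join(tempfile.mkdtemp(), "tv_shows")
csv_path = save_show_csv(details, "show_a", output_folder)
print(details)
print(csv_path)
```

Joining the cast names into one comma-separated string keeps each show on a single CSV row; one row per cast member would be an equally reasonable layout.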