Final Web Scraping Project

Discover The Most Popular Recent Movies On TMDB With Python

Web scraping uses a website's HTML structure to identify the tags and elements that hold the data most relevant to your analysis. We can then extract the text inside those tags and use it as data points for our analysis projects. With Python and its associated libraries, thousands of rows of data can be collected in a short amount of time.
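To make the idea of "getting the text within tags" concrete, here is a minimal sketch using Beautiful Soup on a small inline HTML string. The fragment, its tag layout, and the variable names are assumptions made up for illustration; they are not TMDB's actual markup.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment for illustration only; a real page will differ.
sample_html = """
<div class="card">
  <h2><a href="/movie/1">Example Movie</a></h2>
  <p>Jan 01, 2020</p>
</div>
"""

# Parse the markup with Python's built-in HTML parser.
soup = BeautifulSoup(sample_html, "html.parser")

# Locate the link inside the heading, then read its text and href attribute.
link = soup.find("h2").find("a")
title = link.get_text(strip=True)
url = link["href"]

# The first <p> tag in this fragment holds the release date.
release_date = soup.find("p").get_text(strip=True)

print(title, url, release_date)
```

Each value comes straight from a tag: the title from the link text, the URL from the `href` attribute, and the date from the paragraph text.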

The TMDB website, themoviedb.org, contains over ... community-generated entries about movies, TV shows and streaming content, dating from the 1930s to the present. This engaging site provides a wealth of information about movies, which are the subject of this project, including trailers, posters, budgets, release dates, and main cast. In this project, the extracted data will be limited to the movie title, the movie URL and the release date.

Here is an outline of the steps that we will follow:

  1. Download the webpage using the requests library and parse the HTML source code using Beautiful Soup.
  
  2. Extract items such as the movie title, movie URL and release date from the site.
  
  3. Compile the extracted information into Python lists.
  
  4. Combine data from multiple pages.
  
  5. Save the extracted data to a CSV (comma-separated values) file.
  

By the end of our project, we will have CSV files that we can convert to a DataFrame, which lets us view the information in tabular form so that it is easier to query for insights about consumer entertainment preferences over the years.
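As a rough illustration of that last step, the sketch below loads CSV data into a pandas DataFrame and sorts it by release date. The column names, the date format, and the two rows are assumptions invented for the example; an in-memory string stands in for a real file so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file written by the scraper.
csv_text = """title,url,release_date
Example Movie A,https://www.themoviedb.org/movie/1,"Jan 01, 2020"
Example Movie B,https://www.themoviedb.org/movie/2,"Mar 15, 2019"
"""

# Read the CSV into a DataFrame (pass a file path instead for a real file).
movies_df = pd.read_csv(io.StringIO(csv_text))

# Convert the date strings to datetimes so they sort chronologically.
movies_df["release_date"] = pd.to_datetime(
    movies_df["release_date"], format="%b %d, %Y"
)

# Show the most recent releases first.
recent_first = movies_df.sort_values("release_date", ascending=False)
print(recent_first[["title", "release_date"]])
```

Parsing the dates up front matters because sorting the raw strings would order them alphabetically by month name rather than by time.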

Will one of your favorite movies make the list?

Use the "Run" button at the upper-right corner of the page and select "Run on Binder" to execute the code in this notebook.

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="final-web-scraping-project")
[jovian] Attempting to save notebook..
[jovian] Updating notebook "designthink314/final-web-scraping-project" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/designthink314/final-web-scraping-project
designthink314
Wanda Taylor, 6 months ago