Web Scraping Project
Scraping Top 250 Movies Sorted by Rating on IMDb Using Python
IMDb is a widely used source of entertainment information, with features designed to help fans explore the world of movies and shows and decide what to watch. IMDb (an acronym for Internet Movie Database) is an online database of information related to films, television programs, home videos, video games, and streaming content, including cast, production crew, personal biographies, plot summaries, trivia, ratings, and fan and critical reviews.
The page https://www.imdb.com/search/title/?groups=top_250&sort=user_rating,desc&start= provides information about the Top 250 Movies. In this project, we will use this page to retrieve information about the movies using web scraping.
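The URL above ends with an empty start= parameter, which suggests that the results are split across several pages. As a rough sketch only, the page URLs could be built as shown here. The helper name page_urls, the page size of 50, and the 1-based offset are assumptions for illustration and are not confirmed by the page itself.

```python
# Base URL taken from the text; the value after "start=" is left to fill in.
BASE_URL = (
    "https://www.imdb.com/search/title/"
    "?groups=top_250&sort=user_rating,desc&start="
)


def page_urls(total=250, page_size=50):
    """Return one URL per results page.

    Assumption: `start` is a 1-based offset and each page lists
    `page_size` movies. Adjust both if the site behaves differently.
    """
    return [BASE_URL + str(start) for start in range(1, total + 1, page_size)]
```

Calling page_urls() with the default values gives five URLs, one for each assumed block of 50 movies.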
Let's Talk About Web Scraping
Web scraping is an automated method for obtaining large amounts of data from websites. Most of this data is unstructured HTML, which is then converted into structured data in a spreadsheet or a database so that it can be used in various applications.
How to Do Web Scraping Using Python
Python has many applications, and there are different libraries for different purposes. In this project, we will use the following libraries:
Requests: The requests module allows you to send HTTP requests using Python. The HTTP request returns a Response object with all the response data (content, encoding, status, etc.).
BeautifulSoup: Beautiful Soup is a Python package for parsing HTML and XML documents. It creates parse trees that make it easy to extract data.
Pandas: Pandas is a library used for data manipulation and analysis. Here it is used to organize the extracted data and store it in the desired format.
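To make the idea of a parse tree more concrete, here is a minimal sketch that parses a small, made-up HTML snippet with Beautiful Soup and turns it into a list of dictionaries. The markup, the class names, and the parse_movies helper are hypothetical and do not reflect the real structure of the IMDb page, which has to be inspected separately.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; NOT IMDb's real page structure.
SAMPLE_HTML = """
<ul>
  <li class="movie">
    <a href="/title/example-1/">Example Movie One</a>
    <span class="year">2000</span>
    <span class="rating">8.0</span>
  </li>
  <li class="movie">
    <a href="/title/example-2/">Example Movie Two</a>
    <span class="year">2001</span>
    <span class="rating">7.5</span>
  </li>
</ul>
"""


def parse_movies(html):
    """Convert the sample markup into a list of dictionaries."""
    soup = BeautifulSoup(html, "html.parser")  # parser bundled with Python
    movies = []
    for item in soup.find_all("li", class_="movie"):
        link = item.find("a")
        movies.append({
            "Movie_Name": link.get_text(strip=True),
            "Release year": item.find("span", class_="year").get_text(strip=True),
            "Movie Url": link["href"],
            "Rating": item.find("span", class_="rating").get_text(strip=True),
        })
    return movies
```

Each dictionary produced by parse_movies(SAMPLE_HTML) holds one movie, which is the kind of structured record that can later be loaded into Pandas.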
Here's a step-by-step outline for this project:
- Download the web page using requests.
- Parse the HTML source code using Beautiful Soup.
- Extract movie names, release years, URLs, and ratings from the page.
- Compile the extracted information into Python lists and dictionaries.
- Create a CSV file using Pandas to save the extracted information.
By the end of this project, we will have created a CSV file that contains the following information:
Movie_Name, Release year, Movie Url, Rating
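As a rough sketch of that final step, the snippet below writes a list of dictionaries to a CSV file with those four columns using Pandas. The rows are placeholders rather than scraped values, and the names COLUMNS, sample_movies, and save_movies are hypothetical helpers introduced only for this illustration.

```python
import pandas as pd

# Column names taken from the list above.
COLUMNS = ["Movie_Name", "Release year", "Movie Url", "Rating"]

# Placeholder rows for illustration only; these are not scraped values.
sample_movies = [
    {"Movie_Name": "Example Movie One", "Release year": 2000,
     "Movie Url": "/title/example-1/", "Rating": 8.0},
    {"Movie_Name": "Example Movie Two", "Release year": 2001,
     "Movie Url": "/title/example-2/", "Rating": 7.5},
]


def save_movies(records, path):
    """Write the records to a CSV file and return the DataFrame."""
    df = pd.DataFrame(records, columns=COLUMNS)  # fixes the column order
    df.to_csv(path, index=False)  # index=False leaves out the row numbers
    return df
```

For example, save_movies(sample_movies, "movies.csv") would create a file with one header row followed by one row per movie. The file name here is only an example.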
How to Run the Code
You can execute the code using the 'Run' button at the top of this page. You can also make changes and save your own version of this notebook to Jovian by executing the following code cells:
!pip install jovian --upgrade --quiet
import jovian
Download the Webpage using requests
We can use the requests library to download the web page.
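A minimal sketch of this step might look like the following. The function name download_page, the timeout value, and the User-Agent header are assumptions made for illustration. Some sites reject requests that lack a browser-like User-Agent, so the header may or may not be needed here.

```python
import requests


def download_page(url, timeout=10):
    """Download a web page and return its HTML as text."""
    # Assumption: a generic browser-like User-Agent; change or remove as needed.
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text
```

The returned string is the page's HTML source, which can then be passed to Beautiful Soup for parsing.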