Web Scraping Assignment

Scraping books of different genres from BooksToScrape website using Python

Introduction:

BooksToScrape (Demo Website): 'BooksToScrape' is a demo website built for practicing web scraping. It lists books across genres such as Historical Fiction, Sequential Art, Mystery, Travel, and Classics. The prices and ratings shown on the website are randomly assigned and have no real meaning.
Example: The Mystery genre lists 32 books, each with a title and a randomly assigned price and rating.

The page http://books.toscrape.com/ provides a list of books of different Genres. In this project, we will retrieve information from this page using web scraping.

Web Scraping: Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning. While web scraping often involves parsing and processing HTML documents, some platforms also offer REST APIs to retrieve information in a machine-readable format like JSON. In this project, we'll rely on web scraping of HTML pages to build a dataset from the demo website.

Learn more about Web Scraping here: https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/

We'll use the Python libraries Requests and Beautiful Soup to scrape the data in this project.

Here is an outline of the steps we'll follow:

  1. Download the web page using requests
  2. Parse the HTML source code using Beautiful Soup
  3. Extract the title, price, rating, and availability of each book from the page
  4. Compile extracted information into Python lists and dictionaries
  5. Extract and combine data from multiple pages
  6. Save the extracted information into a CSV file
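
As a rough illustration of steps 2 to 4, the sketch below parses a small HTML fragment with Beautiful Soup and collects one dictionary per book. The `SAMPLE_HTML` fragment, its tag and class names, and the `parse_books` helper are assumptions made for this example. They are modelled on what a book entry on the listing page is expected to look like, so inspect the real page source before relying on them.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for a single book entry (an assumption, not copied
# from the live site). The rating is assumed to be stored as a CSS class.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Two"></p>
  <h3><a href="#">It's Only the Himalayas</a></h3>
  <div class="product_price">
    <p class="price_color">£45.17</p>
    <p class="instock availability">In stock</p>
  </div>
</article>
"""


def parse_books(html):
    """Return a list of dictionaries, one per book found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.find_all("article", class_="product_pod"):
        rating_tag = article.find("p", class_="star-rating")
        # Keep the class that is not the generic "star-rating" marker.
        rating = [c for c in rating_tag["class"] if c != "star-rating"][0]
        books.append({
            "Book_Title": article.h3.a.get_text(strip=True),
            "Book_rating": rating,
            "Book_cost": article.find("p", class_="price_color").get_text(strip=True),
            "Book_availability": article.find("p", class_="availability").get_text(strip=True),
        })
    return books


books = parse_books(SAMPLE_HTML)
print(books)
```

Returning a list of dictionaries keeps each book's fields together, which makes it straightforward to combine results from several pages and write them out later.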

By the end of the project, we will create a CSV file in the following format:

Book_Title, Book_rating, Book_cost, Book_availability
It's Only the Himalayas, Two, £45.17, In stock
Full Moon over Noah’s ..., Four, £49.43, In stock
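
One possible way to produce a file in this shape is Python's built-in `csv` module. The sketch below is only an illustration: the `books_to_csv` helper and the sample row are assumptions for this example, and `csv.DictWriter` writes fields separated by a bare comma, without the space after each comma shown above.

```python
import csv
import io

# Column names taken from the target format shown above.
FIELDNAMES = ["Book_Title", "Book_rating", "Book_cost", "Book_availability"]


def books_to_csv(books, file_obj):
    """Write a list of book dictionaries to an open text file object."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(books)


# Sample row reused from the format example; an in-memory buffer stands in
# for a real file so the sketch has no side effects.
sample_books = [
    {
        "Book_Title": "It's Only the Himalayas",
        "Book_rating": "Two",
        "Book_cost": "£45.17",
        "Book_availability": "In stock",
    },
]

buffer = io.StringIO()
books_to_csv(sample_books, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, the same helper could be called with a file opened as `open("books.csv", "w", newline="", encoding="utf-8")`, where `books.csv` is a placeholder name. Setting the encoding explicitly may help avoid garbled characters such as the pound sign or curly apostrophes.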

How to Run the Code:

You can execute the code by clicking the "Run" button at the top of this page and selecting "Run on Binder". You can also make changes and save your own version of the notebook to Jovian by executing the following cells:

Download the webpage using requests

We'll use the requests library to download the web pages.

The library can be installed using pip.

!pip install requests --upgrade --quiet
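
Once the library is installed, downloading a page might look like the minimal sketch below. The `download_page` helper, the `BASE_URL` constant, and the 10-second timeout are assumptions chosen for this example rather than part of the original notebook.

```python
import requests

# Landing page of the demo website mentioned in the introduction.
BASE_URL = "http://books.toscrape.com/"


def download_page(url, timeout=10):
    """Return the HTML text of `url`, raising an error if the request fails."""
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses.
    response.raise_for_status()
    return response.text
```

Calling `download_page(BASE_URL)` should return the page's HTML as a string, which can then be handed to Beautiful Soup for parsing.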
Priyanka Srinivas, 5 months ago