Web Scraping Assignment
Scraping books of different genres from the BooksToScrape website using Python
Introduction:
BooksToScrape (Demo Website): 'BooksToScrape' is a demo website built for practicing web scraping. The website lists books in different genres, such as Historical Fiction, Sequential Art, Mystery, Travel, and Classics. The prices and ratings of the books on the website are randomly assigned and have no real meaning.
Example: The Mystery genre contains 32 books, each listed with its title and a randomly assigned price and rating.
The page http://books.toscrape.com/ provides a list of books of different genres. In this project, we will retrieve information from this page using web scraping.
Web Scraping: Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning. While web scraping often involves parsing and processing HTML documents, some platforms also offer REST APIs to retrieve information in a machine-readable format like JSON. In this tutorial, we'll use web scraping of HTML pages to build a dataset from the demo website.
Learn more about Web Scraping here: https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/
We'll use the Python libraries Requests and Beautiful Soup to scrape the data in this project.
Here is an outline of the steps we'll follow:
- Download the web page using requests
- Parse the HTML source code using Beautiful Soup
- Extract the title, price, rating, and availability of each book from the page
- Compile the extracted information into Python lists and dictionaries
- Extract and combine data from multiple pages
- Save the extracted information into a CSV file
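The parsing and extraction steps above can be sketched on a small inline HTML snippet. This is a minimal illustration and not necessarily the approach used later in the notebook: the tag and class names in SAMPLE_HTML (product_pod, star-rating, price_color, availability) are assumptions about how the site marks up each book, and parse_books is a hypothetical helper name. Inspect the real page source before relying on these selectors.

```python
from bs4 import BeautifulSoup

# Assumed markup for a single book; the real page may differ.
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Two"></p>
  <h3><a href="#" title="It's Only the Himalayas">It's Only the ...</a></h3>
  <p class="price_color">£45.17</p>
  <p class="instock availability"><i class="icon-ok"></i> In stock </p>
</article>
"""


def parse_books(html):
    """Return a list of dictionaries, one per book found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    books = []
    for article in soup.find_all("article", class_="product_pod"):
        # The rating is assumed to be the second CSS class, e.g. "star-rating Two".
        rating_classes = article.find("p", class_="star-rating").get("class", [])
        rating = next((c for c in rating_classes if c != "star-rating"), None)
        books.append({
            # The full title is taken from the link's title attribute.
            "Book_Title": article.h3.a["title"],
            "Book_rating": rating,
            "Book_cost": article.find("p", class_="price_color").get_text(strip=True),
            "Book_availability": article.find("p", class_="availability").get_text(strip=True),
        })
    return books
```

Calling parse_books(SAMPLE_HTML) returns a list with one dictionary whose keys match the CSV columns used in this project.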
By the end of the project, we will create a CSV file in the following format:
Book_Title, Book_rating, Book_cost, Book_availability
It's Only the Himalayas, Two, £45.17, In stock
Full Moon over Noah's ..., Four, £49.43, In stock
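A file in this format could be produced with Python's built-in csv module. The sketch below assumes the scraped books are already held as a list of dictionaries keyed by the column names above; write_books_csv and SAMPLE_BOOKS are hypothetical names used only for illustration.

```python
import csv

FIELDNAMES = ["Book_Title", "Book_rating", "Book_cost", "Book_availability"]

# One row taken from the sample output above.
SAMPLE_BOOKS = [
    {
        "Book_Title": "It's Only the Himalayas",
        "Book_rating": "Two",
        "Book_cost": "£45.17",
        "Book_availability": "In stock",
    },
]


def write_books_csv(books, path):
    """Write a list of book dictionaries to a CSV file with a header row."""
    # newline="" lets the csv module control line endings;
    # utf-8 keeps the pound sign intact.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(books)
```

For example, write_books_csv(SAMPLE_BOOKS, "books.csv") would create the file in the current directory. Note that the csv module separates fields with a bare comma, without the space shown in the sample above.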
How to Run the Code:
You can execute the code by clicking the "Run" button at the top of this page and selecting "Run on Binder". You can make changes and save your own version of the notebook to Jovian by executing the following cells:
Download the webpage using requests
We'll use the requests library to download the web pages. The library can be installed using pip.
!pip install requests --upgrade --quiet
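Once the library is installed, downloading a page might look like the sketch below. The helper name download_page and the 10-second timeout are assumptions made for illustration. The function returns the raw bytes of the response, which Beautiful Soup accepts directly and can decode itself.

```python
import requests

BASE_URL = "http://books.toscrape.com/"


def download_page(url, timeout=10):
    """Download a web page and return its body as bytes."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of returning an error page.
    response.raise_for_status()
    return response.content
```

A call such as download_page(BASE_URL) performs the request. Checking the status before using the body avoids silently parsing an error page.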