Web Scraping Assignment
Scraping books of different genres from the BooksToScrape website using Python
BooksToScrape (Demo Website): 'BooksToScrape' is a demo website built for practicing web scraping. It lists books in many genres, such as Historical Fiction, Sequential Art, Mystery, Travel, and Classics. The prices and ratings shown on the site are randomly assigned and have no real meaning.
Example: The Mystery genre contains 32 books, each listed with a title and a randomly assigned price and rating.
The page http://books.toscrape.com/ lists books from all of these genres. In this project, we will retrieve information from this website using web scraping.
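Each genre's books are spread over one or more listing pages. The sketch below generates those page URLs for a single genre, assuming (from the site's current layout, not from this project's code) that each listing page shows 20 books, that a genre's first page is `index.html`, and that later pages are named `page-2.html`, `page-3.html`, and so on. The helper `genre_page_urls` and the `mystery_3` slug are illustrative.

```python
import math

BASE_URL = "http://books.toscrape.com/catalogue/category/books/"
BOOKS_PER_PAGE = 20  # assumption: the site shows 20 books per listing page


def genre_page_urls(genre_slug: str, book_count: int) -> list[str]:
    """Build the listing-page URLs for one genre (hypothetical helper).

    The first page is index.html; later pages are assumed to be page-N.html.
    """
    page_count = max(1, math.ceil(book_count / BOOKS_PER_PAGE))
    urls = [f"{BASE_URL}{genre_slug}/index.html"]
    for page in range(2, page_count + 1):
        urls.append(f"{BASE_URL}{genre_slug}/page-{page}.html")
    return urls


# Under the 20-per-page assumption, Mystery's 32 books span 2 listing pages
for url in genre_page_urls("mystery_3", 32):
    print(url)
```

Following each page's "next" link instead of computing the page count would avoid hard-coding the page size.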
Web Scraping: Web scraping is the process of extracting and parsing data from websites in an automated fashion using a computer program. It's a useful technique for creating datasets for research and learning. While web scraping often involves parsing and processing HTML documents, some platforms also offer REST APIs that return information in a machine-readable format such as JSON. In this project, we'll use web scraping to build a dataset from BooksToScrape.
Learn more about Web Scraping here: https://www.geeksforgeeks.org/what-is-web-scraping-and-how-to-use-it/
Here is an outline of the steps we'll follow:
- Download the web page using requests
- Parse the HTML source code using
- Extract each book's title, price, rating, and availability from the page
- Compile extracted information into Python lists and dictionaries
- Extract and combine data from multiple pages
- Save the extracted information into a CSV file
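The parsing, extraction, and compiling steps above can be sketched with Python's built-in `html.parser` module, so the example runs without extra packages; this is a swapped-in parser, and the project itself may use a different library. The markup in `SAMPLE_HTML` is a simplified assumption of how one book card is laid out on a listing page (an `<article class="product_pod">` holding the rating class, the title link, the price, and the availability text), and `BookParser` is a hypothetical name. Each book becomes one dictionary, and all books are collected in a list.

```python
from html.parser import HTMLParser

# Simplified, assumed markup for one book card on a listing page
SAMPLE_HTML = """
<article class="product_pod">
  <p class="star-rating Two"></p>
  <h3><a href="its-only-the-himalayas_981/index.html"
         title="It's Only the Himalayas">It's Only the Himalayas</a></h3>
  <div class="product_price">
    <p class="price_color">£45.17</p>
    <p class="instock availability"><i class="icon-ok"></i> In stock</p>
  </div>
</article>
"""


class BookParser(HTMLParser):
    """Collect one dict per <article class="product_pod"> element."""

    def __init__(self):
        super().__init__()
        self.books = []      # compiled results: one dict per book
        self._field = None   # field whose text content is currently being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "article" and "product_pod" in classes:
            self.books.append({})  # start a new book record
        elif not self.books:
            return  # ignore markup outside a book card
        elif tag == "p" and "star-rating" in classes:
            # The rating is encoded as the second class, e.g. "star-rating Two"
            self.books[-1]["Book_rating"] = classes[1] if len(classes) > 1 else ""
        elif tag == "a" and "title" in attrs:
            # The title attribute holds the full, untruncated book title
            self.books[-1]["Book_Title"] = attrs["title"]
        elif tag == "p" and "price_color" in classes:
            self._field = "Book_cost"
        elif tag == "p" and "availability" in classes:
            self._field = "Book_availability"

    def handle_data(self, data):
        # Store non-blank text that appears inside a price or availability <p>
        if self._field and data.strip():
            self.books[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None


parser = BookParser()
parser.feed(SAMPLE_HTML)
print(parser.books)
```

Reading the title from the link's `title` attribute rather than its text is a deliberate choice here, since listing pages may shorten long titles in the visible link text.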
By the end of the project, we will create a CSV file in the following format:
Book_Title,Book_rating,Book_cost,Book_availability
It's Only the Himalayas,Two,£45.17,In stock
Full Moon over Noah’s ...,Four,£49.43,In stock
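A minimal sketch of writing rows in this format with the standard `csv` module, assuming each book has already been compiled into a dictionary keyed by the column names above (`write_books_csv` is a hypothetical helper). If the pound sign ever shows up as `Â£`, that is what UTF-8 text looks like when it is decoded as Latin-1 (or a similar single-byte encoding), which the last line demonstrates; writing and reading the file as UTF-8 avoids it.

```python
import csv
import io

FIELDNAMES = ["Book_Title", "Book_rating", "Book_cost", "Book_availability"]


def write_books_csv(books, file_obj):
    """Write a list of book dicts as CSV rows preceded by a header line."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(books)


books = [
    {"Book_Title": "It's Only the Himalayas", "Book_rating": "Two",
     "Book_cost": "£45.17", "Book_availability": "In stock"},
]

# An in-memory buffer keeps the example self-contained; for a real file use
# open("books.csv", "w", newline="", encoding="utf-8")
buffer = io.StringIO()
write_books_csv(books, buffer)
print(buffer.getvalue())

# "£" encoded as UTF-8 but decoded as Latin-1 turns into "Â£"
print("£45.17".encode("utf-8").decode("latin-1"))
```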
Download the webpage using requests
We'll use the requests library to download the web pages.
The library can be installed using pip:
!pip install requests --upgrade --quiet
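A minimal download step might look like the following, assuming the pages are served as UTF-8 (`download_page` is a hypothetical helper name, and the 10-second timeout is an arbitrary choice). `raise_for_status()` turns an unsuccessful HTTP status into an exception instead of silently returning an error page, and setting `response.encoding` explicitly guards against requests guessing ISO-8859-1 when the server omits a charset.

```python
import requests


def download_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise requests.HTTPError for 4xx/5xx responses
    # Assumption: the pages are UTF-8. Without a charset in the Content-Type
    # header, requests may fall back to ISO-8859-1, which turns "£" into "Â£".
    response.encoding = "utf-8"
    return response.text
```

For example, `download_page("http://books.toscrape.com/")` should return the HTML of the home page as a string, ready for the parsing step.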