Jovian

Scraping Books Website

Scraping the 'Books to Scrape' Website Using Python

What we do:

  • According to Wikipedia, web scraping (also called web harvesting or web data extraction) is data scraping used for extracting data from websites. Web scraping software may access the World Wide Web directly using the Hypertext Transfer Protocol or through a web browser. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis.
  • I am going to use the http://books.toscrape.com/ website.
  • I will be using Python with the requests, BeautifulSoup, and Pandas libraries.

Here are the steps to follow

  • We are going to scrape http://books.toscrape.com/.
  • We'll get a list of books. For each book, we'll get the book title, book page URL, and book description.
  • For each book, we'll get the top 25 pages in the book from the book page.
  • For each page, we'll grab the UPC, product type, price, tax, availability, and number of reviews.
  • For each book we'll create a CSV file.
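The per-book fields listed above (UPC, product type, price, tax, availability, number of reviews) can be pulled out of a page roughly as follows. This is only a minimal sketch: it assumes the fields sit in a table whose rows pair a `<th>` label with a `<td>` value, and `SAMPLE_BOOK_PAGE` and `parse_product_info` are hypothetical names with made-up values, not markup copied from the site. Check the real page structure in your browser before relying on the selector.

```python
from bs4 import BeautifulSoup

# Hypothetical, hand-written stand-in for a book page (assumed structure, made-up values).
SAMPLE_BOOK_PAGE = """
<table class="table table-striped">
  <tr><th>UPC</th><td>abc123</td></tr>
  <tr><th>Product Type</th><td>Books</td></tr>
  <tr><th>Price (excl. tax)</th><td>10.00</td></tr>
  <tr><th>Tax</th><td>0.00</td></tr>
  <tr><th>Availability</th><td>In stock (5 available)</td></tr>
  <tr><th>Number of reviews</th><td>0</td></tr>
</table>
"""


def parse_product_info(html):
    """Return a dict mapping each table label (th) to its value (td)."""
    doc = BeautifulSoup(html, 'html.parser')
    info = {}
    for row in doc.select('table tr'):
        header = row.find('th')
        value = row.find('td')
        # Skip rows that do not have both a label and a value cell.
        if header is not None and value is not None:
            info[header.get_text(strip=True)] = value.get_text(strip=True)
    return info


product_info = parse_product_info(SAMPLE_BOOK_PAGE)
print(product_info)
```

Returning a plain dict keeps each book's data easy to turn into one row of a table or CSV file later.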

Scraping Home Page of the website

How to do it:

  • use requests to download the page
  • use BeautifulSoup (bs4) to parse and extract information
  • convert to a Pandas DataFrame

Let's write a function to download the home page.

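import requests
from bs4 import BeautifulSoup

def get_topics_page():
    """Download the home page and return it as a parsed BeautifulSoup document."""
    topics_url = 'http://books.toscrape.com/'
    # Download the page
    response = requests.get(topics_url)
    # Stop early if the request failed (e.g. a 404 or 500 response)
    if response.status_code != 200:
        raise Exception('Failed to load page {}'.format(topics_url))
    page_contents = response.text
    # Parse the HTML so we can search it for tags
    doc = BeautifulSoup(page_contents, 'html.parser')
    return doc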
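Once a page has been parsed, the remaining two steps (extract information, convert to a Pandas DataFrame) might look like the sketch below. It runs on a small hand-written HTML snippet instead of the live site, so it works offline. The markup, `extract_books`, `BASE_URL`, and the book titles are all illustrative assumptions: the sketch assumes each book is wrapped in an `article` tag with class `product_pod` and that the link inside `h3` carries the full title in its `title` attribute.

```python
from urllib.parse import urljoin

import pandas as pd
from bs4 import BeautifulSoup

BASE_URL = 'http://books.toscrape.com/'

# Hypothetical, trimmed-down stand-in for the home page (assumed structure, made-up books).
SAMPLE_HTML = """
<section>
  <article class="product_pod">
    <h3><a href="catalogue/example-book-one_1/index.html" title="Example Book One">Example Book ...</a></h3>
  </article>
  <article class="product_pod">
    <h3><a href="catalogue/example-book-two_2/index.html" title="Example Book Two">Example Book ...</a></h3>
  </article>
</section>
"""


def extract_books(doc, base_url=BASE_URL):
    """Collect the title and absolute URL of every book in a parsed page."""
    books = []
    for article in doc.select('article.product_pod'):
        link = article.select_one('h3 a')
        if link is None:
            continue
        books.append({
            # Prefer the full title attribute; fall back to the visible link text.
            'title': link.get('title', link.get_text(strip=True)),
            # Links on the page are relative, so join them with the base URL.
            'url': urljoin(base_url, link['href']),
        })
    return books


sample_doc = BeautifulSoup(SAMPLE_HTML, 'html.parser')
books_df = pd.DataFrame(extract_books(sample_doc))
print(books_df)
```

With network access, the same function could be given the document returned by get_topics_page() in place of sample_doc, and books_df.to_csv('books.csv', index=False) would save the result.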
mbruv97
Ruwini Shashikala, 6 months ago