Web Scraping With Python
Scrape Top Quotes
Scrape the top quotes from quotes.toscrape.com
In this project, I will create a dataset of top quotes grouped by tag. Each CSV file corresponds to one tag (quote category), and each row in the file contains the following fields:
quote, author_of_quote, about_author_url, related_tags
For example, if we search for love quotes, the CSV file will contain rows like the following:
It is better to be hated for what you are than to be loved for what you are not, André Gide, http://quotes.toscrape.com/author/Andre-Gide, life love
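As a rough illustration of the row layout above, the following sketch writes one record with Python's standard csv module. The helper name rows_to_csv_text and the in-memory buffer are assumptions made for this example; the project itself may name and write its files differently.

```python
import csv
import io

# Column names taken from the field list above.
FIELDNAMES = ["quote", "author_of_quote", "about_author_url", "related_tags"]


def rows_to_csv_text(rows):
    """Hypothetical helper: serialize a list of row dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


example_row = {
    "quote": "It is better to be hated for what you are than to be loved for what you are not",
    "author_of_quote": "André Gide",
    "about_author_url": "http://quotes.toscrape.com/author/Andre-Gide",
    "related_tags": "life love",
}

csv_text = rows_to_csv_text([example_row])
print(csv_text)
```

Using csv.DictWriter rather than joining strings by hand means that any quote containing a comma is quoted automatically, so the four columns stay aligned.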
Extract the webpage from quotes.toscrape.com
We define a function that fetches the webpage with the requests library and returns a BeautifulSoup object.
import requests
from bs4 import BeautifulSoup

base_url = "http://quotes.toscrape.com/"

def parse_quote_page(url):
    """Fetch the web page at `url` and return it as a BeautifulSoup object."""
    response = requests.get(url)
    if response.status_code != 200:
        print('Status code:', response.status_code)
        raise Exception('Failed to fetch web page ' + url)
    # Name the parser explicitly so the result does not depend on which parsers are installed.
    return BeautifulSoup(response.text, 'html.parser')
Now that we have the webpage, we need to find all the quotes it contains. To do this, we define two functions, get_top_quotes and parse_each_quotes.
get_top_quotes extracts all the quotes on the webpage.
parse_each_quotes extracts the specific fields from each quote tag.
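To make the division of work between the two functions concrete, here is a minimal sketch that runs on a hand-written HTML snippet instead of a live request. The function names get_top_quotes_sketch and parse_each_quote_sketch, their signatures, and the sample markup (modeled on how a quote block appears to be laid out on the site) are all assumptions for illustration, not the project's actual implementation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://quotes.toscrape.com/"

# Hand-written sample markup; the class names are an assumption about the site's layout.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">It is better to be hated for what you are than to be loved for what you are not</span>
  <span>by <small class="author">André Gide</small>
    <a href="/author/Andre-Gide">(about)</a>
  </span>
  <div class="tags">
    <a class="tag" href="/tag/life/page/1/">life</a>
    <a class="tag" href="/tag/love/page/1/">love</a>
  </div>
</div>
"""


def parse_each_quote_sketch(quote_tag):
    """Hypothetical: pull the four CSV fields out of one quote block."""
    author_tag = quote_tag.find("small", class_="author")
    # The "(about)" link is the next <a> sibling after the author name.
    about_link = author_tag.find_next_sibling("a")
    tags = [a.get_text(strip=True) for a in quote_tag.find_all("a", class_="tag")]
    return {
        "quote": quote_tag.find("span", class_="text").get_text(strip=True),
        "author_of_quote": author_tag.get_text(strip=True),
        # Turn the relative link into an absolute URL.
        "about_author_url": urljoin(BASE_URL, about_link["href"]),
        "related_tags": " ".join(tags),
    }


def get_top_quotes_sketch(soup):
    """Hypothetical: find every quote block and parse each one."""
    return [parse_each_quote_sketch(q) for q in soup.find_all("div", class_="quote")]


rows = get_top_quotes_sketch(BeautifulSoup(SAMPLE_HTML, "html.parser"))
print(rows)
```

Keeping the per-quote parsing in its own function means the page-level function only has to locate the quote blocks, which makes each piece easier to check against a small sample like this one.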