
Dataanalyst Bootcamp Project1 Web Scraping

Know what interests famous people have by reading their quotes

Sometimes we all need a little inspiration or advice on how to react to given
life situations, whether on how to be a valuable person, a better friend, or
how to react to something adverse. Various famous and successful people have
said things that we all can find helpful. There is a website, "Quotes to scrape",
that offers dozens, if not hundreds, of such quotes.

Web scraping is the process of gathering useful information from a website of
interest and presenting it in a meaningful way.

In this project we will read in a list of quotes from famous people using the
"Quotes to scrape" website, either the default top quotes or quotes filtered
by one of the following subjects:

  • love
  • inspirational
  • life
  • humor
  • books
  • reading
  • friendship
  • friends
  • truth
  • similes.

Once you pick a subject of interest, a request will be made over the web and
an HTTP response document will be returned by the website to which the request
was submitted.
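
The request step can be sketched as follows. This is a minimal sketch, not the notebook's own code: the site address, the `/tag/<subject>/` URL pattern, and the helper names `build_url` and `fetch_page` are assumptions made for illustration.

```python
import requests

# Assumed address of the "Quotes to scrape" website.
BASE_URL = "https://quotes.toscrape.com"


def build_url(tag=None):
    """Return the URL of the default top quotes, or of one subject tag.

    The /tag/<subject>/ pattern is an assumption about the site layout.
    """
    if tag is None:
        return BASE_URL + "/"
    return f"{BASE_URL}/tag/{tag}/"


def fetch_page(tag=None, timeout=10):
    """Send the HTTP request and return the HTML text of the response."""
    response = requests.get(build_url(tag), timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx status codes
    return response.text
```

Calling `fetch_page("love")` would request the page for the subject "love"; calling `fetch_page()` with no argument would request the default top quotes.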

Information will be extracted from the document using the Python library
Beautiful Soup. Here is some general information from its documentation:

"Beautiful Soup is a Python library for pulling data out of HTML and XML files.
It works with your favorite parser to provide idiomatic ways of navigating,
searching, and modifying the parse tree. It commonly saves programmers hours
or days of work."

We will analyze the data and report:

  • author's name (the person being quoted)
  • an 'about' link, giving information about the author
  • the text of the quote
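
As a rough illustration of how these three fields might be pulled out with Beautiful Soup, here is a sketch that parses a small hand-written HTML sample. The tag names and CSS classes (`div.quote`, `span.text`, `small.author`) are assumptions about the page markup, and `parse_quotes` is a hypothetical helper, not the notebook's own function.

```python
from bs4 import BeautifulSoup

# Hand-written sample that imitates the assumed structure of one quote block.
SAMPLE_HTML = """
<div class="quote">
  <span class="text">"A sample quote."</span>
  <span>by <small class="author">Sample Author</small>
    <a href="/author/Sample-Author">(about)</a>
  </span>
</div>
"""


def parse_quotes(html, base_url="https://quotes.toscrape.com"):
    """Return one dictionary per quote block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for block in soup.find_all("div", class_="quote"):
        records.append({
            # Name of the person being quoted
            "author": block.find("small", class_="author").get_text(strip=True),
            # The first link in the block is assumed to be the 'about' link
            "about_link": base_url + block.find("a")["href"],
            # Text of the quote itself
            "quote": block.find("span", class_="text").get_text(strip=True),
        })
    return records


quotes = parse_quotes(SAMPLE_HTML)
```

If the real page uses different tags or class names, only the three `find` calls would need to change.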

We will then create a dataset that stores the gathered information.

Using the authors and corresponding quotes listed, create a list of dictionaries,
each one containing the author's name, a link to the author's 'about' page,
and the quote itself. This dataset will be stored as a table in CSV format
and can be downloaded for subsequent data analysis and machine learning
tasks.
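
One way to store such a list of dictionaries as CSV is the standard library's `csv.DictWriter`. The sketch below uses made-up sample data, and the column names and the helper `write_quotes_csv` are assumptions chosen for illustration.

```python
import csv
import io

# Assumed column names for the dataset.
FIELDS = ["author", "about_link", "quote"]


def write_quotes_csv(records, file_obj):
    """Write a list of quote dictionaries to an open text file as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)


# Made-up record, only to show the shape of the data.
sample_records = [
    {
        "author": "Sample Author",
        "about_link": "https://example.com/author/Sample-Author",
        "quote": "A sample quote, with a comma.",
    }
]

# Write to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
write_quotes_csv(sample_records, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, the same function can be given a file opened with `open("quotes.csv", "w", newline="", encoding="utf-8")`; the file name here is only an example.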

# This notebook was developed on the Jovian platform, where copies
# of it are maintained, so let's install the library and commit
# the latest copy to it
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="dataanalyst-bootcamp-project1-web-scraping")

Install the libraries

  • requests allows this notebook to interact with websites
  • bs4, or Beautiful Soup allows us to parse information from HTML documents

Art Lasky (artlasky), 2 years ago