
Data Science Bootcamp Project1

Scraping medical research topics on TED Talks

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="data-science-bootcamp-project-1")
[jovian] Attempting to save notebook..
[jovian] Updating notebook "dafrireece/data-science-bootcamp-project1" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/dafrireece/data-science-bootcamp-project1

About Web Scraping

One task of a data analyst is to analyze and report insights gleaned from data sets. Ever wondered how these data sets are collected? That's where knowledge of web scraping comes in handy.

What is web scraping?

Web scraping is a technique used to extract content and data from websites. The extracted data is stored, for example in a database or a file, and retrieved later for analysis and reporting.

“Data are just summaries of thousands of stories.” – By Chip & Dan Heath

Because large amounts of data are extracted, web scraping automates work that would take a person far longer to do by hand, or that could not be finished in a reasonable time at all.

How does web scraping work?

Hypertext Markup Language (HTML) is used to give structure to websites. Because every page is built from the same kinds of tags and attributes, a scraper can pinpoint specific elements within a page and extract their content.
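
To make the idea of pinpointing elements concrete, here is a minimal sketch that uses only Python's standard-library `html.parser`. The HTML fragment, the `talk-link` class name, and the `LinkCollector` helper are all invented for illustration; they are not taken from any real site.

```python
from html.parser import HTMLParser

# Hypothetical page fragment, invented for illustration only.
SAMPLE_HTML = """
<html>
  <body>
    <h1>Talks</h1>
    <a class="talk-link" href="/talks/example-one">Example talk one</a>
    <a class="talk-link" href="/talks/example-two">Example talk two</a>
    <a class="nav" href="/about">About</a>
  </body>
</html>
"""


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for <a> tags that carry a given class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []
        self._inside = False      # True while inside a matching <a> tag
        self._href = None
        self._text_parts = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._inside = True
            self._href = attr_map.get("href")
            self._text_parts = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching <a> tag.
        if self._inside:
            self._text_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            text = "".join(self._text_parts).strip()
            self.links.append((text, self._href))
            self._inside = False


collector = LinkCollector("talk-link")
collector.feed(SAMPLE_HTML)
print(collector.links)
# [('Example talk one', '/talks/example-one'), ('Example talk two', '/talks/example-two')]
```

The tag name and the `class` attribute are enough to separate the two talk links from the navigation link, which is the property that scrapers rely on.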

A general process of web scraping follows these steps:

  • Identify a site to scrape.
  • Use Requests to fetch the page's HTML.
  • Locate the HTML elements of interest using Beautiful Soup.
  • Use Pandas to save the extracted data to a CSV file.
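
As a rough sketch, the steps above could map onto three small functions. This is one possible arrangement, not necessarily the one used later in this notebook: the function names, the `title` and `url` columns, and the CSS selector are assumptions made for illustration.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page's HTML with Requests."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def extract_links(html, selector):
    """Locate elements with Beautiful Soup and return their text and href."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tag in soup.select(selector):
        rows.append({"title": tag.get_text(strip=True), "url": tag.get("href")})
    return rows


def save_csv(rows, path):
    """Write the extracted rows to a CSV file with Pandas."""
    df = pd.DataFrame(rows, columns=["title", "url"])
    df.to_csv(path, index=False)
    return df


# Example usage (placeholder URL and selector, left commented out
# so that nothing is downloaded when the cell runs):
# html = fetch_html("https://example.com/talks")
# save_csv(extract_links(html, "a.talk-link"), "talks.csv")
```

Keeping the download, the parsing, and the export separate means the parsing step can be tried on a saved HTML string without touching the network.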

Let's see how to put these steps in practice.

Frida Achieng (dafrireece), 6 months ago