Data Science Bootcamp Project 1
Scraping medical research topics on TED Talks
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="data-science-bootcamp-project-1")
[jovian] Attempting to save notebook..
[jovian] Updating notebook "dafrireece/data-science-bootcamp-project1" on https://jovian.ai
[jovian] Uploading notebook..
[jovian] Uploading additional files...
[jovian] Committed successfully! https://jovian.ai/dafrireece/data-science-bootcamp-project1
About Web Scraping
One task of a data analyst is to analyze and report insights gleaned from data sets. Ever wondered how these data sets are collected? That's where knowledge of web scraping comes in handy.
What is web scraping?
Web scraping is a technique used to extract content and data from websites. The extracted data is stored in a database and retrieved later for analysis and for communicating what it means.
“Data are just summaries of thousands of stories.” – By Chip & Dan Heath
Because large amounts of data are extracted, web scraping automates tasks that would otherwise take humans far longer, or even be impossible to complete in a timely manner.
How does web scraping work?
Hypertext Markup Language (HTML) is used to give structure to websites. Because HTML is a standardized markup language, every page is built from the same kinds of tags and attributes, so scrapers can pinpoint specific elements within a page and extract their content.
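To make this concrete, here is a minimal sketch of how a scraper can pinpoint elements by tag and class. The HTML snippet, class names, and URLs are made up for illustration and do not come from any real site; Beautiful Soup is used with Python's built-in `html.parser`.

```python
from bs4 import BeautifulSoup

# A tiny, made-up HTML document (an assumption for illustration only).
html = """
<html>
  <body>
    <h1 class="title">Example talks</h1>
    <ul>
      <li class="talk"><a href="/talks/1">First talk</a></li>
      <li class="talk"><a href="/talks/2">Second talk</a></li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# Pinpoint a single element by tag name and class.
heading = soup.find("h1", class_="title").get_text(strip=True)

# Pinpoint every link nested inside an <li class="talk"> element.
talks = [
    {"title": a.get_text(strip=True), "url": a["href"]}
    for a in soup.select("li.talk a")
]

print(heading)
print(talks)
```

The tag names and classes act like addresses: once you know where the content sits in the markup, the same selector works for every matching element on the page.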
A general process of web scraping follows these steps:
- Identify a site to scrape.
- Use Requests to fetch the HTML code.
- Locate HTML elements using Beautiful Soup.
- Use Pandas to create CSV files.
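The steps above can be sketched roughly as follows. This is only an outline under stated assumptions: the function names (`fetch_html`, `parse_items`, `to_csv`), the `div.item` page structure, and the `items.csv` file name are all hypothetical, and a real site needs selectors that match its own markup. The sample HTML stands in for a downloaded page so the sketch runs without a network connection.

```python
import requests
import pandas as pd
from bs4 import BeautifulSoup


def fetch_html(url):
    """Step 2: download the page's HTML with Requests."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def parse_items(html):
    """Step 3: locate elements with Beautiful Soup.

    The div.item / h2 / a structure is a placeholder, not a real site's layout.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select("div.item"):
        rows.append({
            "title": item.find("h2").get_text(strip=True),
            "link": item.find("a")["href"],
        })
    return rows


def to_csv(rows, path):
    """Step 4: build a DataFrame with Pandas and write it to a CSV file."""
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df


# Made-up markup standing in for a fetched page (assumption for illustration).
sample_html = """
<div class="item"><h2>Talk A</h2><a href="/a">watch</a></div>
<div class="item"><h2>Talk B</h2><a href="/b">watch</a></div>
"""

rows = parse_items(sample_html)
df = to_csv(rows, "items.csv")
print(df)
```

For a live page, you would pass the result of `fetch_html(url)` to `parse_items` instead of `sample_html`, with `url` being the site chosen in the first step.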
Let's see how to put these steps into practice.