Workshop - Web Scraping with Selenium & AWS

Introduction to Programming with Python

Please refer to the link below for the updated code.

Code: https://github.com/aakashns/selenium-youtube-scraper-live

Web scraping is a great way to extract public information from websites and create datasets for data analysis and machine learning. In this live hands-on workshop, we walk through the process of building and deploying a web scraping project from scratch using Python, Selenium, and AWS Lambda.

Objective

  1. Scrape the top 10 trending videos on YouTube using Selenium
  2. Set up a recurring job on AWS Lambda to scrape every 30 minutes
  3. Send the results as a CSV attachment over email (or to a spreadsheet)

Prerequisites

Python

Topics Covered

  • GitHub
  • Replit
  • Selenium
  • AWS Lambda
  • SMTP

Step 1 - Create a GitHub repository

Step 2 - Launch the repository on Replit

Note: Chromedriver and Chromium no longer come pre-installed on Replit after a recent update. Please follow these steps to add Chromedriver and Chromium to Replit: https://jovian.ai/birajde9/replit-add-chromdriver-chromium

Step 3 - Extract information using Selenium
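
A minimal sketch of this step, not the workshop's exact code. The trending URL, the `ytd-video-renderer` tag, and the `video-title` and `channel-name` ids are assumptions about YouTube's markup, which changes often and may no longer match the live site. The helper names are hypothetical. Selenium is imported lazily inside `get_driver` so the parsing logic can be exercised without a browser.

```python
# Sketch only: the URL, tag name, and element ids are assumptions about
# YouTube's markup and may need updating.
YOUTUBE_TRENDING_URL = "https://www.youtube.com/feed/trending"


def get_driver():
    """Start a headless Chrome session (needs Chromium and Chromedriver installed)."""
    from selenium import webdriver  # lazy import: only needed for a real browser

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    return webdriver.Chrome(options=options)


def parse_video(video):
    """Turn one video element into a plain dict."""
    # "id" is the locator strategy that Selenium exposes as By.ID.
    title_tag = video.find_element("id", "video-title")
    channel_tag = video.find_element("id", "channel-name")
    return {
        "title": title_tag.text,
        "url": title_tag.get_attribute("href"),
        "channel": channel_tag.text,
    }


def get_trending_videos(driver, limit=10):
    """Load the trending page and parse the first `limit` video elements."""
    driver.get(YOUTUBE_TRENDING_URL)
    # "tag name" is the locator strategy that Selenium exposes as By.TAG_NAME.
    videos = driver.find_elements("tag name", "ytd-video-renderer")
    return [parse_video(video) for video in videos[:limit]]
```

To try it, call `get_driver()`, pass the result to `get_trending_videos(driver)`, and call `driver.quit()` when done. Returning plain dicts keeps the later CSV and email steps independent of Selenium.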

Step 4 - Send results over email using SMTP

Note: Google's security policy has since been updated, and the procedure discussed in the workshop no longer lets you send emails using just a username and password. Follow this blog post for the steps to send results over email using SMTP: https://blog.jovian.ai/web-scraping-using-selenium-2a3ffa1f03f4
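
As a rough illustration of this step, the sketch below builds a CSV from scraped rows and attaches it to an email using only the Python standard library. It assumes Gmail's SMTP server over SSL (`smtp.gmail.com`, port 465) and an app password rather than the regular account password. The function names and the `trending.csv` filename are hypothetical, and the linked blog post remains the reference for the current sign-in procedure.

```python
import csv
import io
import smtplib
import ssl
from email.message import EmailMessage


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text held in memory."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def build_message(sender, recipient, subject, body, csv_text,
                  filename="trending.csv"):
    """Create an email with a plain-text body and one CSV attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    msg.add_attachment(csv_text.encode("utf-8"), maintype="text",
                       subtype="csv", filename=filename)
    return msg


def send_message(msg, user, password, host="smtp.gmail.com", port=465):
    """Send the message over an SSL connection (host and port are assumptions)."""
    context = ssl.create_default_context()
    with smtplib.SMTP_SSL(host, port, context=context) as server:
        server.login(user, password)
        server.send_message(msg)
```

Keeping message construction separate from sending lets you inspect the email locally before using real credentials. Read the credentials from environment variables or Replit secrets rather than hard-coding them.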

Step 5 - Set up a recurring job on AWS Lambda

Selenium Lambda Layers: https://github.com/aakashns/selenium-aws-lambda-layers
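
A hedged sketch of how the Lambda entry point might look. `lambda_handler(event, context)` is the handler signature AWS Lambda invokes for Python functions, and `rate(30 minutes)` is the EventBridge schedule expression for a 30-minute interval. The `scrape_trending` placeholder and the optional `scrape` parameter are hypothetical, added here so the handler can be tried locally without a browser.

```python
import json

# EventBridge schedule expression for "every 30 minutes". It is configured on
# the function's trigger; the code itself does not read it.
SCHEDULE_EXPRESSION = "rate(30 minutes)"


def scrape_trending():
    """Hypothetical placeholder for the Selenium scraping step."""
    raise NotImplementedError("plug in the scraper here")


def lambda_handler(event, context, scrape=scrape_trending):
    """Entry point invoked by AWS Lambda on each scheduled run."""
    videos = scrape()
    # A real job would build the CSV and send the email here.
    return {
        "statusCode": 200,
        "body": json.dumps({"video_count": len(videos)}),
    }
```

Locally, `lambda_handler({}, None, scrape=lambda: [])` exercises the handler without AWS. On Lambda, only `event` and `context` are passed, so the default scraper is used.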

The workshop lasts approximately 3 hours, and all code is written live during the session. You can follow along with the recording to work on your own web scraping project.