Created 2 years ago
Scraping the Topics of GitHub
Importing the Libraries for Scraping the Website
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from selenium import webdriver
url = 'https://github.com/topics'
# Create a dictionary to hold the scraped data; it will later be turned into a pandas DataFrame
topics = {"Topic Name": [],
          "Topic Description": [],
          "Topic URL": []}
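As a rough sketch of how this dictionary is meant to be used, the snippet below appends one row's worth of values to the column lists and then builds a DataFrame from them. The helper name `add_topic` and the sample values are placeholders for illustration only, not scraped data.

```python
import pandas as pd

topics = {"Topic Name": [],
          "Topic Description": [],
          "Topic URL": []}

def add_topic(store, name, description, url):
    """Hypothetical helper: append one topic to each column list."""
    store["Topic Name"].append(name)
    store["Topic Description"].append(description)
    store["Topic URL"].append(url)

# Placeholder values, not real scraped data
add_topic(topics,
          "example-topic",
          "An example description.",
          "https://github.com/topics/example-topic")

# Each key becomes a column; each list index becomes a row
topics_df = pd.DataFrame(topics)
print(topics_df)
```

Keeping one list per column means the three lists must always grow together, which is why a single helper that appends to all of them can be convenient.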
Inspecting the website
- Before trying to scrape any website, it is essential to inspect and understand its DOM (Document Object Model). Knowing the structure lets us scrape the needed data effectively. In particular, we should check which element holds the data we want, for example a div tag or a p tag.
- Scraping a dynamic website is more complicated than scraping a static one, because the content may be rendered by JavaScript after the page loads. A few extra steps are needed to scrape a dynamic website.
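To illustrate what inspecting the DOM buys us, here is a minimal sketch that parses a small HTML string with BeautifulSoup and pulls values out of specific tags. The markup and the class names (`topic-card`, `topic-name`, `topic-desc`, `topic-link`) are invented for this example; the real tags and classes on the GitHub topics page have to be found with the browser's inspector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure must be checked in the browser
sample_html = """
<div class="topic-card">
  <p class="topic-name">example-topic</p>
  <p class="topic-desc">An example description.</p>
  <a class="topic-link" href="/topics/example-topic">Explore</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Locate the container first, then the tags inside it
card = soup.find("div", class_="topic-card")
name = card.find("p", class_="topic-name").text.strip()
desc = card.find("p", class_="topic-desc").text.strip()

# The href is relative, so prepend the site's base URL
link = "https://github.com" + card.find("a", class_="topic-link")["href"]

print(name, desc, link)
```

For a dynamic page, the HTML string would typically come from Selenium's `driver.page_source` once the browser has rendered the page, instead of from a hard-coded string as above.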