LinkedIn Learning Web Scraping Using Selenium
Scraping Top Courses from LinkedIn Learning Using Selenium
LinkedIn is a social networking website for professional life. Users can connect with people they have worked with, post their work experience and skills, look for jobs, and recruit workers. It enables users to keep track of the professional world.
Over the years, LinkedIn has developed various other platforms that help assist careers and enhance skills. One such platform is LinkedIn Learning. The platform provides various topics to browse through and picks the topics best suited to the user's profile. It includes topics in the fields of Business, Technology, and Creative, as well as certificates. Apart from the individual courses available, it also includes learning-path modules that provide the right direction for progressing in a particular field.
Web scraping is the technique of extracting unstructured information from a webpage and converting it into structured data in a spreadsheet or database. Various web-scraping methods can be used to scrape the data. For example, BeautifulSoup is a Python package that helps with web scraping, while Selenium automates a web browser's UI and can extract data from the pages it loads. Depending on the type of website, different methods and techniques are used to extract information: for static web pages BeautifulSoup can be used, while for dynamic webpages Selenium is the best-suited option. Other available options are crawlers, REST APIs, and Scrapy.
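To illustrate the static case, here is a minimal sketch of turning unstructured HTML into structured data using only Python's standard-library `HTMLParser`. The HTML snippet and the `class="course"` attribute are made up for the example; they are not LinkedIn Learning's actual markup.

```python
from html.parser import HTMLParser

# Hypothetical static HTML, standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li class="course">Python Essential Training</li>
  <li class="course">Learning SQL</li>
</ul>
"""

class CourseParser(HTMLParser):
    """Collects the text of every <li class="course"> element."""

    def __init__(self):
        super().__init__()
        self.in_course = False
        self.courses = []

    def handle_starttag(self, tag, attrs):
        # Flag that the next text node belongs to a course item.
        if tag == "li" and ("class", "course") in attrs:
            self.in_course = True

    def handle_data(self, data):
        if self.in_course and data.strip():
            self.courses.append(data.strip())
            self.in_course = False

parser = CourseParser()
parser.feed(SAMPLE_HTML)
print(parser.courses)  # structured list extracted from raw HTML
```

A dynamic page built by JavaScript would not yield its content to a parser like this, which is why this project drives a real browser with Selenium instead.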
From the available categories, we will use the "Technology" category and extract the top courses available in each of its topics.
We will use the Python library Selenium to automate the dynamic website and scrape the data, Pandas to store the extracted information in an organised form, and getpass to read the credentials securely.
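The credential step can be sketched as a small helper built on the standard-library `getpass` module. The function name and prompts are assumptions for illustration; the point is that the password is typed at run time and never echoed or stored in the notebook.

```python
from getpass import getpass

def read_credentials():
    """Prompt for LinkedIn credentials without echoing the password.

    Hypothetical helper: nothing is saved to disk or to the
    notebook's source, so the credentials stay out of version control.
    """
    username = input("LinkedIn email: ")
    password = getpass("LinkedIn password: ")  # input is hidden
    return username, password
```

The returned values can then be typed into the login form's fields via Selenium's `send_keys`.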
Outline of the Project
Here's an outline of the steps we'll follow:
- Installing and importing the required Python libraries and functions.
- Using the `getpass` function to log in to the page.
- Directing the `webdriver` to the particular URL.
- Parsing the HTML source code.
- Extracting all the required information by creating a function `parse_course_detail()` and storing the information in the form of Python dictionaries.
- Saving the extracted information using a function `scrape_data()` and creating the CSV file.
- Creating a final CSV file that contains the combined list of the 3 topics with the extracted information.
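The last three steps above can be sketched with two small helpers. The column names and the string inputs are assumptions made for this sketch; in the real project `parse_course_detail()` would read these values out of Selenium `WebElement`s rather than receive them as arguments.

```python
import csv
import io

def parse_course_detail(title, author, viewers, duration):
    # Hypothetical: package one course's fields as a dictionary.
    return {
        "Title": title,
        "Author": author,
        "Viewers": viewers,
        "Duration": duration,
    }

def scrape_data(courses, fileobj):
    # Write the list of course dictionaries as CSV rows.
    fields = ["Title", "Author", "Viewers", "Duration"]
    writer = csv.DictWriter(fileobj, fieldnames=fields)
    writer.writeheader()
    writer.writerows(courses)

# Assemble a one-row example and write it to an in-memory buffer
# (a real run would open a .csv file instead).
courses = [
    parse_course_detail("Learning Python", "Joe Marini", "1,200,000", "2h 30m"),
]
buf = io.StringIO()
scrape_data(courses, buf)
print(buf.getvalue())
```

Combining the three topic lists then amounts to concatenating the dictionaries before the final `scrape_data()` call, or concatenating the per-topic DataFrames with Pandas.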
By the end of the project, we will have created a CSV file containing the extracted course details.
How to Run the Code?
You can execute the code using the "Run" button at the top of this page and selecting "Run locally".
Note: This project runs on the local computer. Please download the ChromeDriver executable that matches the version of Chrome installed on your computer before executing the code.
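Wiring the downloaded driver into Selenium can be sketched as below. The `driver_path` default is a placeholder for wherever you saved the ChromeDriver executable; this assumes Selenium 4's `Service`-based constructor.

```python
def make_driver(driver_path="./chromedriver"):
    """Hypothetical helper: start Chrome via a local ChromeDriver.

    Imports are done inside the function so the module can be
    loaded even on a machine without Selenium installed.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service)

# Usage (requires Chrome and a matching ChromeDriver):
# driver = make_driver("/path/to/chromedriver")
# driver.get("https://www.linkedin.com/learning/")
```

If the driver and browser versions do not match, `webdriver.Chrome` raises a `SessionNotCreatedException` naming the mismatched versions.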
Installing and Importing the required libraries and functions:
- selenium - to automate the dynamic webpage
- time - Python standard-library module used to give the webpage time to load
- pandas - to store the extracted information in a structured way
pip install selenium --quiet
Note: you may need to restart the kernel to use updated packages.