
Project Web Scraping With Python

Scraping the most popular CAD models in different categories from GrabCAD

GrabCAD (the largest online community of professional engineers, designers and students, where members work on and share their CAD models)

INTRODUCTION:

GrabCAD is a platform where we can upload or download CAD models to showcase our work and get a chance to win prizes. GrabCAD evolved as a community of engineers and currently has 52 lakh (5.2 million) registered users and 31 lakh (3.1 million) open-source models available on the website. This vast free CAD model library is very helpful for students and professionals who want to pursue CAD-related jobs or research, or to learn design software such as SolidWorks, CATIA, AutoCAD, Pro/E, etc.
It brings together all the tools engineers need to manage and share CAD files in one platform.

OBJECTIVE:

As data science engineers, we aim to get the all-time most downloaded design models by parsing the information from this website into tabular data, organized by knowledge-domain categories such as Machine Design, 3D Printing, Aerospace and Electrical. With this data we can learn about the interests of the community and the difficulty of designing the models, and of course distribute prizes for the most popular ones.

(In this notebook we limit our objective to scraping the data for each category separately, to keep the dataset small. The data for different categories can also be combined, and further analysis and testing on the complete data can be done along similar lines.)

The overall steps I'll follow are:

  1. Understand the structure of the GrabCAD website
  2. Install and import the required libraries
  3. Download the page and extract the category URLs from GrabCAD's all-time most downloaded library page using selenium.webdriver and kora.selenium (there are 33 categories in total on the page)
  4. Extract the model links (100 per page) from each URL extracted above, for the required categories among those 33
  5. Download each model link and parse four fields out of it: names, downloads, likes and comments
  6. Combine the extracted data from each category into a dictionary
  7. Compile all the details into a pandas DataFrame and create a CSV file

By the end of the project, we expect to have created a CSV file with the following information for the Machine Design category:

name, downloads, likes, comments
Stepper Motor Nema 17, 41925, 575, 78
MQ-1 Predator UAV, 31373, 802, 144
CNC 3-axis, 30116, 994, 175
Planetary Gearbox, 29050, 900, 189
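
Steps 6 and 7 can be sketched roughly as follows. This is only a minimal illustration: the dictionary is filled by hand with the four sample rows above instead of scraped values, and Python's built-in csv module stands in for pandas so the snippet runs without extra packages. The names data, write_csv and csv_text are made up for this example.

```python
import csv
import io

# Hypothetical column-wise dictionary, as step 6 might produce it.
# The values are the four sample rows shown above, not freshly scraped data.
data = {
    "name": ["Stepper Motor Nema 17", "MQ-1 Predator UAV", "CNC 3-axis", "Planetary Gearbox"],
    "downloads": [41925, 31373, 30116, 29050],
    "likes": [575, 802, 994, 900],
    "comments": [78, 144, 175, 189],
}


def write_csv(columns, out_file):
    """Write a dict of equal-length lists to out_file as CSV rows."""
    writer = csv.writer(out_file)
    # Header row: the dictionary keys become the column names
    writer.writerow(columns.keys())
    # zip(*values) turns the column lists into one tuple per model
    writer.writerows(zip(*columns.values()))


# Write into an in-memory buffer so the result can be inspected directly
buffer = io.StringIO()
write_csv(data, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with newline="" (for example open("machine_design.csv", "w", newline=""), where the file name is just a placeholder). With pandas, pd.DataFrame(data).to_csv("machine_design.csv", index=False) should give an equivalent file.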


NOTE:

  1. GrabCAD is a dynamic website that renders its content with JavaScript, so the HTML returned by a plain HTTP request does not contain the data we need, and Beautiful Soup alone cannot extract it. Selenium is therefore preferred for this kind of website. On some websites we can still use Beautiful Soup to parse the page HTML after getting it from the webdriver.

  2. If you want to code on your local computer, install Selenium and the webdriver that matches your browser to extract the page. If you are coding on a cloud-based service such as Google Colab, you need to install kora, which provides kora.selenium. Be aware that kora.selenium will not work on Binder and other platforms.
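
To illustrate the first note, here is a small sketch of how Beautiful Soup could parse HTML that the webdriver has already rendered. It assumes the rendered page is available as a string (Selenium exposes it as the driver's page_source attribute). The sample HTML, the URL path in it and the helper name extract_links are invented for the example and do not reflect GrabCAD's real markup.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url="https://grabcad.com"):
    """Return absolute URLs for every <a> tag in html that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # href=True skips anchor tags that carry no href attribute
    for anchor in soup.find_all("a", href=True):
        # urljoin resolves relative paths against the site root
        links.append(urljoin(base_url, anchor["href"]))
    return links


# Hypothetical stand-in for the rendered page; in the notebook this string
# would come from the webdriver, e.g. wd.page_source after wd.get(url).
sample_html = """
<div>
  <a href="/library/example-model-1">Example model</a>
  <a>anchor without a link</a>
</div>
"""

links = extract_links(sample_html)
print(links)
```

The point of the split is that Selenium only has to load and render the page, while the parsing logic stays a plain function that can be tried on any saved HTML string.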

Install and Import required libraries:

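# Install kora (provides a Selenium webdriver that works on Google Colab) and requests
!pip install kora -q
!pip install requests

# wd is the preconfigured webdriver instance exposed by kora.selenium
from kora.selenium import wd
# Explicit-wait helpers for content that JavaScript loads after the initial page
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# Beautiful Soup parses the HTML; requests fetches pages over plain HTTP
from bs4 import BeautifulSoup
import requests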

Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (2.23.0)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests) (2021.5.30)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests) (1.24.3)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests) (3.0.4)
dwivedi-rishabh95
Rishabh Dwivedi, 6 months ago