Scraping Gadgets360 Reviews
Introduction: What is Web Scraping?
Web scraping extracts large amounts of data from websites for a variety of uses, such as price monitoring, enriching machine learning models, financial data aggregation, monitoring consumer sentiment, and news tracking. Browsers display data from a website, but manually copying data from multiple sources into a central place is tedious and time-consuming. Web scraping tools essentially automate this manual process.
(Image source: https://hirinfotech.com/)
Web scraping is used in many industries to collect data from websites and help us take the necessary action, for example:
- Competitor Price Monitoring
- Monitoring MAP Compliance
- Fetching Images and Product Descriptions
- Monitoring Consumer Sentiment
- Aggregated News Articles
- Market Data Aggregation
- Extracting Financial Statements
- Real-Time Analytics for data science
(Image source: https://towardsdatascience.com)
Project Outline:
- We are going to scrape https://gadgets.ndtv.com to build a dataset.
- We will get the review title, review author, published date, category, and a link to each particular review.
- From each page we will get 20 review entries.
- Finally, we will create and save a .csv file for future use.
First, we install and import all the required libraries for this project.
In this project we use the requests library to fetch pages from the website and BeautifulSoup to parse each webpage and extract the relevant HTML data as text for further processing.
Creating the environment, i.e., installing all required libraries and importing them into the program:
!pip install requests --upgrade --quiet
!pip install pandas requests beautifulsoup4 --upgrade --quiet
from bs4 import BeautifulSoup
import requests
import csv
import pandas as pd
Our base URL is https://gadgets.ndtv.com/reviews/page-
Whatever page number we append to this URL, the request goes to that review page. As already mentioned, the site spreads its many reviews across many pages, and every page holds 20 reviews, so we need to visit each page to collect the data.
For this, at the very beginning the program asks for a page number as input, i.e. how many pages we want to scrape.
For example, if we input 5, it scrapes the review pages from 1 to 5, so the total number of reviews we can get is 20 * 5 = 100.
In the program, the URL is built as url = base_url + str(page),
where str(page) is 1, 2, 3, ... up to the user's input.
https://gadgets.ndtv.com/reviews/page-1 ----- for the first page
https://gadgets.ndtv.com/reviews/page-2 ----- for the second page
...
https://gadgets.ndtv.com/reviews/page-18 ----- for the eighteenth page, and so on.
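The URL pattern above can be generated in a short loop; here `num_pages` stands in for the user's input:

```python
base_url = "https://gadgets.ndtv.com/reviews/page-"

num_pages = 5  # stand-in for the user's input; 5 pages -> 20 * 5 = 100 reviews
urls = [base_url + str(page) for page in range(1, num_pages + 1)]

print(urls[0])   # https://gadgets.ndtv.com/reviews/page-1
print(urls[-1])  # https://gadgets.ndtv.com/reviews/page-5
```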
For the code below, I am scraping only the first page.
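To keep this section self-contained and offline-friendly, here is a sketch of the parsing step run on a stand-in for the first page's HTML. The `div` and `class` names below are illustrative assumptions, not gadgets.ndtv.com's real markup; with a live page you would pass `response.text` from `requests.get(url)` to BeautifulSoup instead:

```python
from bs4 import BeautifulSoup

# Stand-in for response.text from requests.get(base_url + "1").
# The tag and class names are illustrative assumptions only.
html = """
<div class="review">
  <a href="https://gadgets.ndtv.com/example-review">Example Phone Review</a>
  <span class="author">A. Writer</span>
  <span class="date">01 January 2023</span>
  <span class="category">Mobiles</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

reviews = []
for card in soup.find_all("div", class_="review"):
    link = card.find("a")
    reviews.append({
        "title": link.text.strip(),
        "author": card.find("span", class_="author").text.strip(),
        "date": card.find("span", class_="date").text.strip(),
        "category": card.find("span", class_="category").text.strip(),
        "link": link["href"],
    })

print(reviews[0]["title"])  # Example Phone Review
```

The same loop applies unchanged to the real page once the correct tag and class names are read off from the site's HTML in the browser's inspector.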