Web Scraping Full Project
Web Scraping Multiple Amazon Headset Product Pages Dynamically
Project Idea and Goal:
For this project, I wanted to explore how to extract information from multiple pages of Amazon's headphone product listing dynamically, that is, without knowing the number of pages in advance and without hard-coding the URL of each page.
Requirement: Amazon's headphone product listing spans 20 pages, each showing headphones from different brands along with the new price, old price, and number of reviews. The aim of the project is to collect the product name, new price, old price, and number of reviews for every product on all 20 pages without hard-coding the URL of each page.
Steps:
1. Download the current web page using requests.
2. Parse the HTML source code using the BeautifulSoup library and get the required information from the current page.
3. Check whether the current page is the last page. If it is not, use a function to get the URL of the next page, then repeat steps 1 and 2 until no pages are left to scrape.
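The three steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`fetch`, `parse_products`, `next_page_url`, `scrape_all`) are hypothetical, and the CSS selectors (`div.product`, `.name`, `.new-price`, `.old-price`, `.reviews`, `a.next`) are placeholders that would need to be replaced with the real ones found by inspecting the Amazon page. The `User-Agent` header and the `max_pages` safety limit are also assumptions.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

# Placeholder selectors (assumptions): inspect the live page to find the real ones.
PRODUCT_SELECTOR = "div.product"
NEXT_LINK_SELECTOR = "a.next"


def fetch(url):
    """Step 1: download a page and return its HTML text."""
    # Assumed header value; some sites reject requests without a browser-like one.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def text_or_none(parent, selector):
    """Return the stripped text of the first match, or None if nothing matches."""
    tag = parent.select_one(selector)
    return tag.get_text(strip=True) if tag else None


def parse_products(html):
    """Step 2: extract one dict per product from the current page."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.select(PRODUCT_SELECTOR):
        rows.append({
            "name": text_or_none(item, ".name"),
            "new_price": text_or_none(item, ".new-price"),
            "old_price": text_or_none(item, ".old-price"),
            "reviews": text_or_none(item, ".reviews"),
        })
    return rows


def next_page_url(html, current_url):
    """Step 3: return the absolute URL of the next page, or None on the last page."""
    soup = BeautifulSoup(html, "html.parser")
    link = soup.select_one(NEXT_LINK_SELECTOR)
    if link is None or not link.get("href"):
        return None
    return urljoin(current_url, link["href"])


def scrape_all(first_url, fetch_page=fetch, max_pages=100):
    """Repeat steps 1 and 2, following 'next' links until none is left."""
    rows, url, seen = [], first_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch_page(url)
        rows.extend(parse_products(html))
        url = next_page_url(html, url)
    return rows
```

Because the loop stops only when no "next" link is found, the number of pages never has to be known up front. Passing `fetch_page` as a parameter also makes it possible to try the loop on saved HTML before running it against the live site.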
Packages Used:
1. requests
2. bs4
3. pandas
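As an illustration of where pandas might fit in, the sketch below turns scraped rows into a DataFrame and converts the price and review strings to numbers. The sample rows, the column names, and the `to_dataframe` helper are all hypothetical; the text formats on the real page may differ and need different cleaning.

```python
import pandas as pd

# Hypothetical sample rows in the shape a scraper might produce.
sample_rows = [
    {"name": "Headphone A", "new_price": "$10.00", "old_price": "$15.00", "reviews": "1,234"},
    {"name": "Headphone B", "new_price": "$1,299.99", "old_price": None, "reviews": "87"},
]


def to_dataframe(rows):
    """Build a DataFrame and convert price and review text to numeric columns."""
    df = pd.DataFrame(rows, columns=["name", "new_price", "old_price", "reviews"])
    for col in ("new_price", "old_price"):
        # Drop currency symbols and thousands separators, keep digits and the dot.
        cleaned = df[col].str.replace(r"[^0-9.]", "", regex=True)
        df[col] = pd.to_numeric(cleaned, errors="coerce")
    # Review counts such as "1,234" become 1234; missing values become NaN.
    df["reviews"] = pd.to_numeric(
        df["reviews"].str.replace(",", "", regex=False), errors="coerce"
    )
    return df


products = to_dataframe(sample_rows)
```

From here, `products.to_csv("headphones.csv", index=False)` would save the result; the file name is only an example.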
First Page URL (Headset page) - https://www.amazon.com/s?k=headphones&crid=T7DXZT5BZT92&qid=1637732331&sprefix=head%2Caps%2C203
phegde, 6 months ago