Web Scraping Bollywood Filmographies Project
What is web scraping?
- Web scraping, also called web data extraction, is the process of collecting structured web data in an automated fashion. In general, it is used by people and businesses who want to make use of the vast amount of publicly available web data to make smarter decisions.
How does web scraping work?
- Identify the target website.
- Collect the URLs of the pages from which you want to extract data.
- Make a request to these URLs to get the HTML of the page.
- Use locators to find the data in the HTML.
- Save the data in a JSON or CSV file or some other structured format.
- A filmography is a list of films related by some criteria. For example, an actor's career filmography is the list of films they have appeared in; a director's comedy filmography is the list of comedy films directed by that director. The term has been in use since at least 1957.
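The request, locate, and save steps listed above can be sketched with the Python standard library alone. This is a minimal illustration and not necessarily how the project itself is implemented: the names `fetch_html`, `TableRowParser`, `extract_rows`, and `rows_to_csv`, the User-Agent string, and the sample HTML (including its `Year`/`Title`/`Role` headers) are all assumptions made for the example. A real scraper would more likely use a dedicated HTML parsing library and locators specific to the target page.

```python
import csv
import io
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch_html(url):
    """Request a page and return its HTML as text (hypothetical helper)."""
    # The User-Agent value is an arbitrary example string.
    request = Request(url, headers={"User-Agent": "filmography-scraper-example"})
    with urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


class TableRowParser(HTMLParser):
    """Locate data: collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return every table row in the HTML as a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Save step: serialise the rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Inline sample standing in for a downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table>
  <tr><th>Year</th><th>Title</th><th>Role</th></tr>
  <tr><td>2003</td><td><i>Jism</i></td><td>Kabir lal</td></tr>
</table>
"""

sample_rows = extract_rows(SAMPLE_HTML)
print(rows_to_csv(sample_rows))
```

To run it against a live page, the sample string would be replaced by `fetch_html(url)` for each collected URL. The parser above gathers every table row on the page, so a real page would need extra filtering to keep only the filmography table.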
The goal of the project is to build a web scraper that extracts all the desired information and assembles it into a single CSV file. The format of the output CSV file is shown below:
| 187 | Prithviraj Sukumaran filmography | https://en.wikipedia.org/wiki/ | https://en.wikipedia.org/wiki/ |
| 0 | 2003 | Jism | Kabir lal | John Abraham |
| 1 | 2003 | Saaya | Dr.Akash "Akki" Bhatnagar | John Abraham |
| 223 | 2015-2016 | Daar Sabko Lagta hai | Host/presenter | Bipasha Basu |
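As a rough sketch of the assembly step, per-actor film lists can be flattened into one indexed table and written out with Python's `csv` module. The function names `assemble_filmographies` and `write_csv`, the in-memory `scraped` dictionary, and the header names are assumptions for illustration only; the source does not state the actual column headers. The sample values are taken from the rows shown above.

```python
import csv
import io


def assemble_filmographies(filmographies):
    """Flatten {actor: [(year, title, role), ...]} into indexed rows."""
    combined = []
    for actor, films in filmographies.items():
        for year, title, role in films:
            # The running index is the row's position in the combined table.
            combined.append([len(combined), year, title, role, actor])
    return combined


def write_csv(rows, handle):
    """Write a header line followed by the rows to an open text handle."""
    writer = csv.writer(handle)
    # Header names are assumed; the original output does not show them.
    writer.writerow(["index", "year", "title", "role", "actor"])
    writer.writerows(rows)


# Hypothetical scraped data, using two rows from the sample output above.
scraped = {
    "John Abraham": [
        ("2003", "Jism", "Kabir lal"),
        ("2003", "Saaya", 'Dr.Akash "Akki" Bhatnagar'),
    ],
}

combined_rows = assemble_filmographies(scraped)
buffer = io.StringIO()
write_csv(combined_rows, buffer)
print(buffer.getvalue())
```

Writing through `csv.writer` rather than joining strings by hand means values containing quotes or commas, such as the role name in the second row, are escaped so that they read back unchanged. To write to disk, an open file (for example `open("filmographies.csv", "w", newline="", encoding="utf-8")`, with a file name of your choosing) can be passed in place of the `StringIO` buffer.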