Web Scraping Bollywood Filmographies Project

Data Source: Bollywood Filmography

What is web scraping?

  • Web scraping is the process of collecting structured web data in an automated fashion. It's also called web data extraction. ... In general, web data extraction is used by people and businesses who want to make use of the vast amount of publicly available web data to make smarter decisions.

How does web scraping work?

  • Identify the target website.
  • Collect the URLs of the pages you want to extract data from.
  • Make a request to these URLs to get the HTML of the page.
  • Use locators to find the data in the HTML.
  • Save the data in a JSON or CSV file or some other structured format.
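
The "use locators" step above can be sketched with Python's standard library alone. This is a minimal illustration, not the project's actual scraper: `SAMPLE_HTML`, the `/wiki/Example_A` and `/wiki/Example_B` paths, and the names `LinkCollector` and `extract_links` are all hypothetical, and the HTML is an inline stand-in for a page that would normally be downloaded first.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical HTML fragment standing in for a downloaded page.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Example_A">John Abraham</a></li>
  <li><a href="/wiki/Example_B">Kajal Agarwal</a></li>
  <li><a href="#cite_note-1">[1]</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for links under /wiki/."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # The "locator": keep only links that point at article pages.
            if href and href.startswith("/wiki/"):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            name = "".join(self._text).strip()
            self.links.append((name, urljoin(self.base_url, self._href)))
            self._href = None


def extract_links(html, base_url="https://en.wikipedia.org"):
    """Return every (name, url) pair found in the given HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


links = extract_links(SAMPLE_HTML)
for name, url in links:
    print(name, url)
```

The footnote-style link (`#cite_note-1`) is skipped because it does not match the `/wiki/` prefix. A real page would need a more careful filter than this prefix check.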

About Filmography

  • A filmography is a list of films related by some criteria. For example, an actor's career filmography is the list of films they have appeared in; a director's comedy filmography is the list of comedy films directed by that director. The term has been in use since at least 1957.

Project Idea

In this project I will parse through the actors and actresses of Bollywood.

I will retrieve information from the Bollywood Filmography page using web scraping.

Project Goal

The project goal is to build a web scraper that extracts all the desired information and assembles it into a single CSV file. The format of the output CSV file is shown below:

#    Actor/Actress Name                 Profiles_urls                    Images_url
0    John Abraham                       https://en.wikipedia.org/wiki/   https://en.wikipedia.org/wiki/
1    Kajal Agarwal                      https://en.wikipedia.org/wiki/   https://en.wikipedia.org/wiki/
187  Prithviraj Sukumaran filmography   https://en.wikipedia.org/wiki/   https://en.wikipedia.org/wiki/

#    Year       Title                  Role                        Actor/Actress name
0    2003       Jism                   Kabir lal                   John Abraham
1    2003       Saaya                  Dr.Akash "Akki" Bhatnagar   John Abraham
223  2015-2016  Daar Sabko Lagta haiHost/presenter              Bipasha Basu
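
As a rough sketch of the "assemble into a CSV" step, the snippet below serializes rows shaped like the filmography output shown above using Python's `csv` module. The column names and the two sample rows are taken from that output; the helper name `rows_to_csv` is hypothetical, and the real project may build its file differently.

```python
import csv
import io

# Column order taken from the filmography output shown above.
FILM_COLUMNS = ["Year", "Title", "Role", "Actor/Actress name"]

# Two sample rows copied from the output shown above.
film_rows = [
    {"Year": "2003", "Title": "Jism", "Role": "Kabir lal",
     "Actor/Actress name": "John Abraham"},
    {"Year": "2003", "Title": "Saaya", "Role": 'Dr.Akash "Akki" Bhatnagar',
     "Actor/Actress name": "John Abraham"},
]


def rows_to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with a fixed column order."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # quoting of embedded quotes is handled by csv
    return buffer.getvalue()


csv_text = rows_to_csv(film_rows, FILM_COLUMNS)
print(csv_text)
```

To write to disk instead of a string, the same `DictWriter` can be pointed at a file opened with `open(path, "w", newline="", encoding="utf-8")`, where `path` is whatever output filename you choose. Using `csv` rather than joining strings by hand matters here because role names such as `Dr.Akash "Akki" Bhatnagar` contain quote characters that must be escaped.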
Deepak Kumawat (deepakkumawat2120), 6 months ago