Scraping the data of world's top 500 companies by
All the data scientists/data analysts doing research or analysis need some Data the begin their research with.Now the question that comes to our mind is "How the data is being prepared, from where we can get this much data, and How to extract the exact data that we need?
The answer is,
There are sevaral ways to collect this data and one of them is
Web Scraping, web scraping:
"Web scraping is a technique used to extract structured data from websites containing unstructured data through an automated process.".
For web scraping some tools are available there and Python comes to your rescue by provinding you with some great libraries which help to make your our work easier,these libraries help to automate our work.
These libraries are:
Here in this notebook I've used
BeautifulSoup libraries to scrape the data from the website value.today
Value.Today is a software analytics company which provides World's Top Corporate Companies Information, Corporate Companies Information, Financial Data of Company and World Financial News.
The dataset that I've created in this notebook will be beneficial for the comparative study of these 500 companies. You can easily study about the Rise and Falls of these companies in from each 6 months since 2020.
Here is an outline for the full journey of
- First of all we need to import all usefull libraries which we arte going to use in this entire notebook
- Parsing all the
html databy using
- Start finding all the usefull
tagsand data inside those tags
- After collecting all the data we than create a
dataframeusing pandas library, and also make a
CSV fileto save all the data
All website generally does not allow to scrape their data, some wbsites allow to use their data for reaserch and study purpose only.
Some websites provides their data in CSV format allowing you to scrape their data with REST API
Before scraping any website, we should look for a terms and conditions page to see if there are explicit rules about scraping. If there are, we should follow them. If there are not, then it becomes more of a judgement call.
You can run the cells by clikicking the
Runbutton available in the toolbar, or you can just press
Shift + Enterto run the cell
!pip install jovian --upgrade --quiet