Web Scraping Project
Scraping startups from Sequoia Capital using Python
Sequoia Capital is one of the best-known venture capital firms in the USA and is well known around the world. It invests in companies with great potential and has backed some of the most innovative companies in the world, including Airbnb, Twitch, Apple, Square and many more. On its website you can learn more about both the firm itself and the companies it has financed.
You can explore the website and search for different types of information:
- Who are the people who work at the firm and help startups through their various phases?
- What are the various phases of building a startup and a great company?
- Which are the most successful startups that have been part of their acceleration program?
In this project I retrieved information from Sequoia's web page using _web scraping_: the process of extracting information from a website in an automated fashion using code. We use the Python libraries [Requests](https://docs.python-requests.org/en/latest/) and BeautifulSoup4 to scrape data from this page.
This is an outline of the steps that we followed:
- Download the web page using "requests"
- Parse the HTML code using "BeautifulSoup"
- Extract company names, descriptions, website URLs, Twitter URLs, LinkedIn URLs, and information about the founding year and the founders
- Compile the extracted information into Python dictionaries
- Extract data from multiple pages
- Save the extracted info into a CSV file
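The first steps of this outline can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names `download_page` and `parse_companies`, the CSS class names (`company-card`, `company-name`, `company-description`) and the sample HTML are all assumptions made up for the example. The real Sequoia page has its own markup, which you would need to inspect in the browser before writing selectors.

```python
import requests
from bs4 import BeautifulSoup


def download_page(url):
    """Download a page and return its HTML text, raising on HTTP errors."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text


def parse_companies(html):
    """Turn HTML into a list of dictionaries, one per company card.

    The class names used here are hypothetical placeholders; inspect the
    real page to find the selectors that actually apply.
    """
    soup = BeautifulSoup(html, "html.parser")
    companies = []
    for card in soup.find_all("div", class_="company-card"):
        name_tag = card.find("h2", class_="company-name")
        desc_tag = card.find("p", class_="company-description")
        link_tag = card.find("a", href=True)
        companies.append({
            "name": name_tag.get_text(strip=True) if name_tag else None,
            "description": desc_tag.get_text(strip=True) if desc_tag else None,
            "website": link_tag["href"] if link_tag else None,
        })
    return companies


# Made-up HTML standing in for a downloaded page, so the sketch runs offline.
sample_html = """
<div class="company-card">
  <h2 class="company-name">Example Co</h2>
  <p class="company-description">Builds example things.</p>
  <a href="https://example.com">Website</a>
</div>
"""
companies = parse_companies(sample_html)
print(companies)
```

In a real run you would pass the result of `download_page(...)` to `parse_companies` instead of the sample string. Checking each tag for `None` before reading it keeps the scraper from crashing when a company card is missing a field.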
By the end of the project I had created a CSV file in the following format:
Company name, Company Description, Company website link, Company twitter link, Company LinkedIn link, Other Info
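As a rough sketch of how rows in this format could be written, the standard-library `csv.DictWriter` takes a list of dictionaries and a list of column names. The helper name `write_companies_csv` and the sample row below are invented for illustration; only the column names come from the format above.

```python
import csv
import io

# Column names taken from the CSV format described above.
FIELDNAMES = [
    "Company name",
    "Company Description",
    "Company website link",
    "Company twitter link",
    "Company LinkedIn link",
    "Other Info",
]


def write_companies_csv(companies, file_obj):
    """Write a list of company dictionaries to an open text file object."""
    writer = csv.DictWriter(file_obj, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(companies)


# A made-up row; note the comma inside the description.
sample_companies = [{
    "Company name": "Example Co",
    "Company Description": "Builds example things, quickly.",
    "Company website link": "https://example.com",
    "Company twitter link": "https://twitter.com/example",
    "Company LinkedIn link": "https://www.linkedin.com/company/example",
    "Other Info": "Founded 2020",
}]

# An in-memory buffer keeps the sketch self-contained; to write a real file,
# pass open("some_name.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_companies_csv(sample_companies, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

The `csv` module quotes fields that contain commas, so descriptions with commas in them do not break the column layout when the file is read back.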
You can execute the code using the "Run" button at the top of this page. You can make changes and save your own version of the notebook on [Jovian](https://jovian.ai).
!pip install jovian --upgrade --quiet