Indeed Data Science Job Postings
Project 1: Web Scraping Indeed Job Postings for Data Scientists in New York
The purpose of this project, our first in the Jovian bootcamp Data Analyst Certification, is to become familiar with the process of web scraping using Python.
Indeed is one of the most popular job websites on the market today. It is a job aggregator, available in 60+ countries, that collects listings from multiple job boards.
We focus on Data Scientist positions because I would like to become familiar with this type of job search and to lay the groundwork for a future exploratory data analysis. We used the following search criteria:
1. City: New York
2. Salary: $20,000 or higher
3. Type of job: Data Scientist
You might be wondering: what is a URL, and why are we using one?
A URL (Uniform Resource Locator) is the address of the page that holds the data we want to scrape. In this case, our URL is: https://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=0
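As a minimal sketch, the same URL can be built from the search criteria with the urlencode function from Python's standard urllib.parse module, which takes care of the percent-encoding. The helper name build_search_url is hypothetical, not part of the original project; the keys q, l, and start simply mirror the query parameters in the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"


def build_search_url(query: str, location: str, start: int = 0) -> str:
    """Build an Indeed search URL (hypothetical helper).

    urlencode percent-encodes each value with quote_plus, so spaces
    become "+", "$" becomes "%24", and "," becomes "%2C".
    """
    params = {"q": query, "l": location, "start": start}
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("data scientist $20,000", "New York")
print(url)
# https://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=0
```

Letting urlencode do the escaping avoids hand-writing codes such as %24 and %2C, which is easy to get wrong when the search terms change.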
We can split the URL into two main parts:
- Base URL: https://www.indeed.com/jobs
- Query parameters, starting with a question mark: ?q=data+scientist+%2420%2C000&l=New+York&start=0
Query parameters generally consist of three things:
- A leading question mark "?"
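- Information encoded as key-value pairs, each key joined to its value by an equal sign "=". In our example these are q=data+scientist+%2420%2C000 (the search text "data scientist $20,000", where "+" encodes a space, %24 encodes "$", and %2C encodes ",") and l=New+York (the location).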
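- A separator: a URL can carry multiple query parameters, separated by an ampersand "&". Our URL has three parameters (q, l, and start) joined by two ampersands; start=0 is the offset of the first result to show, which changes as we move through pages of results.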
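To check the anatomy described above, Python's standard urllib.parse module can split the URL into its base and its query parameters. This is a small sketch using only the standard library; parse_qs returns each value as a list because the same key may appear more than once in a query string.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.indeed.com/jobs?q=data+scientist+%2420%2C000&l=New+York&start=0"

parts = urlsplit(url)
# Base URL = scheme + network location + path
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
# Query string, without the leading "?"
query_string = parts.query
# parse_qs splits on "&", then on "=", and decodes "+" and %XX escapes
params = parse_qs(query_string)

print(base_url)      # https://www.indeed.com/jobs
print(query_string)  # q=data+scientist+%2420%2C000&l=New+York&start=0
print(params)        # {'q': ['data scientist $20,000'], 'l': ['New York'], 'start': ['0']}
```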
You might also be wondering: what does web scraping mean?
Web scraping (also called web harvesting or web data extraction) is the automated extraction of data from websites.
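In practice, scraping starts by downloading the raw HTML of each results page. The sketch below assumes that Indeed pages through results by increasing start in steps of 10 and that a browser-like User-Agent header is needed; both are assumptions, and page_urls and fetch_html are hypothetical helper names rather than the project's actual code. It uses only urllib from the standard library; the requests package is a common alternative.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10  # assumption: "start" advances by 10 per results page


def page_urls(query: str, location: str, pages: int) -> list:
    """Return one search URL per results page by stepping the start offset."""
    urls = []
    for page in range(pages):
        params = {"q": query, "l": location, "start": page * RESULTS_PER_PAGE}
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text (needs network access)."""
    # Browser-like User-Agent header; the exact value is an assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

For example, page_urls("data scientist $20,000", "New York", 3) returns three URLs ending in start=0, start=10, and start=20; passing each one to fetch_html returns that page's HTML as a string.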
The next question you might have is: what is HTML?
HTML (HyperText Markup Language) is a standardized system for tagging text files to achieve font, color, graphic, and hyperlink effects on World Wide Web pages.
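Those tags are what a scraper navigates: the text we want sits inside elements that can be identified by tag name and attributes. The minimal sketch below uses Python's built-in html.parser on a made-up fragment; the class names job_title and company are hypothetical and do not reflect Indeed's real markup, which you would need to inspect in the browser. A library such as BeautifulSoup is a common alternative for this step.

```python
from html.parser import HTMLParser

# A tiny, made-up fragment standing in for a search results page.
SAMPLE_HTML = """
<div class="job_card">
  <h2 class="job_title"><a href="/job/1">Data Scientist</a></h2>
  <span class="company">Acme Analytics</span>
</div>
<div class="job_card">
  <h2 class="job_title"><a href="/job/2">Senior Data Scientist</a></h2>
  <span class="company">Example Corp</span>
</div>
"""


class JobCardParser(HTMLParser):
    """Collect the text of elements whose class is job_title or company."""

    def __init__(self):
        super().__init__()
        self.records = {"job_title": [], "company": []}
        self._field = None      # class name currently being captured
        self._field_tag = None  # tag name that opened the current field
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        field = next((c for c in classes if c in self.records), None)
        if self._field is None and field is not None:
            self._field, self._field_tag, self._buffer = field, tag, []

    def handle_data(self, data):
        # Text inside nested tags (such as the <a> link) is captured too.
        if self._field is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes the opening tag is not nested inside itself.
        if self._field is not None and tag == self._field_tag:
            self.records[self._field].append("".join(self._buffer).strip())
            self._field = None


parser = JobCardParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.records)
# {'job_title': ['Data Scientist', 'Senior Data Scientist'], 'company': ['Acme Analytics', 'Example Corp']}
```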
Why are we scraping Indeed job postings?
The idea is to access the latest job data, analyze job trends, and automate the collection of job-board data. In addition, we want to understand how job posting websites are built.
Because we are working on the Jovian cloud platform, we import the jovian library and save the notebook by calling jovian.commit().