Overview
We have been given technology employment survey data for the years 2018, 2019, and 2020. We are to perform data wrangling and data analysis with the aid of pandas, matplotlib, and plotly.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Create an empty 8x6 inch figure at 80 dpi with a white face and black edge.
plt.figure(num=None, figsize=(8, 6), dpi=80, facecolor='w', edgecolor='k')
Task 1: Data Loading and Data Aggregation
- Load the 3 data files into the variables data_18, data_19, data_20.
data_18 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2018.csv")
data_19 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2019.csv")
data_20 = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/IT_Salary_Survey_EU_18-20/Survey_2020.csv")
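The task title also mentions aggregation. One possible way to combine the three yearly frames into a single frame is sketched below. This is only an illustration under assumptions: `combine_surveys` is a hypothetical helper, the `Year` column is an added label, and the toy frames with their column names and values are made up so that the sketch runs without downloading the CSV files.

```python
import pandas as pd


def combine_surveys(frames_by_year):
    """Stack per-year frames into one frame with a 'Year' column.

    Hypothetical helper: `frames_by_year` maps a year to its DataFrame.
    Columns missing from a given year are filled with NaN by pd.concat.
    """
    combined = pd.concat(frames_by_year, names=["Year"])
    # Move the year from the index into a regular column, then renumber rows.
    return combined.reset_index(level="Year").reset_index(drop=True)


# Made-up toy frames standing in for data_18 and data_19.
toy_18 = pd.DataFrame({"Age": [25, 31]})
toy_19 = pd.DataFrame({"Age": [28], "City": ["Berlin"]})

combined = combine_surveys({2018: toy_18, 2019: toy_19})
print(combined)
```

With the real data, the call would presumably look like `combine_surveys({2018: data_18, 2019: data_19, 2020: data_20})`. Because the three surveys may not share the same columns, the combined frame can contain many missing values.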
Task 2: Data Analysis
- Display the first 5 rows of the 2018 survey data
- Display a concise summary of the 2020 data and list 3 observations/inferences from the result. Use the info() method for this.
- Display the descriptive statistics of the 2018 survey data
- Display the number of missing values in each column of the 2018 survey data
- How many people responded to the survey in each of the 3 years? Has the number increased or decreased over the years?
- Display all the unique values and their frequency in the column “Number of vacation days” of the 2020 data. Write down your observations (at least one) for this result.
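The steps listed above can be sketched with standard pandas calls. To keep the sketch runnable offline, it uses small made-up stand-in frames rather than the real CSV files. Only the column name "Number of vacation days" comes from the task; the other column names and all values are hypothetical, so the printed numbers say nothing about the actual surveys.

```python
import pandas as pd

# Made-up stand-in frames so the sketch runs offline; in the notebook, use the
# data_18, data_19 and data_20 frames loaded from the CSV files instead.
data_18 = pd.DataFrame({"Age": [25.0, 31.0, None, 40.0],
                        "City": ["Berlin", "Munich", "Berlin", None]})
data_19 = pd.DataFrame({"Age": [28.0, 35.0, 30.0]})
data_20 = pd.DataFrame({
    "Age": [26.0, 33.0, 29.0, 41.0, 37.0],
    "Number of vacation days": ["30", "28", "30", None, "unlimited"],
})

# First 5 rows of the 2018 data (head() defaults to n=5).
first_rows = data_18.head()

# Concise summary of the 2020 data: dtypes, non-null counts, memory usage.
data_20.info()

# Descriptive statistics of the 2018 data (numeric columns by default).
stats_18 = data_18.describe()

# Number of missing values in each column of the 2018 data.
missing_18 = data_18.isna().sum()

# Number of respondents (rows) per survey year.
responses = pd.Series({2018: len(data_18), 2019: len(data_19), 2020: len(data_20)})

# Unique values and their frequencies; dropna=False also counts missing entries.
vacation_counts = data_20["Number of vacation days"].value_counts(dropna=False)

print(first_rows, stats_18, missing_18, responses, vacation_counts, sep="\n\n")
```

Passing `dropna=False` to `value_counts()` is a choice made here so that missing answers show up in the frequency table. Passing `include="all"` to `describe()` would additionally summarize non-numeric columns.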