Exploratory Data Analysis using Python - A Case Study
Analyzing responses from the Stack Overflow Annual Developer Survey 2020
Introduction
In this tutorial, we'll analyze the Stack Overflow developer survey dataset. The dataset contains responses to an annual survey conducted by Stack Overflow. You can find the raw data and the official analysis here: https://insights.stackoverflow.com/survey.
There are several options for getting the dataset into Jupyter:
- Download the CSV manually and upload it via Jupyter's GUI
- Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL
- Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
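For comparison, the urlretrieve option could be sketched roughly as follows. This is a minimal illustration, not the approach this tutorial uses: download_csv is a hypothetical helper name introduced here, and it assumes the URL ends in the file name you want to keep locally.

```python
import os
from urllib.request import urlretrieve


def download_csv(url, dest_dir="."):
    """Download the file at `url` into `dest_dir` and return the local path.

    Hypothetical helper for illustration; assumes the last path component
    of the URL is a usable file name.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last path component of the URL as the local file name.
    filename = url.rstrip("/").split("/")[-1]
    dest_path = os.path.join(dest_dir, filename)
    urlretrieve(url, dest_path)
    return dest_path


# Example call (requires network access; the URL is a placeholder
# for whatever raw CSV link you want to fetch):
# path = download_csv("https://example.com/data/some_file.csv", "data")
```

This handles one file per call, so a dataset made up of several files needs one call per file.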
We'll use the opendatasets helper library to download the files.
!pip install opendatasets
Collecting opendatasets
Downloading opendatasets-0.1.20-py3-none-any.whl (14 kB)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.8/site-packages (from opendatasets) (4.50.2)
Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from opendatasets) (7.1.2)
Collecting kaggle
Downloading kaggle-1.5.12.tar.gz (58 kB)
|████████████████████████████████| 58 kB 4.4 MB/s eta 0:00:011
Requirement already satisfied: six>=1.10 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.15.0)
Requirement already satisfied: certifi in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2020.6.20)
Requirement already satisfied: python-dateutil in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.8.1)
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.24.0)
Collecting python-slugify
Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
Requirement already satisfied: urllib3 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.25.11)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (2.10)
Collecting text-unidecode>=1.3
Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
|████████████████████████████████| 78 kB 10.3 MB/s eta 0:00:01
Building wheels for collected packages: kaggle
Building wheel for kaggle (setup.py) ... done
Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73053 sha256=1a5b9e1543a6da15de3b965b4502973e18a3e979295027ac5d8111c5068fd62e
Stored in directory: /home/jovyan/.cache/pip/wheels/29/da/11/144cc25aebdaeb4931b231e25fd34b394e6a5725cbb2f50106
Successfully built kaggle
Installing collected packages: text-unidecode, python-slugify, kaggle, opendatasets
Successfully installed kaggle-1.5.12 opendatasets-0.1.20 python-slugify-5.0.2 text-unidecode-1.3
import opendatasets as od
od.download('stackoverflow-developer-survey-2020')
0it [00:00, ?it/s]
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/survey_results_public.csv to ./stackoverflow-developer-survey-2020/survey_results_public.csv
100%|█████████▉| 94314496/94603888 [00:03<00:00, 63176802.12it/s]
0it [00:00, ?it/s]
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/survey_results_schema.csv to ./stackoverflow-developer-survey-2020/survey_results_schema.csv
0%| | 0/8428 [00:00<?, ?it/s]
0it [00:00, ?it/s]
0%| | 0/2268 [00:00<?, ?it/s]
Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/README.txt to ./stackoverflow-developer-survey-2020/README.txt