Exploratory Data Analysis using Python - A Case Study

Analyzing responses from the Stack Overflow Annual Developer Survey 2020

alt

Introduction

In this tutorial, we'll analyze the StackOverflow developer survey dataset. The dataset contains responses to an annual survey conducted by StackOverflow. You can find the raw data & official analysis here: https://insights.stackoverflow.com/survey.

There are several options for getting the dataset into Jupyter:

Download the CSV manually and upload it via Jupyter's GUI
Use the urlretrieve function from the urllib.request to download CSV files from a raw URL
Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.

We'll use the opendatasets helper library to download the files.

!pip install opendatasets

Collecting opendatasets
  Downloading opendatasets-0.1.20-py3-none-any.whl (14 kB)
Requirement already satisfied: tqdm in /opt/conda/lib/python3.8/site-packages (from opendatasets) (4.50.2)
Requirement already satisfied: click in /opt/conda/lib/python3.8/site-packages (from opendatasets) (7.1.2)
Collecting kaggle
  Downloading kaggle-1.5.12.tar.gz (58 kB)
     |████████████████████████████████| 58 kB 4.4 MB/s eta 0:00:011
Requirement already satisfied: six>=1.10 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.15.0)
Requirement already satisfied: certifi in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2020.6.20)
Requirement already satisfied: python-dateutil in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.8.1)
Requirement already satisfied: requests in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (2.24.0)
Collecting python-slugify
  Downloading python_slugify-5.0.2-py2.py3-none-any.whl (6.7 kB)
Requirement already satisfied: urllib3 in /opt/conda/lib/python3.8/site-packages (from kaggle->opendatasets) (1.25.11)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (3.0.4)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.8/site-packages (from requests->kaggle->opendatasets) (2.10)
Collecting text-unidecode>=1.3
  Downloading text_unidecode-1.3-py2.py3-none-any.whl (78 kB)
     |████████████████████████████████| 78 kB 10.3 MB/s eta 0:00:01
Building wheels for collected packages: kaggle
  Building wheel for kaggle (setup.py) ... done
  Created wheel for kaggle: filename=kaggle-1.5.12-py3-none-any.whl size=73053 sha256=1a5b9e1543a6da15de3b965b4502973e18a3e979295027ac5d8111c5068fd62e
  Stored in directory: /home/jovyan/.cache/pip/wheels/29/da/11/144cc25aebdaeb4931b231e25fd34b394e6a5725cbb2f50106
Successfully built kaggle
Installing collected packages: text-unidecode, python-slugify, kaggle, opendatasets
Successfully installed kaggle-1.5.12 opendatasets-0.1.20 python-slugify-5.0.2 text-unidecode-1.3

import opendatasets as od

od.download('stackoverflow-developer-survey-2020')

0it [00:00, ?it/s]

Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/survey_results_public.csv to ./stackoverflow-developer-survey-2020/survey_results_public.csv

100%|█████████▉| 94314496/94603888 [00:03<00:00, 63176802.12it/s]
0it [00:00, ?it/s]

Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/survey_results_schema.csv to ./stackoverflow-developer-survey-2020/survey_results_schema.csv


  0%|          | 0/8428 [00:00<?, ?it/s]

0it [00:00, ?it/s]

  0%|          | 0/2268 [00:00<?, ?it/s]

Downloading https://raw.githubusercontent.com/JovianML/opendatasets/master/data/stackoverflow-developer-survey-2020/README.txt to ./stackoverflow-developer-survey-2020/README.txt