
Exploratory Data Analysis using Python - A Case Study

Analyzing responses from the Stack Overflow Annual Developer Survey 2020


Introduction

In this tutorial, we'll analyze the Stack Overflow Developer Survey dataset. It contains responses to an annual survey conducted by Stack Overflow. You can find the raw data and the official analysis here: https://insights.stackoverflow.com/survey.

There are several options for getting the dataset into Jupyter:

  • Download the CSV manually and upload it via Jupyter's GUI
  • Use the urlretrieve function from the urllib.request module to download CSV files from a raw URL
  • Use a helper library, e.g., opendatasets, which contains a collection of curated datasets and provides a helper function for direct download.
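For comparison, the second option (urlretrieve) could look roughly like the sketch below. The helper name download_csv and the default "data" folder are hypothetical choices made for illustration. The function takes the file name from the last segment of the URL path, creates the target folder if needed, and lets urlretrieve copy the resource there.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_csv(url, dest_dir="data"):
    """Hypothetical helper: download the file at `url` into `dest_dir`.

    Returns the local path of the downloaded file.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Use the last segment of the URL path as the local file name
    filename = os.path.basename(urlparse(url).path) or "download.csv"
    local_path = os.path.join(dest_dir, filename)
    # urlretrieve copies the resource to local_path and returns (path, headers)
    path, _headers = urlretrieve(url, local_path)
    return path
```

You would call it with a direct ("raw") link to a CSV file, for example download_csv("https://example.com/path/to/file.csv"), where the URL is only a placeholder. Unlike a curated helper library, this approach requires you to know the exact URL of every file you need.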

We'll use the opendatasets helper library to download the files.

!pip install opendatasets
import opendatasets as od

# Download the curated Stack Overflow 2020 survey dataset into a local folder
od.download('stackoverflow-developer-survey-2020')
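Before loading anything, it helps to check which files the download produced. The sketch below assumes that od.download saves the files into a folder named after the dataset ID ('stackoverflow-developer-survey-2020') in the current working directory. The helper list_data_files is hypothetical and uses only the standard library. It returns the sorted names of regular files with a given extension and skips subfolders.

```python
import os


def list_data_files(data_dir, extension=".csv"):
    """Hypothetical helper: sorted names of files in `data_dir` ending with `extension`."""
    return sorted(
        name
        for name in os.listdir(data_dir)
        # Keep only regular files whose extension matches (case-insensitive)
        if name.lower().endswith(extension.lower())
        and os.path.isfile(os.path.join(data_dir, name))
    )
```

Under that assumption, list_data_files('stackoverflow-developer-survey-2020') lists the CSV files that are available for analysis.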