How to Load Kaggle Datasets into Jupyter Notebooks

Kaggle is an online community platform for data scientists and machine learning enthusiasts. It allows users to:

  • find and publish datasets,
  • explore and build models in a web-based data-science environment,
  • work with other data scientists and machine learning engineers, and
  • enter competitions to solve data science challenges.

We can't use the requests library to download a dataset from Kaggle, because Kaggle doesn't provide a raw, publicly accessible download URL for its datasets. In this notebook, we will learn how to download a Kaggle dataset using the opendatasets library and a Kaggle API token.

Opendatasets

opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.

  1. Installing and Importing: You can install it with a simple pip command, and then import it.
!pip install opendatasets --upgrade --quiet
import opendatasets as od
  2. Downloading URL: The next step is to get the URL of the dataset you want to load into your Jupyter notebook and pass it to the opendatasets.download() function.

In this notebook, we will work with the US Accidents dataset: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents

A good practice is to store the URL in a separate variable instead of typing the URL every time.
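Storing the URL in a variable also makes it easy to sanity-check before calling od.download(). The sketch below defines a hypothetical helper, `parse_kaggle_dataset_url`, which is not part of opendatasets. It assumes dataset URLs follow the pattern https://www.kaggle.com/datasets/<owner>/<dataset-slug>, as the US Accidents URL does, and splits out the owner and the dataset name.

```python
from typing import Tuple
from urllib.parse import urlparse

# Store the dataset URL once and reuse the variable.
dataset_url = "https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents"


def parse_kaggle_dataset_url(url: str) -> Tuple[str, str]:
    """Return (owner, dataset_slug) for a Kaggle dataset URL.

    Hypothetical helper, not part of opendatasets. Assumes the form
    https://www.kaggle.com/datasets/<owner>/<dataset_slug>[/...].
    """
    parsed = urlparse(url)
    if parsed.netloc not in ("www.kaggle.com", "kaggle.com"):
        raise ValueError(f"not a kaggle.com URL: {url}")
    # Drop the empty segments produced by leading/trailing slashes.
    parts = [segment for segment in parsed.path.split("/") if segment]
    if len(parts) < 3 or parts[0] != "datasets":
        raise ValueError(f"not a Kaggle dataset URL: {url}")
    return parts[1], parts[2]


owner, slug = parse_kaggle_dataset_url(dataset_url)
print(owner, slug)  # sobhanmoosavi us-accidents
```

If a URL copied from the browser ends with a query string, `urlparse` keeps it out of `parsed.path`, so the helper still returns just the owner and slug.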

  3. Kaggle Credentials: Now, after running the download function, you will be asked to enter your Kaggle username and API key.

Kaggle Credentials

  1. After signing up on https://www.kaggle.com/, click your profile picture in the top-right corner and select "My Account" from the menu.

  2. Scroll down to the API section and click "Create new API Token", which downloads a kaggle.json file to your computer.

The file contains your Kaggle username and key in the following format: {"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
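Before using the file, it can help to check that it actually contains both fields. The sketch below uses only the standard library. `load_kaggle_credentials` is a hypothetical helper name, and the demo writes the placeholder values shown above to a temporary file instead of touching your real credentials.

```python
import json
import tempfile
from pathlib import Path


def load_kaggle_credentials(path="kaggle.json"):
    """Read a kaggle.json file and return (username, key).

    Hypothetical helper: raises ValueError if either field is missing or empty.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    username = data.get("username")
    key = data.get("key")
    if not username or not key:
        raise ValueError(f"{path} must contain non-empty 'username' and 'key' fields")
    return username, key


# Demo with placeholder values in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "kaggle.json"
    sample.write_text(
        json.dumps({"username": "YOUR_KAGGLE_USERNAME", "key": "YOUR_KAGGLE_KEY"}),
        encoding="utf-8",
    )
    username, key = load_kaggle_credentials(sample)
    # Print only the username; avoid echoing the API key in notebook output.
    print(username)  # YOUR_KAGGLE_USERNAME
```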

  3. Now you can enter these credentials when the download function prompts for them.
dataset_url='https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents'
od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: himanigulati
Your Kaggle Key: ··········
Downloading us-accidents.zip to ./us-accidents
100%|██████████| 269M/269M [00:01<00:00, 188MB/s]
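Once the download finishes, the next step toward loading the data is finding the files inside the download folder. The sketch below assumes the archive has been extracted into ./us-accidents (the folder named in the output above) and lists the CSV files there. `list_data_files` is a hypothetical helper, and the CSV file name is deliberately not hard-coded because it can differ between versions of the dataset.

```python
from pathlib import Path


def list_data_files(folder, pattern="*.csv"):
    """Return files under `folder` (searched recursively) that match `pattern`.

    Hypothetical helper. Returns an empty list if `folder` does not exist.
    """
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(pattern) if p.is_file())


# Assumes od.download() extracted the dataset into ./us-accidents.
csv_files = list_data_files("us-accidents")
if not csv_files:
    print("No CSV files found in ./us-accidents")
for path in csv_files:
    size_mb = path.stat().st_size / 1_000_000
    print(f"{path} ({size_mb:.1f} MB)")
```

With pandas installed, you could then load a file with `pd.read_csv(csv_files[0])`. For a dataset this large, passing `nrows=` to read only the first rows is a quick way to preview it.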

This was one way to add credentials, i.e., by manually copying and pasting the username and key from the downloaded kaggle.json file. The other way is even more straightforward.

Automatically Adding Kaggle Credentials

We can save the few seconds spent copying our Kaggle username and key from the file into the notebook by placing the kaggle.json file in the same directory as the Jupyter notebook. This way, the credentials are read automatically.
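Here is a minimal sketch of that step. It assumes kaggle.json was saved to a Downloads folder in your home directory, so adjust the path to wherever your browser put it. `place_kaggle_json` is a hypothetical helper that copies the file next to the notebook unless one is already there. The od.download() call is left as a comment because it needs network access and valid credentials.

```python
import shutil
from pathlib import Path


def place_kaggle_json(source, notebook_dir="."):
    """Copy `source` to <notebook_dir>/kaggle.json unless that file already exists.

    Hypothetical helper. Returns the path of the kaggle.json next to the notebook.
    """
    target = Path(notebook_dir) / "kaggle.json"
    if not target.exists():
        shutil.copy(source, target)
    return target


# Assumed download location; change it to match your system.
downloaded = Path.home() / "Downloads" / "kaggle.json"
if downloaded.exists():
    print("Credentials file at", place_kaggle_json(downloaded))

# With kaggle.json in the working directory, the download should not prompt:
# import opendatasets as od
# od.download("https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents")
```

Because kaggle.json holds your API key, keep it out of anything you share or commit, for example by listing it in .gitignore.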

Conclusion

The best way to make use of Kaggle is to participate in Kaggle competitions. With experience comes wisdom, and with Kaggle competitions come machine learning skills :)

Winning Kaggle competitions and earning a good Kaggle ranking can strengthen your resume for a career in data science.

Kaggle also offers other features, such as access to GPUs, the opportunity to work with people with similar interests across the world, and a huge collection of datasets.

All the best :)

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="kaggle-opendatasets")
[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/himani007/kaggle-opendatasets
 
Himani Gulati, 3 months ago