Kaggle is an online community platform for data scientists and machine learning enthusiasts. It allows users to:
We can't use the requests library to download a dataset from Kaggle directly, because Kaggle doesn't provide a raw URL for the dataset. In this notebook, we will learn how to download a Kaggle dataset using the opendatasets library with an API token.
opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.
Install the library using the pip command, and then import it.
!pip install opendatasets --upgrade --quiet
import opendatasets as od
Datasets can be downloaded using the opendatasets.download() function.
For now, we will be working with the US Accidents dataset: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents
A good practice is to store the URL in a separate variable instead of typing out the URL string every time.
When you run the download function, you will be asked to enter your Kaggle username and API key.
After signing up on https://www.kaggle.com/, click on your profile picture at the top right and select "My Account" from the menu.
Scroll down to the API section and click "Create new API Token", which will download a kaggle.json file to your system.
The file should contain your Kaggle username and key in the format below:
{"username":"YOUR_KAGGLE_USERNAME","key":"YOUR_KAGGLE_KEY"}
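Before typing the values into the prompt, it can help to confirm that the downloaded file really has both fields. The snippet below is a minimal sketch, not part of opendatasets: the helper name load_kaggle_credentials is hypothetical, and it assumes the file follows the two-field format shown above.

```python
import json
from pathlib import Path

# The two fields shown in the kaggle.json format above.
REQUIRED_KEYS = ("username", "key")


def load_kaggle_credentials(path="kaggle.json"):
    """Read a kaggle.json file and return (username, key).

    Hypothetical helper for illustration only.
    Raises ValueError if either field is missing or empty.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [name for name in REQUIRED_KEYS if not data.get(name)]
    if missing:
        raise ValueError("kaggle.json is missing: " + ", ".join(missing))
    return data["username"], data["key"]


# Example usage (uncomment once kaggle.json is in the working directory):
# username, key = load_kaggle_credentials("kaggle.json")
# print("Credentials found for", username)
```

Avoid printing the key itself in a notebook you plan to share, since anyone holding it can use your Kaggle account through the API.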
Now pass the dataset URL to the download function.
dataset_url = 'https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents'
od.download(dataset_url)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: himanigulati
Your Kaggle Key: ··········
Downloading us-accidents.zip to ./us-accidents
100%|██████████| 269M/269M [00:01<00:00, 188MB/s]
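Once the download finishes, a quick look at the target folder shows what arrived. The sketch below assumes, based only on the log line above (./us-accidents), that the folder is named after the last segment of the dataset URL. Both helper names are hypothetical, and the actual file names inside the folder depend on the dataset.

```python
from pathlib import Path
from urllib.parse import urlparse


def expected_download_dir(dataset_url, root="."):
    """Guess the local folder name from the last path segment of the URL.

    Assumption: this mirrors the './us-accidents' folder seen in the log.
    """
    slug = urlparse(dataset_url).path.rstrip("/").split("/")[-1]
    return Path(root) / slug


def list_downloaded_files(directory):
    """Return sorted (relative path, size in bytes) pairs for files under directory."""
    directory = Path(directory)
    return sorted(
        (p.relative_to(directory).as_posix(), p.stat().st_size)
        for p in directory.rglob("*")
        if p.is_file()
    )


# Example usage (uncomment after the download has completed):
# data_dir = expected_download_dir(
#     "https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents"
# )
# for name, size in list_downloaded_files(data_dir):
#     print(f"{name}: {size / 1e6:.1f} MB")
```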
This was one way to add credentials, i.e. by manually copying and pasting the username and key from the downloaded kaggle.json file. Another way to add these credentials is more straightforward.
We can save the extra seconds spent copying our Kaggle username and key into a Jupyter notebook by placing the kaggle.json file in the same directory as the notebook. This way, the credentials will be read automatically.
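To check whether the file is where the notebook expects it before starting a download, something like the following may be useful. It is a minimal sketch: find_credentials_file is a hypothetical helper, and it assumes the notebook's working directory is the directory the notebook is stored in.

```python
from pathlib import Path


def find_credentials_file(directory=None):
    """Return the path to kaggle.json in directory (default: cwd), or None.

    Hypothetical helper for illustration only.
    """
    base = Path(directory) if directory is not None else Path.cwd()
    candidate = base / "kaggle.json"
    return candidate if candidate.is_file() else None


# Example usage:
# if find_credentials_file() is None:
#     print("kaggle.json not found here; expect a prompt for credentials.")
```

If the helper returns None, the download should still work, but you will be prompted for the username and key as before.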
Opendatasets Source Code: https://github.com/JovianML/opendatasets
Kaggle: https://www.kaggle.com
Some good datasets available on Kaggle:
Getting started with Kaggle competitions: https://www.kaggle.com/code/alexisbcook/getting-started-with-kaggle-competitions
The best use you can make of Kaggle is participating in Kaggle competitions. With experience comes wisdom, and with Kaggle competitions come skills (for machine learning) :)
The competitions you win on Kaggle and your Kaggle ranking can strengthen your resume for a career in data science.
Kaggle also offers other features such as GPUs, the opportunity to work with people with similar interests across the world, tons and tons of datasets, etc.
All the best :)