Kaggle is an online community platform for data scientists and machine learning enthusiasts. It allows users to:
We can't use requests to download a dataset from Kaggle, because Kaggle doesn't provide a raw URL for the dataset. In this notebook, we will learn how to download a Kaggle dataset using the opendatasets library with an API token.
opendatasets is a Python library for downloading datasets from online sources like Kaggle and Google Drive using a simple Python command.
First, install the library using the pip command, and then import it.
!pip install opendatasets --upgrade --quiet
import opendatasets as od
For now, we will be working with the US Accidents dataset: https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents
It is good practice to store the URL in a separate variable instead of passing the full URL every time.
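A minimal sketch of this step: the URL is stored once in a variable and reused. The names dataset_url, dataset_name, data_dir and fetch_dataset are chosen here for illustration. The download call is wrapped in the hypothetical helper fetch_dataset so that nothing is downloaded until you call it. The folder name is inferred from the last URL segment, which matches the download output shown in this notebook but is an assumption rather than a documented guarantee.

```python
# Keep the dataset URL in one place so it can be reused.
dataset_url = "https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents"

# The last URL segment is the dataset's short name ("us-accidents").
# Assumption: the files land in a folder with this name under the
# current directory, as the download output in this notebook suggests.
dataset_name = dataset_url.rstrip("/").split("/")[-1]
data_dir = "./" + dataset_name


def fetch_dataset(url=dataset_url):
    """Hypothetical helper: download the dataset when called."""
    import opendatasets as od  # imported here so the sketch loads without it

    od.download(url)  # asks for Kaggle credentials if none are found
```

Calling fetch_dataset() in a notebook cell starts the download.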
When you call the download function, you will be asked to enter your Kaggle username and API key.
After signing up on https://www.kaggle.com/, click on your profile picture on the top right and select "My Account" from the menu.
Scroll down to the API section and click "Create New API Token", which downloads a kaggle.json file to your computer.
The file should contain your Kaggle username and key as a JSON object with "username" and "key" fields.
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: himanigulati
Your Kaggle Key: ··········
Downloading us-accidents.zip to ./us-accidents
100%|██████████| 269M/269M [00:01<00:00, 188MB/s]
This is one way to provide credentials: manually copying and pasting the username and key from the downloaded kaggle.json file. There is also a simpler way.
To avoid copying the username and key by hand, upload the kaggle.json file to the same directory as the Jupyter notebook. The credentials will then be read automatically.
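Before relying on this, it can help to check that the file is where you expect and has the expected fields. The sketch below uses only the standard library. read_kaggle_credentials is a hypothetical helper written for this notebook, not part of opendatasets, and it assumes the file holds a JSON object with "username" and "key" fields.

```python
import json
from pathlib import Path


def read_kaggle_credentials(path="kaggle.json"):
    """Hypothetical helper: return (username, key) from a kaggle.json file.

    Returns None if the file is missing and raises ValueError if a
    required field is absent or empty.
    """
    file = Path(path)
    if not file.is_file():
        return None
    with file.open() as f:
        creds = json.load(f)
    # Collect any required field that is absent or empty.
    missing = [name for name in ("username", "key") if not creds.get(name)]
    if missing:
        raise ValueError(f"{path} is missing: {', '.join(missing)}")
    return creds["username"], creds["key"]
```

Calling read_kaggle_credentials() from the notebook's directory returns None when the file has not been uploaded yet. Avoid printing the returned key in a notebook you plan to share.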
Opendatasets Source Code: https://github.com/JovianML/opendatasets
Some good datasets available on Kaggle:
Getting started with Kaggle competitions: https://www.kaggle.com/code/alexisbcook/getting-started-with-kaggle-competitions
The best way to make use of Kaggle is to participate in Kaggle competitions. With experience comes wisdom, and with Kaggle competitions come skills (for machine learning) :)
The competitions you win on Kaggle and your Kaggle ranking can strengthen your resume for a career in data science.
Kaggle also offers other features, such as GPU access, the opportunity to work with people around the world who share similar interests, and a very large collection of datasets.
All the best :)
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="kaggle-opendatasets")
[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/himani007/kaggle-opendatasets