Zerotopandas Course Project Starter
Project Title - Global Sustainable Energy Analysis
In this project, we perform an end-to-end analysis of the progress made in sustainable energy adoption across the globe over roughly 20 years. The data covers 2000 through 2020 for each of the 176 countries in the dataset. Some of the common parameters we will look at during the analysis are the country's name, access to electricity (in % terms), primary source of energy, renewable energy source contribution (%), gdp_per_capita, per_capita energy consumption, etc. The main goal of this analysis is to get an overview of current sustainable energy levels across each country/continent and to understand, through data and visualizations, how the overall sustainable energy trend has developed.
Since this is real-world data, any analysis we do (whether in the form of visuals or raw information) will give us real-world insight into the present situation. For this purpose, I have taken the primary dataset (the global sustainable energy level data) from Kaggle.com. In addition, to get continent-level information and other country demographics, I will use two secondary datasets that will be merged in later. The links to the datasets on Kaggle are:
- Global Sustainable Energy Dataset: https://www.kaggle.com/datasets/anshtanwar/global-data-on-sustainable-energy (Primary)
- Global Country Information Dataset: https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023 (Secondary)
- World Population Dataset: https://www.kaggle.com/datasets/muhammedtausif/world-population-by-countries (Secondary)
To perform the analysis, we will use the Python libraries NumPy, Pandas, Matplotlib and Seaborn. The first two handle tabular data manipulation and operations, while the latter two produce the visualizations that tell the story.
Finally, I would like to thank the course Data Analysis with Python: Zero to Pandas and its instructor Aakash N S for providing me with the necessary learning and a platform to build this project. Without him and his team, the project would not have been possible.
How to run the code
This is an executable Jupyter notebook hosted on Jovian.ml, a platform for sharing data science projects. You can run and experiment with the code in a couple of ways: using free online resources (recommended) or on your own computer.
Option 1: Running using free online resources (1-click, recommended)
The easiest way to start executing this notebook is to click the "Run" button at the top of this page, and select "Run on Binder". This will run the notebook on mybinder.org, a free online service for running Jupyter notebooks. You can also select "Run on Colab" or "Run on Kaggle".
Option 2: Running on your computer locally
- Install Conda by following these instructions. Add Conda binaries to your system PATH, so you can use the conda command on your terminal.
- Create a Conda environment and install the required libraries by running these commands on the terminal:
conda create -n zerotopandas -y python=3.8
conda activate zerotopandas
pip install jovian jupyter numpy pandas matplotlib seaborn opendatasets --upgrade
- Press the "Clone" button above to copy the command for downloading the notebook, and run it on the terminal. This will create a new directory and download the notebook. The command will look something like this:
jovian clone notebook-owner/notebook-id
- Enter the newly created directory using cd directory-name and start the Jupyter notebook:
jupyter notebook
You can now access Jupyter's web interface by clicking the link that shows up on the terminal or by visiting http://localhost:8888 on your browser. Click on the notebook file (it has a .ipynb extension) to open it.
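Once the notebook is open locally, it can be useful to confirm that the libraries from the pip install step are actually visible to the kernel. The following is a minimal sketch, not part of the original setup instructions; the helper name missing_packages is hypothetical, and jupyter itself is left out of the list on the assumption that the check is being run from inside a working Jupyter session.

```python
from importlib.util import find_spec

# Libraries taken from the pip install command above
# (jupyter is omitted: we assume this runs inside Jupyter already).
REQUIRED = ["jovian", "numpy", "pandas", "matplotlib", "seaborn", "opendatasets"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    # find_spec returns None when a top-level module is not installed;
    # it does not import the module, so the check is cheap.
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported as missing, the most likely cause is that the notebook kernel is not running inside the zerotopandas environment created earlier.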
Downloading the Dataset
To load the datasets into this notebook, we first need to download them to our system. For this purpose, we will use the opendatasets library, which downloads the data directly from within the notebook. As mentioned, all the datasets used in this project can be found on Kaggle.com.
Since we will be using three separate datasets (the global_sustainable_energy, global_country_information and world_population datasets), each of them will first be downloaded into this notebook and then merged with the others whenever needed. Of the three, the primary one (the global sustainable energy dataset) is the one we will look at most and draw the major insights from.
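As a rough sketch of the download step described above, the three Kaggle URLs listed in the introduction can be passed one at a time to opendatasets. The helper names expected_dir and download_all are hypothetical and introduced only for illustration. The sketch also assumes that opendatasets places each Kaggle dataset in a folder named after the last segment of its URL, and that Kaggle credentials (username and API key) are supplied when prompted.

```python
import os
from urllib.parse import urlparse

# Dataset URLs as listed in the introduction (primary first).
DATASET_URLS = [
    "https://www.kaggle.com/datasets/anshtanwar/global-data-on-sustainable-energy",
    "https://www.kaggle.com/datasets/nelgiriyewithana/countries-of-the-world-2023",
    "https://www.kaggle.com/datasets/muhammedtausif/world-population-by-countries",
]


def expected_dir(url, data_dir="."):
    """Guess the local folder for a dataset (assumption: named after the URL slug)."""
    slug = urlparse(url).path.rstrip("/").split("/")[-1]
    return os.path.join(data_dir, slug)


def download_all(urls=DATASET_URLS, data_dir="."):
    """Download every dataset; needs network access and Kaggle credentials."""
    # Imported here so the helpers above work even without opendatasets installed.
    import opendatasets as od

    for url in urls:
        od.download(url, data_dir=data_dir)
        print("Expected location:", expected_dir(url, data_dir))
```

Calling download_all() in a notebook cell would fetch all three datasets into the current directory; the CSV file names inside each folder are not stated here, so they should be checked with os.listdir before loading anything with Pandas.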