
Exploring India's Air Quality, 2015 to 2020

Exploratory Data Analysis of India's Gaseous Emissions and the Impact on Air Quality.


India is the seventh-largest country by area, the second-most populous country, and the most populous democracy in the world. By 2020, India had an estimated population of 1.38 billion people. It is no wonder that it has one of the largest economies in the world, currently in seventh position by GDP and expected to keep rising with continued growth. With trade agreements with over 50 countries, India's impact is very important both locally and globally. In this notebook, I look at data on the gaseous emissions and air quality index (AQI) of India's cities during this period of global economic success.

Humans need air to live. Beyond that simple fact of life is the question of what is in the air we breathe. As observed throughout history, economic growth is usually accompanied by an increase in gaseous emissions. This negatively impacts air quality, so in this notebook I will use data to find the trends in the gaseous emissions and AQI of cities in India. The dataset I am using is sourced from India's Central Pollution Control Board through Kaggle. It contains air quality data and AQI (Air Quality Index) values at hourly and daily levels for various stations across multiple cities in India. In this study, I will focus on the gaseous emissions ammonia (NH3), nitrogen oxides (NOx), ozone (O3), sulphur dioxide (SO2), nitrogen dioxide (NO2), nitrogen monoxide (NO) and carbon monoxide (CO), together with the Air Quality Index (AQI) and the Air Quality Index Bucket (AQI_Bucket).
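To make the relationship between AQI and AQI_Bucket concrete, here is a minimal sketch that maps a numeric AQI value to a category label. It uses the category ranges of India's National Air Quality Index (Good, Satisfactory, Moderate, Poor, Very Poor, Severe). That the dataset's AQI_Bucket column follows exactly these breakpoints is an assumption, and the names `GAS_COLUMNS` and `aqi_bucket` are made up for this illustration.

```python
from bisect import bisect_left

# Gas columns this study focuses on, as listed in the text above.
GAS_COLUMNS = ["NH3", "NOx", "O3", "SO2", "NO2", "NO", "CO"]

# Inclusive upper bounds of the first five National AQI categories.
# Anything above the last bound falls into the final category.
_UPPER_BOUNDS = [50, 100, 200, 300, 400]
_LABELS = ["Good", "Satisfactory", "Moderate", "Poor", "Very Poor", "Severe"]


def aqi_bucket(aqi: float) -> str:
    """Return the category label for a non-negative AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    # bisect_left finds the first upper bound that is >= aqi,
    # which is the index of the matching label.
    return _LABELS[bisect_left(_UPPER_BOUNDS, aqi)]


if __name__ == "__main__":
    for value in (42, 150, 401):
        print(value, aqi_bucket(value))
```

A lookup like this is handy for sanity-checking the AQI_Bucket column against the AQI column once the data is loaded.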

What is Exploratory Data Analysis

Exploratory Data Analysis (EDA) is the process of exploring, investigating and gathering insights from data using statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships and patterns.

EDA is both a science and an art. On the one hand, it requires knowledge of statistics, visualization techniques and data analysis tools like NumPy, Pandas, Seaborn and GeoPandas. On the other hand, it requires asking interesting questions to guide the investigation and interpreting numbers and figures to generate useful insights.

Here's the outline of the steps to follow:

  • Downloading a dataset from an online source
  • Data preparation and cleaning with Pandas
  • Open-ended exploratory analysis and visualization
  • Asking and answering interesting questions
  • Summarizing inferences and drawing conclusions
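The preparation and exploration steps above can be sketched roughly as follows with Pandas. This is only an illustration under stated assumptions: the small table stands in for the downloaded file, its values are invented placeholders rather than real measurements, and the column names `City` and `Date` and the helper names are assumptions, not taken from the dataset.

```python
import pandas as pd

# Placeholder rows standing in for the downloaded file.
# The numbers are invented for illustration, not real measurements.
raw = pd.DataFrame(
    {
        "City": ["Delhi", "Delhi", "Mumbai", "Mumbai"],
        "Date": ["2019-01-01", "2019-01-02", "2019-01-01", "2019-01-02"],
        "NO2": [40.0, None, 20.0, 22.0],
        "AQI": [300.0, 320.0, None, 100.0],
    }
)


def prepare(df: pd.DataFrame) -> pd.DataFrame:
    """Data preparation: parse dates and drop rows with no AQI value."""
    out = df.copy()
    out["Date"] = pd.to_datetime(out["Date"])
    return out.dropna(subset=["AQI"])


def mean_aqi_by_city(df: pd.DataFrame) -> pd.Series:
    """Exploration: average AQI per city, highest first."""
    return df.groupby("City")["AQI"].mean().sort_values(ascending=False)


clean = prepare(raw)
summary = mean_aqi_by_city(clean)
print(summary)
```

On the real data, the same pattern applies after reading the downloaded CSV into a DataFrame.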

By the end of the project, we'll have gained high-level insights into the trends in gaseous emissions and the AQI of cities across India.

How to run the code

The easiest way to start executing the code is to click the Run button at the top of this page and select "Run on Colab", provided you have a Google account. You can also select "Run on Binder" or "Run on Kaggle", but you'll need to create an account on Kaggle to use that platform. You can make changes and save your own version of the notebook to Jovian by executing the following cells.

Since the selected dataset contains 5+ million rows of data, I have selected "Google Colab" to execute the code for a faster response.

When you commit the notebook to Jovian for the first time in Colab, it will ask for an API key, which can be found in the get started section of your Jovian account. After that, pressing "Ctrl + S" on Windows or "Cmd + S" on Mac will save changes to your Jovian notebook.
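A save cell along these lines is a reasonable sketch, assuming the `jovian` package is installed. The project name is a hypothetical placeholder and `save_notebook` is a helper invented for this illustration. The import sits inside the function so that defining the helper does not fail where `jovian` is missing.

```python
def save_notebook(project_name: str):
    """Commit the current notebook to Jovian under the given project name."""
    # Imported lazily; requires `pip install jovian` beforehand.
    import jovian

    # On the first commit in a fresh environment, this prompts for the API key.
    return jovian.commit(project=project_name)


# Hypothetical project name, replace it with your own:
# save_notebook("india-air-quality-eda")
```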

Installing the required packages

In this project, we'll use data analysis tools like NumPy and Pandas, and visualization tools like matplotlib, seaborn, plotly and folium. Before that, we need to install packages like jovian, opendatasets and geopandas in order to have access to the necessary libraries.
Let's install the required libraries and import them.
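One way to see what still needs installing is to check which of the packages named above are importable, then build a pip command for the rest. This is a sketch rather than the notebook's original install cell. The helper names are made up for illustration, and the package list is taken from the paragraph above.

```python
import importlib.util

# Packages named in this section (the pip name and import name coincide here).
REQUIRED = [
    "numpy", "pandas", "matplotlib", "seaborn", "plotly",
    "folium", "jovian", "opendatasets", "geopandas",
]


def missing_packages(names):
    """Return the top-level packages from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def pip_command(names):
    """Build a pip command for the missing packages, or '' if none are missing."""
    missing = missing_packages(names)
    if not missing:
        return ""
    return "pip install --upgrade --quiet " + " ".join(missing)


if __name__ == "__main__":
    command = pip_command(REQUIRED)
    print(command or "All required packages are already installed.")
```

In a Colab or Jupyter cell, the printed command can be run by prefixing it with `!`.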

Muwanguzi Jonathan, 6 months ago