Data Analysis with Python: Zero to Pandas

Air Quality of India: Exploratory Data Analysis Project

In this project we will study the air quality of India, one of the most prominent issues our country is dealing with. For this analysis, we will download the dataset from Kaggle.

What does the dataset contain?

The dataset contains air quality data for India from 1 January 2015 to 1 July 2020.

Even though it does not contain the most recent data, it gives us a fair idea of the alarming pollution levels and of which cities and states are suffering the most. In addition, since March 2020 most of us have been working from home due to COVID-19, so the air quality data from that period will not reflect typical conditions.
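Because readings from March 2020 onward reflect lockdown conditions, one option is to analyse them separately from the earlier data. The following is a minimal sketch, assuming the downloaded file has a "Date" column; the helper split_at_cutoff, the 1 March 2020 cutoff, and the rows in the toy DataFrame are all hypothetical and exist only for illustration (they are not values from the dataset).

```python
import pandas as pd

def split_at_cutoff(df, cutoff="2020-03-01", date_col="Date"):
    """Hypothetical helper: split df into rows before and on/after the cutoff date."""
    cutoff = pd.Timestamp(cutoff)
    dates = pd.to_datetime(df[date_col])  # also accepts dates stored as strings
    return df[dates < cutoff], df[dates >= cutoff]

# Made-up rows for illustration only; city names and AQI values are placeholders.
df = pd.DataFrame({
    "City": ["CityA", "CityA", "CityB", "CityB"],
    "Date": ["2019-11-01", "2020-02-29", "2020-03-01", "2020-04-15"],
    "AQI": [300.0, 200.0, 120.0, 80.0],
})

pre_covid, covid_period = split_at_cutoff(df)
print(len(pre_covid), len(covid_period))  # 2 2
```

Keeping both halves, rather than dropping the lockdown rows, lets the two periods still be compared later if needed.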

Why do we need to study and explore it?

Outdoor air pollution alone causes 2.1 to 4.21 million deaths annually, making it one of the top contributors to human death.

We can all see the difference in air quality and how much pollution we breathe day in and day out. Over the last 5-6 years it has become a growing cause of concern, as pollution levels keep rising and call for individual attention and action.

Along with the government, it is also each individual's responsibility to help prevent pollution. So we need to understand it better and then decide how we can help.

What is Air Pollution?

Air pollution is the contamination of air due to the presence of substances in the atmosphere that are harmful to the health of humans and other living beings, or cause damage to the climate or to materials. There are many different types of air pollutants, such as gases (including ammonia, carbon monoxide, sulfur dioxide, nitrogen oxides, methane, carbon dioxide and chlorofluorocarbons), particulates (both organic and inorganic), and biological molecules.

Air pollution is a significant risk factor for a number of pollution-related diseases, including respiratory infections, heart disease, COPD, stroke and lung cancer.

Air pollution can cause diseases, allergies, and even death to humans; it can also cause harm to other living organisms such as animals and food crops, and may damage the natural environment (for example, climate change, ozone depletion or habitat degradation) or built environment (for example, acid rain). Both human activity and natural processes can generate air pollution.

How is the Air quality measured?

Air quality is measured with the Air Quality Index, or AQI. The AQI works like a thermometer that runs from 0 to 500 degrees. However, instead of showing changes in the temperature, the AQI is a way of showing changes in the amount of pollution in the air.

Air quality is a measure of how clean or polluted the air is. Monitoring air quality is important because polluted air can be bad for our health—and the health of the environment.

A tutorial of how AQI is calculated is available here: https://www.kaggle.com/rohanrao/calculating-aqi-air-quality-index

What is in the air?

The air in our atmosphere is mostly made up of two gases that are essential for life on Earth: nitrogen and oxygen. However, the air also contains smaller amounts of many other gases and particles. The AQI tracks the following major air pollutants:

Ground level ozone
Carbon monoxide
Sulfur dioxide
Nitrogen dioxide
Airborne particles, or aerosols

Ground level ozone and airborne particles are the two air pollutants that pose the greatest risk to human health. They are also two of the main ingredients in smog, a type of air pollution that reduces visibility.
The color-coded chart shows the relative level of health concern for each range of AQI values.

India

The National Air Quality Index (AQI) was launched in New Delhi on September 17, 2014, under the Swachh Bharat Abhiyan.

The Central Pollution Control Board (CPCB), along with the State Pollution Control Boards (SPCBs), has been operating the National Air Monitoring Program (NAMP), which covers 240 cities of the country with more than 342 monitoring stations. An Expert Group comprising medical professionals, air quality experts, academia, advocacy groups and SPCBs was constituted, and a technical study was awarded to IIT Kanpur. IIT Kanpur and the Expert Group recommended an AQI scheme in 2014. While the earlier measuring index was limited to three indicators, the new index measures eight parameters. The continuous monitoring systems that provide data on a near real-time basis are installed in New Delhi, Mumbai, Pune, Kolkata and Ahmedabad.

There are six AQI categories, namely Good, Satisfactory, Moderately polluted, Poor, Very Poor, and Severe. The AQI considers eight pollutants (PM10, PM2.5, NO2, SO2, CO, O3, NH3, and Pb) for which short-term (averaging periods of up to 24 hours) National Ambient Air Quality Standards are prescribed. Based on the measured ambient concentration of each pollutant, the corresponding standard and the likely health impact, a sub-index is calculated for each pollutant, and the worst (highest) sub-index determines the overall AQI. Likely health impacts for the different AQI categories and pollutants have also been suggested, with primary inputs from the medical experts in the group. The AQI values and corresponding ambient concentrations (health breakpoints), as well as the associated likely health impacts for the eight pollutants, are as follows:

[Figure: AQI Data]
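The "worst sub-index" rule and the six categories can be sketched in a few lines of Python. This is a minimal sketch, not the official CPCB implementation: the category boundaries (0-50 Good, 51-100 Satisfactory, 101-200 Moderately polluted, 201-300 Poor, 301-400 Very Poor, 401-500 Severe) are assumed from the CPCB scheme and should be checked against the official table, aqi_category and overall_aqi are hypothetical helper names, and the sub-index values in the example are made up.

```python
from typing import Dict, Tuple

def aqi_category(aqi: float) -> str:
    """Map an AQI value to one of the six categories (assumed band limits)."""
    # Assumed upper bound of each band; anything above 400 is Severe.
    bands = [
        (50, "Good"),
        (100, "Satisfactory"),
        (200, "Moderately polluted"),
        (300, "Poor"),
        (400, "Very Poor"),
    ]
    for upper, label in bands:
        if aqi <= upper:
            return label
    return "Severe"

def overall_aqi(sub_indices: Dict[str, float]) -> Tuple[float, str, str]:
    """Return (AQI, dominant pollutant, category): the worst sub-index wins."""
    if not sub_indices:
        raise ValueError("need at least one pollutant sub-index")
    pollutant = max(sub_indices, key=sub_indices.get)
    value = sub_indices[pollutant]
    return value, pollutant, aqi_category(value)

# Illustrative sub-index values, not taken from the dataset.
print(overall_aqi({"PM2.5": 180.0, "PM10": 150.0, "NO2": 60.0}))
# (180.0, 'PM2.5', 'Moderately polluted')
```

Returning the dominant pollutant alongside the value makes it easy to see which pollutant is driving a poor reading.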

The picture below explains how the AQI is calculated for a given location:
[Figure: AQI calculation for a given location]
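Each sub-index is a linear interpolation within the breakpoint band that contains the measured concentration C: I = I_lo + (I_hi - I_lo) * (C - BP_lo) / (BP_hi - BP_lo), where BP_lo and BP_hi are the concentration limits of that band and I_lo and I_hi the matching index values. Below is a minimal Python sketch for PM2.5. The breakpoints (24-hour average, µg/m³) are assumed from the CPCB table and treated as continuous bands, the 380 µg/m³ upper limit for the open-ended Severe band is an assumed placeholder, and sub_index is a hypothetical helper name.

```python
from typing import List, Tuple

# Assumed PM2.5 bands: (conc_low, conc_high, index_low, index_high).
PM25_BREAKPOINTS: List[Tuple[float, float, float, float]] = [
    (0, 30, 0, 50),        # Good
    (30, 60, 50, 100),     # Satisfactory
    (60, 90, 100, 200),    # Moderately polluted
    (90, 120, 200, 300),   # Poor
    (120, 250, 300, 400),  # Very Poor
    (250, 380, 400, 500),  # Severe (upper limit 380 is an assumption)
]

def sub_index(conc: float, breakpoints=PM25_BREAKPOINTS) -> float:
    """Linearly interpolate a pollutant concentration onto the 0-500 index."""
    if conc < 0:
        raise ValueError("concentration cannot be negative")
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if conc <= c_hi:
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)
    # Beyond the last band: clamp to the top of the scale.
    return float(breakpoints[-1][3])

print(sub_index(45))             # 75.0
print(round(sub_index(100), 1))  # 233.3
```

The same function works for any other pollutant by passing that pollutant's own breakpoint table.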

# We start with the import of all the python libraries we need:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import matplotlib
import seaborn as sns
import opendatasets as od
import plotly.express as px

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (20,10)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
pd.set_option("display.max_columns", 120)
pd.set_option("display.max_rows", 120)

Downloading the Dataset

With the libraries above imported, let's also import jovian and then download the dataset from Kaggle.

import jovian