
Eda F2

Exploratory Data Analysis of Airline Reviews

Motivation Behind Choosing Airline Reviews Dataset

According to the latest estimates, there are approximately 100,000 flights per day. Passenger flights alone account for over 90,000 of them, transporting millions of passengers to destinations all around the world. According to the FAA's 2018 data, the FAA's Air Traffic Organization (ATO) serves more than 45,000 flights and 2.9 million airline passengers every day across more than 29 million square miles of airspace.

Given that so many people fly every day, it is important to understand the quality of airlines. Every company needs to improve its product and customer experience; similarly, consumers benefit from knowing the quality of a product or an industry.

The airline reviews dataset contains reviews of airlines collected from airlinequality.com, a website where users can review airlines. The dataset contains more than 100,000 rows and 22 columns. Given how many people fly, it is no surprise that there are so many reviews. In this notebook, we will explore the data, handle missing values and gather insights.
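
As a minimal sketch of what handling missing values can look like in pandas, the snippet below counts missing entries per column and drops rows that lack a rating. The tiny DataFrame and its column names (`airline`, `overall_rating`, `review_text`) are hypothetical stand-ins, not the actual columns of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the real dataset's columns may differ.
reviews = pd.DataFrame({
    "airline": ["A", "B", "A", "C"],
    "overall_rating": [7.0, np.nan, 3.0, np.nan],
    "review_text": ["Good", "Late", None, "Fine"],
})

# Number of missing values in each column.
missing_per_column = reviews.isna().sum()

# Keep only the rows that have a rating.
rated = reviews.dropna(subset=["overall_rating"])

print(missing_per_column)
print(len(rated), "rows have a rating")
```

Whether to drop or fill missing values depends on the column and on the question being asked, so this is only one possible approach.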

What is Exploratory Data Analysis?

Exploratory Data Analysis (EDA) is the process of exploring, investigating and gathering insights from data using statistical measures and visualizations. The objective of EDA is to develop an understanding of the data by uncovering trends, relationships and patterns.
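
To make "statistical measures" concrete, here is a small sketch that computes summary statistics and a per-group average with pandas. The data and the column names (`airline`, `rating`) are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data for illustration only.
df = pd.DataFrame({
    "airline": ["A", "A", "B", "B"],
    "rating": [8, 6, 3, 5],
})

# Count, mean, standard deviation, min, quartiles and max of the ratings.
summary = df["rating"].describe()

# Average rating for each airline.
mean_by_airline = df.groupby("airline")["rating"].mean()

print(summary)
print(mean_by_airline)
```

Summaries like these are usually a starting point; plots then help show the patterns that the numbers only hint at.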

Here's the outline of the steps we'll follow:

  1. Installing and importing the necessary libraries
  2. Downloading a dataset from an online source
  3. Data preparation and cleaning with Pandas
  4. Open-ended exploratory analysis and visualization
  5. Asking and answering interesting questions
  6. Summarizing inferences and drawing conclusions

Installing the required packages

In this project, we'll use data analysis tools like NumPy and Pandas, and visualization tools like matplotlib, seaborn, plotly and folium.

Let's install the required libraries and import them.

%pip install pandas --quiet
%pip install seaborn --quiet
%pip install plotly --quiet
%pip install folium --quiet
%pip install opendatasets --quiet
%pip install matplotlib --quiet
# calendar is part of the Python standard library, so it needs no pip install
Note: you may need to restart the kernel to use updated packages.
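
The `calendar` module ships with Python's standard library, so pip has nothing to install for it. As a small sketch, one way to check whether a module can be imported before trying to install it is `importlib.util.find_spec`. The helper name `is_importable` and the deliberately fake module name are illustrative, not part of the notebook.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if Python can locate a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


# calendar is a standard-library module, so it is always available.
calendar_available = is_importable("calendar")

# A made-up name that should not exist in any environment.
missing_example = is_importable("surely_not_a_real_module_xyz")

print(calendar_available, missing_example)
```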

Importing the installed libraries

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import numpy as np # linear algebra 
import matplotlib.pyplot as plt # plotting
import opendatasets as od # downloading datasets
import seaborn as sns # plotting
import calendar # month and weekday names (standard library)
import re # regular expressions
import plotly.express as px # plotting
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot # offline plotting in notebooks
init_notebook_mode(connected=True)

%matplotlib inline
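
As a rough illustration of what the two standard-library imports offer (how the notebook uses them later may differ), the sketch below maps a month number to its name with `calendar` and pulls a number out of free text with `re`. The sample string is hypothetical.

```python
import calendar
import re

# Map a month number to its name (English under the default locale).
month_name = calendar.month_name[3]

# Extract the first integer from a hypothetical free-text field.
match = re.search(r"\d+", "Rated 8 out of 10")
first_number = int(match.group()) if match else None

print(month_name, first_number)
```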
joyeshm999
Joyesh Meshram, 5 months ago