Exploratory data analysis (EDA) is used by data analysts and scientists to investigate datasets and summarize their main features, often employing data visualization methods. It makes a dataset much easier to understand and helps in preparing the data for further use.
EDA is used to see what the data can reveal beyond formal modelling or hypothesis testing, and it provides a better understanding of a dataset's variables and the relationships between them. Originally developed by the American mathematician John Tukey in the 1970s, EDA techniques continue to be widely used in the data discovery process today.
EDA can help us deliver great business results by improving our existing knowledge, and it can also surface new insights that we might not be aware of.
Tools Used
opendatasets (a Jovian library to download Kaggle datasets)
Data cleaning:
1. Pandas
2. NumPy
Data visualization:
1. Matplotlib
2. Seaborn
3. Plotly
4. Folium (choropleth maps)
In this project, we analyse global cargo data. The selected dataset covers import and export volumes for 5,000 commodities across most countries on Earth over the last 30 years. Personally, I find commodities quite interesting because they reveal not only a country's income but also countries' international behaviour and relations.
We will download our dataset from Kaggle using the opendatasets library created by Jovian, which fetches datasets directly from the Kaggle website.
import opendatasets as od

dataset = 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'
od.download(dataset)
The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Colab. You can also select "Run on Binder" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms. Colab also provides the most memory, which this project needs to run.
To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.
Jupyter Notebooks: This is a Jupyter notebook - a document made of cells. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files, etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess around with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top.
!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
Step 1: We will download the dataset from https://www.kaggle.com/ using the opendatasets library created by Jovian.
Let's begin by downloading the data and listing the files within the dataset: https://www.kaggle.com/unitednations/global-commodity-trade-statistics
! pip install jovian opendatasets --upgrade --quiet
dataset = 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'
import opendatasets as od
od.download(dataset)
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds
Your Kaggle username: adityanhebbar
Your Kaggle Key: ··········
Downloading global-commodity-trade-statistics.zip to ./global-commodity-trade-statistics
100%|██████████| 121M/121M [00:01<00:00, 124MB/s]
The dataset has been downloaded and extracted.
data_dir = '/content/global-commodity-trade-statistics'
import os
os.listdir(data_dir)
['commodity_trade_statistics_data.csv']
Data cleaning is the process by which we make sure that the data we are using for our analysis is completely ready: no duplicates or missing values, the data in the right format and not corrupted, and thus ready to be used for analysis.
import pandas as pd
import numpy as np
trade_data_csv = '/content/global-commodity-trade-statistics/commodity_trade_statistics_data.csv'
trade_df_cur = pd.read_csv(trade_data_csv, low_memory = False, nrows = 1000000)
trade_df_cur.shape # shape gives the number of rows and columns in the dataset
(1000000, 10)
As we can see, we have 1,000,000 rows and 10 columns to work with. Of course, we cannot work with all of this for today's analysis, so let us start cleaning this data, i.e. selecting the required data and putting it into a form suited to our analysis.
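As an aside, nrows is not the only lever pandas offers for trimming a large CSV at read time; usecols (and, for streaming, chunksize) can also help. A minimal sketch on an inline stand-in CSV with hypothetical rows:

```python
import io
import pandas as pd

# Inline stand-in for commodity_trade_statistics_data.csv (hypothetical rows)
csv = io.StringIO(
    "country_or_area,year,flow,trade_usd\n"
    "India,2016,Export,100\n"
    "India,2015,Import,90\n"
)

# usecols loads only the columns the analysis needs, cutting memory up front
df = pd.read_csv(csv, usecols=["year", "trade_usd"])
print(df.columns.tolist())  # ['year', 'trade_usd']
```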
trade_df_cur.sample(2) # sample(2) shows two randomly chosen rows
trade_df_cur.info() # info() gives an overview of the dataset
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000000 entries, 0 to 999999
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country_or_area 1000000 non-null object
1 year 1000000 non-null int64
2 comm_code 1000000 non-null int64
3 commodity 1000000 non-null object
4 flow 1000000 non-null object
5 trade_usd 1000000 non-null int64
6 weight_kg 992195 non-null float64
7 quantity_name 1000000 non-null object
8 quantity 977353 non-null float64
9 category 1000000 non-null object
dtypes: float64(2), int64(3), object(5)
memory usage: 76.3+ MB
trade_df_cur.describe() # describe() gives statistical info about the numerical columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(trade_df_cur.dtypes) # a look at the datatypes
country_or_area object
year int64
comm_code int64
commodity object
flow object
trade_usd int64
weight_kg float64
quantity_name object
quantity float64
category object
dtype: object
As we can see, 64-bit float and integer datatypes are used by default, even though the values involved are not that large and can mostly be stored in 32-bit datatypes as well.
We can convert these 64-bit datatypes to 32-bit to increase speed and decrease the space the dataset occupies. (Note that int32 tops out at roughly 2.1 billion, so this only works because the trade_usd values in the rows we load stay within that range.)
change_dtypes = {
'year' : 'int32',
'comm_code': 'int32',
'trade_usd' : 'int32',
'weight_kg':'float32',
'quantity':'float32'
}
%%time
trade_df_modified = pd.read_csv(trade_data_csv, low_memory=False, nrows= 1500000, dtype=change_dtypes)
CPU times: user 2.23 s, sys: 357 ms, total: 2.59 s
Wall time: 2.58 s
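A quick way to sanity-check the downcast, sketched on a toy frame with hypothetical values: np.iinfo gives the int32 ceiling, and memory_usage confirms the savings.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the trade data (hypothetical values)
df = pd.DataFrame({"year": [1996, 2016], "trade_usd": [1_431_426_468, 1_250_000]})

# int32 tops out at ~2.147 billion, so check the ceiling before downcasting
assert df["trade_usd"].abs().max() <= np.iinfo("int32").max

before = df.memory_usage(deep=True).sum()
df = df.astype({"year": "int32", "trade_usd": "int32"})
after = df.memory_usage(deep=True).sum()
print(before > after)  # True: the integer columns now take half the space
```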
Now we have finally read 1,500,000 rows from 'trade_data_csv' with our selected datatypes.
trade_df_modified.head() #shows us the top 5 rows of our final dataframe
trade_df_modified.shape
(1500000, 10)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(trade_df_modified.dtypes) # a look at the datatypes
country_or_area object
year int32
comm_code int32
commodity object
flow object
trade_usd int32
weight_kg float32
quantity_name object
quantity float32
category object
dtype: object
Let us proceed to check for duplicate and missing values:
trade_df_modified.duplicated().sum() # duplicated() checks for duplicate rows
0
As we can see, there are no duplicates in our dataset. Let's see whether there are any null values to work around.
trade_df_modified.isnull().sum() #isnull() along with sum() tells us the count for missing values
country_or_area 0
year 0
comm_code 0
commodity 0
flow 0
trade_usd 0
weight_kg 9765
quantity_name 0
quantity 24612
category 0
dtype: int64
As we can see above, only the weight and quantity columns have null values, and they are few compared to the size of the dataset, so we could drop the rows that contain them. Options for dealing with missing values in numerical columns:
1. Leave them as is, if they won't affect our analysis.
2. Replace them with the average.
3. Remove the rows containing missing values.
4. Replace them with fixed values.
5. Use the values from other rows & columns to estimate the missing values.
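Each of the options above maps to a pandas one-liner; a minimal sketch on a toy series with hypothetical values:

```python
import pandas as pd

# Toy series with gaps (hypothetical values)
s = pd.Series([10.0, None, 30.0, None, 50.0])

dropped = s.dropna()            # option 3: remove rows with missing values
averaged = s.fillna(s.mean())   # option 2: replace with the column average
fixed = s.fillna(0)             # option 4: replace with a fixed value
estimated = s.interpolate()     # option 5: estimate from neighbouring rows
print(estimated.tolist())  # [10.0, 20.0, 30.0, 40.0, 50.0]
```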
jovian.commit()
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
trade_df_modified['weight_kg'].isna().sum()
9765
As we can see, there are 9765 missing values in the weight_kg column. Let us clean this.
trade_df_modified['weight_kg'].value_counts()
0.0 22656
1.0 5385
10.0 4222
2.0 3841
20.0 3539
...
1120438.0 1
10624018.0 1
1709766.0 1
7565427.0 1
12516685.0 1
Name: weight_kg, Length: 675510, dtype: int64
a = trade_df_modified['weight_kg'].describe()
a.round(1)
count 1.490235e+06
mean 2.077931e+07
std 6.447121e+08
min 0.000000e+00
25% 2.750000e+03
50% 5.763000e+04
75% 9.795410e+05
max 6.144366e+11
Name: weight_kg, dtype: float64
Now we can see that the mean and median of the column are about 2.08e7 and 5.76e4 respectively. Since only a small fraction of rows is missing, let us fill the NaN values with a value from a small fixed range and confirm that the statistics barely move.
import random
trade_df_modified['weight_kg'].fillna(random.uniform(1.5,6), inplace=True)
# fillna() fills the NaN values of a column
# random.uniform(1.5, 6) is evaluated once here, so every NaN receives the same constant
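Worth noting: because random.uniform(1.5, 6) is evaluated once, fillna writes a single constant into every gap. If an independent draw per missing row were wanted instead, one alternative (a sketch on a toy series, not the notebook's approach) uses a NumPy array sized to the mask:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])  # toy stand-in for weight_kg

# One draw reused for every NaN (what fillna(random.uniform(...)) does)
s_const = s.fillna(np.random.uniform(1.5, 6))

# An independent draw per missing row
mask = s.isna()
s_varied = s.copy()
s_varied[mask] = np.random.uniform(1.5, 6, size=mask.sum())
print(s_varied.isna().sum())  # 0
```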
a = trade_df_modified['weight_kg'].describe()
a.round(1)
count 1.500000e+06
mean 2.064404e+07
std 6.426099e+08
min 0.000000e+00
25% 2.538000e+03
50% 5.562300e+04
75% 9.599208e+05
max 6.144366e+11
Name: weight_kg, dtype: float64
As we can see, the modification didn't largely change the mean and median, so we can be confident about our cleaning.
trade_df_modified['weight_kg'].isna().sum()
0
jovian.commit()
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
Now let us clean the last remaining column with null values, the quantity column, by the same method.
trade_df_modified['quantity'].isna().sum()
24612
trade_df_modified['quantity'].value_counts()
0.0 16988
1.0 6040
10.0 4304
2.0 4300
3.0 3627
...
10108742.0 1
494928.0 1
26591968.0 1
9127636.0 1
12516685.0 1
Name: quantity, Length: 672935, dtype: int64
a = trade_df_modified['quantity'].describe()
a.round(1)
count 1.475388e+06
mean 3.589867e+07
std 6.316239e+09
min 0.000000e+00
25% 2.719000e+03
50% 5.840950e+04
75% 1.005000e+06
max 5.372075e+12
Name: quantity, dtype: float64
Here too, since the missing rows are few relative to the dataset, we can fill them with a value from a small fixed range without distorting the statistics.
import random
trade_df_modified['quantity'].fillna(random.uniform(1.5,6), inplace=True)
# fillna() fills the NaN values of a column
# random.uniform(1.5, 6) is evaluated once here, so every NaN receives the same constant
a = trade_df_modified['quantity'].describe()
a.round(1)
count 1.500000e+06
mean 3.530964e+07
std 6.264210e+09
min 0.000000e+00
25% 2.225000e+03
50% 5.336400e+04
75% 9.547470e+05
max 5.372075e+12
Name: quantity, dtype: float64
As we can see, the modification didn't largely change the mean and median values, so the fill is safe.
trade_df_modified['quantity'].isna().sum()
0
trade_df_modified.isnull().sum() #isnull() along with sum() tells us the count for missing values
country_or_area 0
year 0
comm_code 0
commodity 0
flow 0
trade_usd 0
weight_kg 0
quantity_name 0
quantity 0
category 0
dtype: int64
As you can see, there are no null values left in the dataframe.
jovian.commit()
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
country_or_area: country name of the record
year: year in which the trade took place
comm_code: code from the Harmonized Commodity Description and Coding System (HS)
commodity: description of a particular commodity code
flow: flow of trade, i.e. export, import, others
trade_usd: value of the trade in USD
weight_kg: weight of the commodity in kilograms
quantity_name: description of the quantity measurement type given the type of item (i.e. number of items, weight, etc.)
quantity: count of the quantity of a given item based on the quantity name
category: category to identify the commodity
Here, let us try to understand the dataset better with visualization, which will also help us answer some interesting questions and deliver meaningful insights into the data.
final_df = trade_df_modified
final_df.shape
(1500000, 10)
final_df.sample(3)
jovian.commit() #save latest version of the notebook
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
Let's begin by importing seaborn, matplotlib and plotly. These are the three visualization libraries that we will be using to visualize our data.
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 15
matplotlib.rcParams['figure.figsize'] = (8, 4)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
from mpl_toolkits.mplot3d import Axes3D # lightweight solution for 3D visualization
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
final_df.corr()
f, ax = plt.subplots(figsize = (13,13))
p = sns.heatmap(final_df.corr(), annot = True, linewidths = 5, fmt = '.1f', ax = ax);
p.set_title("Correlation between numerical columns");
As we can see in the plot, there is no notable correlation between the numerical columns in our dataset.
final_df.year.plot(kind = 'hist', bins = 100, figsize =(10,10))
plt.xlabel("YEAR")
plt.show()
It looks like the dataset has more data from 2000 to 2015; probably the internet revolution made relevant data easier to collect. The drop at the very end may be a data-entry artefact rather than a real fall in cargo transportation.
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
Let's create a choropleth map to better visualize the relative responses from various countries.
import folium
countries_geojson = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
country_counts = final_df.country_or_area.value_counts()
country_counts_df = pd.DataFrame({ 'Country': country_counts.index, 'Count': country_counts.values})
country_counts_df
country_counts_df.at[0, 'Country'] = 'United States of America'
country_counts_df.at[12, 'Country'] = 'Russia'
country_counts_df.isnull().sum()
Country 0
Count 0
dtype: int64
m = folium.Map(location=[30, 0], zoom_start=2)
folium.Choropleth(
geo_data=countries_geojson,
data=country_counts_df,
columns=["Country", "Count"],
key_on="feature.properties.name",
threshold_scale=[0, 5000, 10_000,15_000, 20_000,25_000, 30_000,35_000, 40_000],
fill_color="OrRd",
fill_opacity=0.7,
line_opacity=0.2,
legend_name="Respondents",
).add_to(m)
m
With the above map we can clearly see how well different countries responded while the global cargo data was being collected. It also suggests that the USA is handling its data collection strategy very well compared to other countries.
1. What is the state of India's imports and exports according to this data?
cn_df = final_df[final_df.country_or_area == 'India']
cn_df.head()
cn_i = cn_df[(cn_df.flow == 'Import') & (cn_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
cn_e = cn_df[(cn_df.flow == 'Export') & (cn_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
trace1 = go.Bar(
x = cn_i.year,
y = cn_i.trade_usd,
name = "India Import",
marker = dict(color = 'rgba(102, 216, 137, 0.8)'),
)
trace2 = go.Bar(
x = cn_e.year,
y = cn_e.trade_usd,
name = "India Export",
marker = dict(color = 'rgba(224, 148, 215, 0.8)'),
)
data = [trace1, trace2]
layout = {
'xaxis': {'title': 'Year 1992-2016'},
'yaxis': {'title': 'Trade of Import & Export in India (USD)'},
'barmode': 'group',
'title': 'Import and Export in India'
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)
So we can say that over the last 30 years India's imports and exports have increased significantly. Although trade dipped in 2009, possibly due to the global market crash, it recovered quickly. Another important thing to notice is that India's export value keeps increasing year by year.
2. What is the value of India's trade compared with total world trade over these years, according to the data available now?
cn_trade = cn_df[cn_df.comm_code!= 'TOTAL'].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
wd_trade = final_df[(final_df.year >1991) & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
# cn_trade.shape
trace0 = {
'x': cn_trade.year,
'y': cn_trade.trade_usd,
'name': "India",
'type': 'bar',
'marker': {'color':'rgba(255, 239, 208)'}
}
trace1 = {
'x': wd_trade.year,
'y': wd_trade.trade_usd,
'name': "World",
'type': 'bar',
'marker': {'color':'rgba(255, 171, 202, 0.8)'}
}
data = [trace0, trace1]
layout = {
'xaxis': {'title': 'Year 1992-2016'},
'yaxis': {'title': 'Value of Trade in USD'},
'barmode': 'relative',
'title': 'World vs India: Value of Trade'
};
fig = go.Figure(data = data, layout = layout)
iplot(fig)
Looking at the above graph, we can say that even though India's presence in global cargo trade is increasing, it is still far smaller than the world total.
3. What is the percentage of India's trade in world trade?
# ratio
trace3 = go.Scatter(
x = cn_trade.year,
y = cn_trade.trade_usd/wd_trade.trade_usd*100,
mode = "lines+markers",
name = "Ratio of India/World",
marker = dict(color = 'rgba(245, 150, 104, 0.8)')
)
data2 = [trace3]
layout2 = dict(title = 'Percentage of India Trade in World Trade (%)',
xaxis= dict(title= 'Year 1992-2016',ticklen= 5,zeroline= False),
yaxis = {'title': 'Percentage (%)'}
)
fig2 = dict(data = data2, layout = layout2)
iplot(fig2)
It appears that although India's market presence was increasing from around 1990 to 1995, it dropped suddenly; the reason could be Indo-Pak tensions. India then recovered, faced another setback in the 2009 recession, but soon recovered again.
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis
4. Which are the top 15 countries in trading when we consider volume (number of trade records)?
top_countries = final_df.country_or_area.value_counts().head(15)
top_countries
Canada 39623
Australia 33813
China, Hong Kong SAR 33126
Denmark 28831
China 26934
France 25943
Brazil 24230
Czech Rep. 24140
Austria 23169
Finland 22084
Chile 21975
Croatia 21541
Cyprus 21186
Germany 20743
Argentina 20116
Name: country_or_area, dtype: int64
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title("Top 15 active countries in Cargo Trading")
sns.barplot(x=top_countries.index, y=top_countries);
As you can see, these are the top 15 countries in trading when we take the number of records as the criterion.
5. What was India's position in trading in 2000?
USA_trade = final_df[(final_df.country_or_area == "USA") & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
JAPAN_trade =final_df[(final_df.country_or_area == "Japan") & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
CHINA_trade = final_df[(final_df.country_or_area == "China") & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
INDIA_trade =final_df[(final_df.country_or_area == "India") & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
EUR_trade = final_df[(final_df.country_or_area == "EU-28") & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
EUR_2000 = int(EUR_trade[EUR_trade.year==2000].iloc[0][1])
USA_2000 = int(USA_trade[USA_trade.year==2000].iloc[0][1])
JAP_2000 = int(JAPAN_trade[JAPAN_trade.year==2000].iloc[0][1])
CHINA_2000 = int(CHINA_trade[CHINA_trade.year==2000].iloc[0][1])
INDIA_2000 = int(INDIA_trade[INDIA_trade.year==2000].iloc[0][1])
ot_2000 = int(wd_trade[wd_trade.year==2000].iloc[0][1]) - EUR_2000 - USA_2000 - JAP_2000 - CHINA_2000 - INDIA_2000
EUR_2015 = int(EUR_trade[EUR_trade.year==2015].iloc[0][1])
USA_2015 = int(USA_trade[USA_trade.year==2015].iloc[0][1])
JAP_2015 = int(JAPAN_trade[JAPAN_trade.year==2015].iloc[0][1])
CHINA_2015 = int(CHINA_trade[CHINA_trade.year==2015].iloc[0][1])
INDIA_2015 = int(INDIA_trade[INDIA_trade.year==2015].iloc[0][1])
ot_2015 = int(wd_trade[wd_trade.year==2015].iloc[0][1]) - EUR_2015 - USA_2015 - JAP_2015 - CHINA_2015 - INDIA_2015
labels = ['Europe','USA','Japan','China','India','Others']
colors = ['#f18285', '#86e48f', '#e8a2d8', '#fff76e','#47B39C','#FFC154']
#####
trace = go.Pie(labels=labels, values=[EUR_2000, USA_2000, JAP_2000, CHINA_2000, INDIA_2000, ot_2000],
marker=dict(colors=colors, line=dict(color='#000', width=2)) )
layout = go.Layout(
title='2000 Import & Export Trade in USD',
)
fig = go.Figure(data=[trace], layout=layout)
iplot(fig, filename='basic_pie_chart')
As you can see, India's position in global trading in 2000 was only about 1.01% of the global market; Europe, China and Japan held the biggest positions.
6. What was India's position in trading in 2015?
trace1 = go.Pie(labels=labels, values=[EUR_2015, USA_2015, JAP_2015, CHINA_2015, INDIA_2015, ot_2015],
marker=dict(colors=colors, line=dict(color='#000', width=2)) )
layout1 = go.Layout(
title='2015 Import & Export Trade in USD',
)
fig1 = go.Figure(data=[trace1], layout=layout1)
iplot(fig1, filename='basic_pie_chart1')
As we can see, India's position in global cargo trade was 1.56% as of 2015. But some major changes seem to have happened; let us compare the last two plots to draw some interesting insights.
India's trading volume was only 1.01% of total world trade in 2000 and increased to 1.56% by 2015. One eye-catching detail is that China's trading volume more than doubled from 3.65% in 2000 to 7.9% in 2015, whereas Japan's share fell from 4.36% to 2.79%.
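The percentages quoted above are simply each region's total divided by the world total; a minimal sketch with hypothetical round numbers (in billions of USD) mirroring the pie chart's labels:

```python
# Hypothetical 2000 totals in billions USD, mirroring the pie chart's labels
values_2000 = {"Europe": 3_500, "USA": 1_800, "Japan": 436,
               "China": 365, "India": 101, "Others": 3_798}
total = sum(values_2000.values())
shares = {k: round(v / total * 100, 2) for k, v in values_2000.items()}
print(shares["India"])  # 1.01
```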
7. What are the top 10 commodities in Indian import trade (USD), 2000 vs 2016?
temp = cn_df[(cn_df.year==2000) & (cn_df.flow=='Import')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trade_2000import = temp.sort_values(by="trade_usd", ascending=True)
trace1 = go.Bar(
x = trade_2000import.trade_usd,
y = trade_2000import.commodity,
marker = dict(color = 'rgba(152, 213, 245, 0.8)'),
orientation = 'h'
)
data = [trace1]
layout = {
'yaxis': {'automargin':True,},
'title': "Top 10 Commodities in India Import Trade (USD), 2000"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)
temp1 = cn_df[(cn_df.year==2016) & (cn_df.flow=='Import')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trade_2016import = temp1.sort_values(by="trade_usd", ascending=True)
trace1 = go.Bar(
x = trade_2016import.trade_usd,
y = trade_2016import.commodity.tolist(),
marker = dict(color = 'rgba(249, 205, 190, 0.8)'),
orientation = 'h'
)
data = [trace1]
layout = {
# 'xaxis': {'title': 'Trade in USD'},
'yaxis': {'automargin':True,},
'title': "Top 10 Commodities in India Import Trade (USD), 2016"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)
As the graphs show, from 2000 to 2016 India's top import commodities were concentrated in oils such as palm oil, crude oil and sunflower oil.
8. What are the top 10 commodities in Indian export trade (USD), 2000 vs 2016?
temp = cn_df[(cn_df.year==2000) & (cn_df.flow=='Export')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trade_2000Export = temp.sort_values(by="trade_usd", ascending=True)
trace1 = go.Bar(
x = trade_2000Export.trade_usd,
y = trade_2000Export.commodity,
marker = dict(color = 'rgba(21, 31, 39, 0.8)'),
orientation = 'h'
)
data = [trace1]
layout = {
# 'xaxis': {'title': 'Trade in USD'},
'yaxis': {'automargin':True,},
'title': "Top 10 Commodities in India Export Trade (USD), 2000"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)
temp1 = cn_df[(cn_df.year==2016) & (cn_df.flow=='Export')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trade_2016Export = temp1.sort_values(by="trade_usd", ascending=True)
trace1 = go.Bar(
x = trade_2016Export.trade_usd,
y = trade_2016Export.commodity,
marker = dict(color = 'rgba(125, 121, 80, 0.8)'),
orientation = 'h'
)
data = [trace1]
layout = {
# 'xaxis': {'title': 'Trade in USD'},
'yaxis': {'automargin':True,},
'title': "Top 10 Commodities in India Export Trade (USD), 2016"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)
It seems that from 2000 to 2016 India's top export commodities were largely concentrated in coffee, tea, groundnuts, cashews and other spice items.
Here are the conclusions we could draw about the global cargo market from our analysis:
1. We discovered India's share in the global market and traced its rise.
2. By value of trade, India is still far behind other countries in global cargo trading; in other words, India has a long way to go if it wants to achieve dominance in the global cargo market.
3. Despite many ups and downs, India recovered quickly each time and is growing at a good rate in global cargo trading.
4. India's trading volume was only 1.01% of total world trade in 2000 and increased to 1.56% by 2015. One eye-catching detail is that China's trading volume more than doubled from 3.65% to 7.9% over the same period.
5. From 2000 to 2016, India's top import commodities were concentrated in oils such as palm oil and crude oil, while its top export commodities were largely concentrated in coffee, tea and spice items.
6. Even with its exports growing, India has to grow faster, compared with countries like China, in order to catch up in the world cargo trading market, and it should use the available opportunities.
In the future, I would like to improve this project further by taking additional steps with this dataset.
[1] Aakash N S. Analyzing Tabular Data with Pandas. https://jovian.ai/aakashns/python-pandas-data-analysis
[2] Matplotlib Documentation https://matplotlib.org
[3] Stackoverflow https://stackoverflow.com
[4] Folium Documentation http://python-visualization.github.io/folium/
[5] Aakash N S. Data Visualization using Python Matplotlib and Seaborn. https://jovian.ai/aakashns/python-matplotlib-data-visualization
[6] Aakash N S. Advanced Data Analysis Techniques with Python & Pandas. https://jovian.ai/aakashns/advanced-data-analysis-pandas
[7] Aakash N S. Interactive Visualization with Plotly, 2021. https://jovian.ai/aakashns/interactive-visualization-plotly
[8] Aakash N S. plotly-line-chart, 2021. https://jovian.ai/aakashns/plotly-line-chart
[9] Plotly Documentation. https://plotly.com/python/
jovian.commit()
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis