
Exploratory Data Analysis Case Study - Global Cargo Data

What and Why Exploratory Data Analysis?

Exploratory data analysis (EDA) is used by data analysts and scientists to analyze and investigate datasets and summarize their main features, often employing data visualization methods. It makes a dataset much easier to understand and helps in preparing the data for further use.

EDA is primarily used to see what data can reveal beyond formal modelling or hypothesis testing, and it provides a better understanding of dataset variables and the relationships between them. Originally developed by American mathematician John Tukey in the 1970s, EDA techniques continue to be widely used in the data discovery process today.

EDA can help us deliver great business results by improving our existing knowledge, and it can also surface new insights that we might not be aware of.

Tools Used

• opendatasets (a Jovian library for downloading Kaggle datasets)

• Data cleaning:

1. Pandas

2. NumPy

• Data Visualization:

1. Matplotlib

2. Seaborn

3. Plotly

4. Heatmap

In this project, we analyse global cargo data. The selected dataset covers import and export volumes for 5,000 commodities across most countries on Earth over the last 30 years.
Personally, I find commodities quite interesting because they reveal not only a country's income but also international behaviour and relations between countries.

Steps followed

Step 1: Selecting a real world dataset:

• We will download our dataset from Kaggle using the opendatasets library created by Jovian, which imports datasets directly from the Kaggle website.

import opendatasets as od

dataset = 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'

Step 2: Performing data preparation & cleaning

• We will load the dataset into a dataframe using Pandas, explore the different columns and ranges of values, handle missing values and incorrect datatypes, and generally make our data ready for analysis.

Step 3: Performing exploratory analysis and visualization, and asking interesting questions

• We will compute the mean, sum, range, and other interesting statistics for numeric columns, explore distributions of numeric columns using histograms, make a note of interesting insights from the exploratory analysis, ask interesting questions about the dataset, and look for their answers by visualizing the data.

Step 4: Summarizing inferences & writing a conclusion

• We will write a summary of what we've learnt from our analysis, share ideas for future work that can be explored with this data, and share links to resources we found useful during our analysis.

How to Run The Code

Option 1: Running using free online resources (1-click, recommended)

The easiest way to start executing the code is to click the Run button at the top of this page and select Run on Colab. You can also select "Run on Binder" or "Run on Kaggle", but you'll need to create an account on Google Colab or Kaggle to use these platforms. Also, Colab provides the most memory, which this project needs to run.

Option 2: Running on your computer locally

To run the code on your computer locally, you'll need to set up Python, download the notebook and install the required libraries. We recommend using the Conda distribution of Python. Click the Run button at the top of this page, select the Run Locally option, and follow the instructions.

Jupyter Notebooks: This is a Jupyter notebook - a document made of cells. Each cell can contain code written in Python or explanations in plain English. You can execute code cells and view the results, e.g., numbers, messages, graphs, tables, files, etc., instantly within the notebook. Jupyter is a powerful platform for experimentation and analysis. Don't be afraid to mess around with the code & break things - you'll learn a lot by encountering and fixing errors. You can use the "Kernel > Restart & Clear Output" menu option to clear all outputs and start again from the top.

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook... [jovian] Please enter your API key ( from https://jovian.ai/ ): API KEY: ·········· [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

Step 1: We will download the dataset from "https://www.kaggle.com/" using the opendatasets library created by Jovian.
So let's begin by downloading the data and listing the files within the dataset: 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'

! pip install jovian opendatasets --upgrade --quiet 
dataset = 'https://www.kaggle.com/unitednations/global-commodity-trade-statistics'
import opendatasets as od
od.download(dataset) 
Please provide your Kaggle credentials to download this dataset. Learn more: http://bit.ly/kaggle-creds Your Kaggle username: adityanhebbar Your Kaggle Key: ·········· Downloading global-commodity-trade-statistics.zip to ./global-commodity-trade-statistics 
100%|██████████| 121M/121M [00:01<00:00, 124MB/s]
 
 

The dataset has been downloaded and extracted.

data_dir = '/content/global-commodity-trade-statistics'
import os
os.listdir(data_dir)
['commodity_trade_statistics_data.csv']

Data Preparation and Cleaning

Data cleaning is the process by which we make sure that the data we use for our analysis is completely ready: no duplicates, no missing values, data in the right format and not corrupted, and thus ready for analysis.
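The checks described above can be sketched on a toy frame (the values below are hypothetical, standing in for the trade data):

```python
import pandas as pd

# Toy frame standing in for the trade data (hypothetical values)
df = pd.DataFrame({
    "year": [2015, 2015, 2016],
    "trade_usd": [100.0, 100.0, None],
})
print(df.duplicated().sum())  # count of fully duplicated rows
print(df.isnull().sum())      # per-column missing-value counts
print(df.dtypes)              # datatype of each column
```

Here the first two rows are identical, so `duplicated().sum()` reports 1, and `trade_usd` has one missing value.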

import pandas as pd
import numpy as np
trade_data_csv = '/content/global-commodity-trade-statistics/commodity_trade_statistics_data.csv'
trade_df_cur = pd.read_csv(trade_data_csv, low_memory=False, nrows=1000000)
trade_df_cur.shape # shape gives the number of (rows, columns) in the dataset
(1000000, 10)

As we can see, we have 10 lakh (1 million) rows and 10 columns to work with. Of course, we cannot work with all of this today, so let us start cleaning the data, i.e. selecting the required data and putting it into a form suited to our analysis.

trade_df_cur.sample(2) # sample(2) shows two random rows from the dataset
trade_df_cur.info() # info() gives an overview of the dataset
<class 'pandas.core.frame.DataFrame'> RangeIndex: 1000000 entries, 0 to 999999 Data columns (total 10 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 country_or_area 1000000 non-null object 1 year 1000000 non-null int64 2 comm_code 1000000 non-null int64 3 commodity 1000000 non-null object 4 flow 1000000 non-null object 5 trade_usd 1000000 non-null int64 6 weight_kg 992195 non-null float64 7 quantity_name 1000000 non-null object 8 quantity 977353 non-null float64 9 category 1000000 non-null object dtypes: float64(2), int64(3), object(5) memory usage: 76.3+ MB 
trade_df_cur.describe() # describe() gives statistical info about numerical columns
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(trade_df_cur.dtypes)  # a look at the datatypes
country_or_area object year int64 comm_code int64 commodity object flow object trade_usd int64 weight_kg float64 quantity_name object quantity float64 category object dtype: object 

As we can see, 64-bit float and integer datatypes are used by default, even though the values are not that large and could be stored in 32-bit datatypes as well.

We can convert these 64-bit datatypes to 32-bit, which speeds things up and reduces the space the dataset occupies.
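As a rough illustration of the saving, the raw buffer of a 64-bit column is twice the size of its 32-bit equivalent (toy series below, not the actual trade data):

```python
import numpy as np
import pandas as pd

s64 = pd.Series(np.arange(1_000_000), dtype="int64")
s32 = s64.astype("int32")
# halving the integer width halves the raw buffer size
print(s64.nbytes, s32.nbytes)  # 8000000 4000000
```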

change_dtypes = {
    'year': 'int32',
    'comm_code': 'int32',
    'weight_kg': 'float32',
    'quantity': 'float32'
}
%%time
trade_df_modified = pd.read_csv(trade_data_csv, low_memory=False, nrows=1500000, dtype=change_dtypes)
CPU times: user 2.23 s, sys: 357 ms, total: 2.59 s Wall time: 2.58 s 

Now we have finally read 15 lakh (1.5 million) rows from 'trade_data_csv' with our selected datatypes.

trade_df_modified.head() #shows us the top 5 rows of our final dataframe
trade_df_modified.shape
(1500000, 10)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(trade_df_modified.dtypes)  # a look at the datatypes
country_or_area object year int32 comm_code int32 commodity object flow object trade_usd int32 weight_kg float32 quantity_name object quantity float32 category object dtype: object 

Let us proceed to check for duplicate and missing values:

trade_df_modified.duplicated().sum() # duplicated() flags duplicate rows
0

As we can see, there are no duplicates in our dataset. Let's work on the null values, if there are any.

trade_df_modified.isnull().sum() #isnull() along with sum() tells us the count for missing values
country_or_area        0
year                   0
comm_code              0
commodity              0
flow                   0
trade_usd              0
weight_kg           9765
quantity_name          0
quantity           24612
category               0
dtype: int64

As we can see above, only the weight and quantity columns have null values, and they are few compared to the size of the dataset, so we could simply drop the rows that contain them.
Options for dealing with missing values in numerical columns:

1. Leave them as is, if they won't affect our analysis.

2. Replace them with the average.

3. Remove the rows containing missing values.

4. Replace them with fixed values.

5. Use the values from other rows & columns to estimate the missing values.
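On a tiny example series, options 2–4 look like this (a sketch, not the project's actual fill):

```python
import pandas as pd

s = pd.Series([1.0, None, 3.0])
print(s.fillna(s.mean()))  # option 2: replace with the average (here 2.0)
print(s.fillna(0.0))       # option 4: replace with a fixed value
print(s.dropna())          # option 3: remove the rows with missing values
```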

jovian.commit()
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'
trade_df_modified['weight_kg'].isna().sum()
9765

As we can see, there are 9765 missing values in the weight_kg column. Let us clean this.

trade_df_modified['weight_kg'].value_counts()
0.0           22656
1.0            5385
10.0           4222
2.0            3841
20.0           3539
...
1120438.0         1
10624018.0        1
1709766.0         1
7565427.0         1
12516685.0        1
Name: weight_kg, Length: 675510, dtype: int64
a = trade_df_modified['weight_kg'].describe()
a.round(1)
count    1.490235e+06
mean     2.077931e+07
std      6.447121e+08
min      0.000000e+00
25%      2.750000e+03
50%      5.763000e+04
75%      9.795410e+05
max      6.144366e+11
Name: weight_kg, dtype: float64

Now we can see that the mean and the median of the column are about 2.08 × 10⁷ and 5.76 × 10⁴ respectively. Let us replace the NaN values with values in the same range.

import random
# fillna() fills the NaN values of a column; random.randint() draws a random
# integer between the given bounds. (Reconstructed fill step: a value within
# the interquartile range shown by describe() above.)
trade_df_modified['weight_kg'] = trade_df_modified['weight_kg'].fillna(random.randint(2750, 979541))
a = trade_df_modified['weight_kg'].describe()
a.round(1)
count    1.500000e+06
mean     2.064404e+07
std      6.426099e+08
min      0.000000e+00
25%      2.538000e+03
50%      5.562300e+04
75%      9.599208e+05
max      6.144366e+11
Name: weight_kg, dtype: float64

As we can see, the modification didn't largely change the mean and the median, so we can be confident about our cleaning.

trade_df_modified['weight_kg'].isna().sum()
0
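The sanity check used above (a fill should not move the summary statistics much) can be seen on a toy series; filling with the median, for instance, leaves the median itself unchanged:

```python
import pandas as pd

s = pd.Series([10.0, None, 30.0, 50.0])
median_before = s.median()             # 30.0
filled = s.fillna(s.median())
print(median_before, filled.median())  # both 30.0
```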
jovian.commit()
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

Now let us clean the last remaining column with null values, the quantity column, by the same method.

trade_df_modified['quantity'].isna().sum()
24612
trade_df_modified['quantity'].value_counts()
0.0           16988
1.0            6040
10.0           4304
2.0            4300
3.0            3627
...
10108742.0        1
494928.0          1
26591968.0        1
9127636.0         1
12516685.0        1
Name: quantity, Length: 672935, dtype: int64
a = trade_df_modified['quantity'].describe()
a.round(1)
count    1.475388e+06
mean     3.589867e+07
std      6.316239e+09
min      0.000000e+00
25%      2.719000e+03
50%      5.840950e+04
75%      1.005000e+06
max      5.372075e+12
Name: quantity, dtype: float64

Here also, we can replace the missing values with values in the relevant range, guided by the statistics above.

import random
# fillna() fills the NaN values of a column; random.randint() draws a random
# integer between the given bounds (reconstructed fill step, as for weight_kg)
trade_df_modified['quantity'] = trade_df_modified['quantity'].fillna(random.randint(2719, 1005000))
a = trade_df_modified['quantity'].describe()
a.round(1)
count    1.500000e+06
mean     3.530964e+07
std      6.264210e+09
min      0.000000e+00
25%      2.225000e+03
50%      5.336400e+04
75%      9.547470e+05
max      5.372075e+12
Name: quantity, dtype: float64

As we can see, the modification didn't largely change the mean and the median, so the fill is safe.

trade_df_modified['quantity'].isna().sum()
0
trade_df_modified.isnull().sum() #isnull() along with sum() tells us the count for missing values
country_or_area    0
year               0
comm_code          0
commodity          0
flow               0
trade_usd          0
weight_kg          0
quantity_name      0
quantity           0
category           0
dtype: int64

As you can see, there are no null values left in the dataframe.

jovian.commit()
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

Description of the columns

• country_or_area: country name of the record
• year: year in which the trade took place
• comm_code: the commodity code under the Harmonized System (HS) of coding
• commodity: description of a particular commodity code
• flow: flow of trade, i.e. Export, Import, etc.
• trade_usd: value of the trade in USD
• weight_kg: weight of the commodity in kilograms
• quantity_name: description of the quantity measurement type given the type of item (i.e. number of items, weight, etc.)
• quantity: count of the quantity of a given item based on the quantity name
• category: category identifying the commodity

Exploratory Analysis and Visualization

Here, let us try to understand the dataset better with visualization, which will also help us answer some interesting questions and deliver meaningful insights into the data.

final_df = trade_df_modified
final_df.shape
(1500000, 10)
final_df.sample(3)
jovian.commit() #save latest version of the notebook
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

Let's begin by importing Seaborn, Matplotlib, and Plotly: the three visualization libraries we will use to visualize our data.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 15
matplotlib.rcParams['figure.figsize'] = (8, 4)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
from mpl_toolkits.mplot3d import Axes3D  # lightweight solution for 3D visualization
# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'
Let us try to understand the correlations between numerical columns, if there are any.
final_df.corr()
f, ax = plt.subplots(figsize=(13, 13))
p = sns.heatmap(final_df.corr(), annot=True, linewidths=5, fmt='.1f', ax=ax);
p.set_title("Correlation between numerical columns");

As we can see in the plot, there is no notable correlation between the numerical columns in our dataset.
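For intuition, here is what `corr()` reports on a toy frame where two columns move in exact opposition (hypothetical values, unrelated to the trade data):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [5, 5, 1, 9]})
print(df.corr())  # pairwise Pearson correlations; a vs b shows -1.0
```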

Let us see how much data has been gathered over the years, so that we get an idea of the timeline and the quantity of data.

final_df.year.plot(kind = 'hist', bins = 100, figsize =(10,10))
plt.xlabel("YEAR")
plt.show()

It looks like the dataset gained more data from 2000 to 2015; the internet revolution probably made relevant data easier to collect. The reduced data at the very end may be a data-entry artifact rather than anything to do with actual cargo transportation.

# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

Let's create a choropleth map to better visualize the relative number of records from various countries.

import folium
countries_geojson = 'https://raw.githubusercontent.com/johan/world.geo.json/master/countries.geo.json'
country_counts = final_df.country_or_area.value_counts()
country_counts_df = pd.DataFrame({ 'Country': country_counts.index, 'Count': country_counts.values})
country_counts_df
# rename entries to match the country names used in the GeoJSON
country_counts_df.at[0, 'Country'] = 'United States of America'
country_counts_df.at[12, 'Country'] = 'Russia'
country_counts_df.isnull().sum()
Country    0
Count      0
dtype: int64
m = folium.Map(location=[30, 0], zoom_start=2)

folium.Choropleth(
    geo_data=countries_geojson,
    data=country_counts_df,
    columns=["Country", "Count"],
    key_on="feature.properties.name",
    threshold_scale=[0, 5000, 10_000, 15_000, 20_000, 25_000, 30_000, 35_000, 40_000],
    fill_color="OrRd",
    fill_opacity=0.7,
    line_opacity=0.2,
    legend_name="Number of records",
).add_to(m)  # the closing call was missing; the layer must be added to the map

m

With the above map we can clearly see how well different countries have reported data for the global cargo dataset. It also suggests that the USA handles its data collection strategy very well compared to other countries.

So, with all these insights about the data, let us move on to the question-and-answer section.

1. What is the status of India's imports and exports according to this data?

cn_df = final_df[final_df.country_or_area == 'India']
cn_df.head()
cn_i = cn_df[(cn_df.flow == 'Import') & (cn_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
cn_e = cn_df[(cn_df.flow == 'Export') & (cn_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')

trace1 = go.Bar(
    x = cn_i.year,
    y = cn_i.trade_usd,  # y values reconstructed from the aggregation above
    name = "India Import",
    marker = dict(color = 'rgba(102, 216, 137, 0.8)'),
)
trace2 = go.Bar(
    x = cn_e.year,
    y = cn_e.trade_usd,  # y values reconstructed from the aggregation above
    name = "India Export",
    marker = dict(color = 'rgba(224, 148, 215, 0.8)'),
)
data = [trace1, trace2]
layout = {
'xaxis': {'title': 'Year 1992-2016'},
'yaxis': {'title': 'Trade of Import & Export in India (USD)'},
'barmode': 'group',
'title': 'Import and Export in India'
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)

So we can say that over the last 30 years India's imports and exports have increased significantly. Although trade dipped in 2009, possibly due to the global market crash, it recovered instantly. Another important thing to notice is that India's export value keeps increasing year by year.

2. What is the value of India's trade compared with total world trade over these years, according to the available data?

cn_trade = cn_df[cn_df.comm_code!= 'TOTAL'].groupby(['year'],as_index=False)['trade_usd'].agg('sum')
wd_trade = final_df[(final_df.year >1991) & (final_df.comm_code!= 'TOTAL')].groupby(['year'],as_index=False)['trade_usd'].agg('sum')

trace0 = {
    'x': cn_trade.year,        # x/y reconstructed from the aggregations above
    'y': cn_trade.trade_usd,
    'name': "India",
    'type': 'bar',
    'marker': {'color': 'rgba(255, 239, 208)'}
}
trace1 = {
    'x': wd_trade.year,
    'y': wd_trade.trade_usd,
    'name': "World",
    'type': 'bar',
    'marker': {'color': 'rgba(255, 171, 202, 0.8)'}
}
data = [trace0, trace1]
layout = {
'xaxis': {'title': 'Year 1992-2016'},
'yaxis': {'title': 'Value of Trade in USD'},
'barmode': 'relative',
'title': 'World vs India: Value of Trade'
};
fig = go.Figure(data = data, layout = layout)
iplot(fig)



Looking at the above graph, we can say that even though India's presence in global cargo trading is increasing, compared to the whole world it is still far smaller.

3. What is the percentage of India's trade in world trade?

# ratio of India's trade to world trade, in percent
trace3 = go.Scatter(
    x = cn_trade.year,  # x/y reconstructed from the aggregations above
    y = 100 * cn_trade.trade_usd / wd_trade.trade_usd,
    mode = "lines+markers",
    name = "Ratio of India/World",
    marker = dict(color = 'rgba(245, 150, 104, 0.8)')
)
data2 = [trace3]
layout2 = dict(title = 'Percentage of India Trade in World Trade (%)',
xaxis= dict(title= 'Year 1992-2016',ticklen= 5,zeroline= False),
yaxis = {'title': 'Percentage (%)'}
)
fig2 = dict(data = data2, layout = layout2)
iplot(fig2)

It appears that although India's market presence was increasing from 1990 to 1995, it then fell suddenly; the reason could be the Indo-Pak conflict. India tried to recover but faced another setback in the 2009 recession, after which it recovered quickly.

# Execute this to save new versions of the notebook
jovian.commit(project="global-cargo-data-analysis")

[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'

4. Which are the top 15 countries in trading when we consider volume?

top_countries = final_df.country_or_area.value_counts().head(15)
top_countries
Canada                  39623
Australia               33813
China, Hong Kong SAR    33126
Denmark                 28831
China                   26934
France                  25943
Brazil                  24230
Czech Rep.              24140
Austria                 23169
Finland                 22084
Chile                   21975
Croatia                 21541
Cyprus                  21186
Germany                 20743
Argentina               20116
Name: country_or_area, dtype: int64
plt.figure(figsize=(12,6))
plt.xticks(rotation=75)
plt.title("Top 15 active countries in Cargo Trading")
sns.barplot(x=top_countries.index, y=top_countries);

As you can see these are the top 15 countries in trading when we consider volume as criteria.

5. What is the position of India in trading in 2000?

USA_trade = final_df[(final_df.country_or_area == "USA") & (final_df.comm_code != 'TOTAL')].groupby(['year'], as_index=False)['trade_usd'].agg('sum')
JAPAN_trade = final_df[(final_df.country_or_area == "Japan") & (final_df.comm_code != 'TOTAL')].groupby(['year'], as_index=False)['trade_usd'].agg('sum')
CHINA_trade = final_df[(final_df.country_or_area == "China") & (final_df.comm_code != 'TOTAL')].groupby(['year'], as_index=False)['trade_usd'].agg('sum')
INDIA_trade = final_df[(final_df.country_or_area == "India") & (final_df.comm_code != 'TOTAL')].groupby(['year'], as_index=False)['trade_usd'].agg('sum')
EUR_trade = final_df[(final_df.country_or_area == "EU-28") & (final_df.comm_code != 'TOTAL')].groupby(['year'], as_index=False)['trade_usd'].agg('sum')

# helper to pull a single year's total (reconstructed; the scalar definitions
# used below were missing from this cell)
def total(df_, yr):
    return int(df_[df_.year == yr].trade_usd.iloc[0])

EUR_2000, EUR_2015 = total(EUR_trade, 2000), total(EUR_trade, 2015)
USA_2000, USA_2015 = total(USA_trade, 2000), total(USA_trade, 2015)
JAP_2000, JAP_2015 = total(JAPAN_trade, 2000), total(JAPAN_trade, 2015)
CHINA_2000, CHINA_2015 = total(CHINA_trade, 2000), total(CHINA_trade, 2015)
INDIA_2000, INDIA_2015 = total(INDIA_trade, 2000), total(INDIA_trade, 2015)

ot_2000 = int(wd_trade[wd_trade.year == 2000].iloc[0][1]) - EUR_2000 - USA_2000 - JAP_2000 - CHINA_2000 - INDIA_2000
ot_2015 = int(wd_trade[wd_trade.year == 2015].iloc[0][1]) - EUR_2015 - USA_2015 - JAP_2015 - CHINA_2015 - INDIA_2015
labels = ['Europe','USA','Japan','China','India','Others']
colors = ['#f18285', '#86e48f', '#e8a2d8', '#fff76e','#47B39C','#FFC154']

#####
trace = go.Pie(labels=labels, values=[EUR_2000, USA_2000, JAP_2000, CHINA_2000, INDIA_2000, ot_2000],
marker=dict(colors=colors,  line=dict(color='#000', width=2)) )
layout = go.Layout(
title='2000 Import & Export Trade in USD',
)
fig = go.Figure(data=[trace], layout=layout)
iplot(fig, filename='basic_pie_chart')


As you can see, India's position in global trading in 2000 was only about 1.01% of the global market; Europe, China, and Japan held the biggest positions.

6. What is the position of India in trading in 2015?

trace1 = go.Pie(labels=labels, values=[EUR_2015, USA_2015, JAP_2015, CHINA_2015, INDIA_2015, ot_2015],
marker=dict(colors=colors,  line=dict(color='#000', width=2)) )
layout1 = go.Layout(
title='2015 Import & Export Trade in USD',
)

fig1 = go.Figure(data=[trace1], layout=layout1)
iplot(fig1, filename='basic_pie_chart1')

As we can see, India's position in global cargo trade was 1.56% as of 2015. But it seems some major changes have happened; let us compare the last two plots to draw some interesting insights.

When we compare India's trading share, it was only 1.01% in 2000 and increased to 1.56% of total world trade by 2015. One eye-catching part here is that China's share almost doubled from 3.65% in 2000 to 7.9% in 2015 over the same period, whereas Japan's position fell from 4.36% to 2.79%.
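The share arithmetic quoted above can be double-checked directly (figures taken from the two pie charts):

```python
# percentage shares of world trade, as read from the 2000 and 2015 pie charts
india_2000, india_2015 = 1.01, 1.56
china_2000, china_2015 = 3.65, 7.9
japan_2000, japan_2015 = 4.36, 2.79
print(round(india_2015 / india_2000, 2))  # ~1.54x: India grew by about half
print(round(china_2015 / china_2000, 2))  # ~2.16x: China roughly doubled
print(round(japan_2015 / japan_2000, 2))  # ~0.64x: Japan shrank by about a third
```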

7. What are the top 10 commodities in Indian import trade (USD), 2000 vs 2016?

temp = cn_df[(cn_df.year==2000) & (cn_df.flow=='Import')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trace1 = go.Bar(
    x = temp.trade_usd,  # x/y reconstructed: trade value per commodity, horizontal bars
    y = temp.commodity,
    marker = dict(color = 'rgba(152, 213, 245, 0.8)'),
    orientation = 'h'
)

data = [trace1]
layout = {
    'yaxis': {'automargin': True},
    'title': "Top 10 Commodities in India Import Trade (USD), 2000"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)

temp1 = cn_df[(cn_df.year==2016) & (cn_df.flow=='Import')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trace1 = go.Bar(
    x = temp1.trade_usd,  # x/y reconstructed, as in the 2000 chart
    y = temp1.commodity,
    marker = dict(color = 'rgba(249, 205, 190, 0.8)'),
    orientation = 'h'
)
data = [trace1]
layout = {
    # 'xaxis': {'title': 'Trade in USD'},
    'yaxis': {'automargin': True},
    'title': "Top 10 Commodities in India Import Trade (USD), 2016"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)

As the graphs show, from 2000 to 2016 India's top import commodities have been concentrated in different oils, like palm oil, crude, and sunflower oil.

8. What are the top 10 commodities in Indian export trade (USD), 2000 vs 2016?

temp = cn_df[(cn_df.year==2000) & (cn_df.flow=='Export')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trace1 = go.Bar(
    x = temp.trade_usd,  # x/y reconstructed: trade value per commodity, horizontal bars
    y = temp.commodity,
    marker = dict(color = 'rgba(21, 31, 39, 0.8)'),
    orientation = 'h'
)

data = [trace1]
layout = {
    # 'xaxis': {'title': 'Trade in USD'},
    'yaxis': {'automargin': True},
    'title': "Top 10 Commodities in India Export Trade (USD), 2000"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)

temp1 = cn_df[(cn_df.year==2016) & (cn_df.flow=='Export')].sort_values(by="trade_usd", ascending=False).iloc[1:11, :]
trace1 = go.Bar(
    x = temp1.trade_usd,  # x/y reconstructed, as in the 2000 chart
    y = temp1.commodity,
    marker = dict(color = 'rgba(125, 121, 80, 0.8)'),
    orientation = 'h'
)

data = [trace1]
layout = {
    # 'xaxis': {'title': 'Trade in USD'},
    'yaxis': {'automargin': True},
    'title': "Top 10 Commodities in India Export Trade (USD), 2016"
}
fig = go.Figure(data = data, layout = layout)
iplot(fig)

It seems that from 2000 to 2016 India's top export commodities have been largely concentrated in raw coffee and tea, groundnuts, cashews, and other spice items.

Inference and conclusion

Here are the conclusions we could draw about the global cargo market from our analysis:

1. We discovered India's share of the global market and the rise of that share.

2. By value of trade, India is far behind others in global cargo trading; in other words, India still has a long way to go if it wants to achieve dominance in the global cargo market.

3. Despite many ups and downs, India recovered quickly each time and is growing at a good rate in global cargo trading.

4. India's trading share was only 1.01% in 2000 and increased to 1.56% of total world trade by 2015. One eye-catching part is that China's share almost doubled, from 3.65% to 7.9%, over the same period.

5. From 2000 to 2016, India's top import commodities were concentrated in different oils (palm oil, crude, etc.), and its top export commodities were largely concentrated in spice items.

6. Even with the growth in India's exports, compared to countries like China, India has to grow faster in order to catch up in the world cargo trading market, and it should use the available opportunity.

Future Work

In the future, I would like to improve this project further by taking the following actions on this dataset:

1. Analysing more and different columns from the dataset to derive further results.
2. Asking more questions related to specific commodities.
3. Visualizing answers to some more questions.
4. Using the traded volume of a specific commodity by each country to identify the major global distributors of that commodity.

References

[1] Aakash N S. Analyzing Tabular Data with Pandas. https://jovian.ai/aakashns/python-pandas-data-analysis

[2] Matplotlib Documentation https://matplotlib.org

[3] Stackoverflow https://stackoverflow.com

[4] Folium Documentation http://python-visualization.github.io/folium/

[5] Aakash N S. Data Visualization using Python Matplotlib and Seaborn. https://jovian.ai/aakashns/python-matplotlib-data-visualization

[6] Aakash N S. Advanced Data Analysis Techniques with Python & Pandas. https://jovian.ai/aakashns/advanced-data-analysis-pandas

[7] Aakash N S. Interactive Visualization with Plotly, 2021. https://jovian.ai/aakashns/interactive-visualization-plotly

[8] Aakash N S. plotly-line-chart, 2021. https://jovian.ai/aakashns/plotly-line-chart

[9] Plotly Documentation. https://plotly.com/python/

jovian.commit()
[jovian] Detected Colab notebook... [jovian] Uploading colab notebook to Jovian... Committed successfully! https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis 
'https://jovian.ai/adityahebbarnhnm/global-cargo-data-analysis'
Aditya Hebbar, a year ago