Discover the factors that contribute to national happiness using statistical analysis on a dataset retrieved from Kaggle. Learn how to clean and correlate data using Python.
The goal of this analysis is to determine which factors correlate with national happiness. I retrieved the data from Kaggle. I am going to download the dataset, clean it, and see whether I can find a correlation between various factors, such as income, and happiness.
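To make "finding a correlation" concrete, here is a minimal sketch of computing a Pearson correlation coefficient with NumPy. The `income` and `happiness` values are made-up illustrative numbers, not taken from the Kaggle dataset, and the variable names are assumptions for this example only.

```python
import numpy as np

# Made-up illustrative values, NOT taken from the Kaggle dataset.
income = np.array([0.3, 0.6, 0.9, 1.1, 1.4])     # hypothetical income index
happiness = np.array([3.9, 4.6, 5.4, 6.1, 7.0])  # hypothetical happiness score

# np.corrcoef returns a 2x2 correlation matrix;
# the off-diagonal entry is the Pearson coefficient between the two arrays.
r = np.corrcoef(income, happiness)[0, 1]
print(f"Pearson r = {r:.3f}")
```

A value of r close to 1 indicates a strong positive linear relationship, a value close to -1 a strong negative one, and a value near 0 little linear relationship. Correlation alone does not show that one factor causes the other.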
import seaborn as sns #seaborn, the Python graphing module for more advanced graphing
import matplotlib #matplotlib, the Python graphing module for more basic graphing
import matplotlib.pyplot as plt #pyplot, the plotting interface of matplotlib
import numpy as np #numpy, the Python module that provides math functions and functions for arrays
import statsmodels.api as sm #statsmodels.api, the Python module that contains functions for applying statistical models
%matplotlib inline
#shortcuts to customize the look of plots for graphing
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (12, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
!pip install jovian opendatasets --upgrade --quiet
dataset_url = 'https://www.kaggle.com/unsdsn/world-happiness' #the Kaggle URL I am downloading the dataset from