Walmart ML Project
Machine Learning for Walmart Sales Forecasting
Purpose of the project
Accurate forecasting is an essential tool for every business. It matters not only for sales but also for production planning, human resources, and other departments. Machine learning is well suited to this need: it can model complex relationships in the data and so reduce forecasting errors.
This project aims to engage and motivate anyone who is interested in machine learning, and to encourage those who are put off by complex math or who feel they lack the background.
Dataset
The data was taken from an old Kaggle data competition. It contains 421,570 rows and 16 columns. The objective of the competition is to forecast the Weekly Sales based on the following inputs:
- Store category (size)
- Type of the store
- USA Department location
- Promotional info
- Date
- Holiday info
- Fuel Price
- Unemployment
- Consumer Price Index (CPI)
Structure
The data is divided into 5 CSV files:
- Features: This file contains additional data related to the store, department, and regional activity for the given dates
- Test: The same information as the training data, with the Weekly Sales withheld
- Training: This is the historical training data, which covers 2010-02-05 to 2012-11-01.
- Stores: This file contains anonymized information about the 45 stores, indicating the type and size of store.
- Sample: The file in which to write the final predictions
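As a rough illustration of how these files relate to each other, the sketch below joins tiny in-memory stand-ins for the training, features, and stores tables. The column names (`Store`, `Dept`, `Date`, `Weekly_Sales`, `Fuel_Price`, `CPI`, `Type`, `Size`) and all values are assumptions made up for the example; check them against the actual CSV headers before reusing the code.

```python
import pandas as pd

# Tiny stand-ins for the training, features, and stores files.
# Column names and values are illustrative assumptions, not real data.
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-05"],
    "Weekly_Sales": [100.0, 200.0, 300.0],
})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-05"],
    "Fuel_Price": [2.5, 2.6],
    "CPI": [211.0, 210.5],
})
stores = pd.DataFrame({
    "Store": [1, 2],
    "Type": ["A", "B"],
    "Size": [150000, 200000],
})

# Attach the regional features by store and date, then the store attributes.
merged = (
    train
    .merge(features, on=["Store", "Date"], how="left")
    .merge(stores, on="Store", how="left")
)

# Parse the date column so it can be used for time-based features later.
merged["Date"] = pd.to_datetime(merged["Date"])

print(merged)
```

Left joins keep every training row even when a matching features or stores row is missing, which makes gaps visible as missing values instead of silently dropping sales records.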
Essential libraries
- scikit-learn: A machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN.
- pandas: Offers data structures and operations for manipulating numerical tables and time series.
- NumPy: A library that supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
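To give a feel for how the three libraries described above (scikit-learn, pandas, and NumPy) fit together, here is a minimal sketch that builds a NumPy array, wraps it in a pandas DataFrame, and standardizes it with scikit-learn's `StandardScaler`. The column name `Weekly_Sales` and the numbers are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# NumPy: a small 2-D array of made-up values (3 rows, 1 column).
values = np.array([[10.0], [20.0], [30.0]])

# pandas: wrap the array in a labelled table.
frame = pd.DataFrame(values, columns=["Weekly_Sales"])

# scikit-learn: rescale the column to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(frame)  # returns a NumPy array

print(frame)
print(scaled)
```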
First, let's import the necessary libraries and adjust the display settings to make the output more visually friendly.
import os

# The matplotlib package itself is needed for rcParams; pyplot is aliased as plt.
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import opendatasets as od
import pandas as pd
import plotly.express as px
import seaborn as sns
from sklearn.preprocessing import StandardScaler
%matplotlib inline
pd.options.display.float_format = '{:,.2f}'.format
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
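To see what the `float_format` setting above does in practice, the short sketch below applies the same format string to a made-up table. The column name `Weekly_Sales` and the values are assumptions for the example only.

```python
import pandas as pd

# Same formatter as in the display settings: thousands separator, two decimals.
fmt = '{:,.2f}'.format
pd.options.display.float_format = fmt

# Made-up values, used only to show the effect on printed output.
sample = pd.DataFrame({'Weekly_Sales': [24924.5, 1234567.891]})

rendered = sample.to_string()
print(rendered)
```

The setting only changes how floats are displayed; the underlying values keep their full precision.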
With the opendatasets library, created by Jovian, we can download any Kaggle dataset directly. You need your Kaggle username and API credentials.
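A minimal sketch of the download step might look like the following. It assumes `opendatasets` is installed and that `od.download` will ask for the Kaggle username and API key (or read them from a `kaggle.json` file in the working directory). The helper names `download_competition` and `list_csv_files` are hypothetical, and the URL in the usage comment is a placeholder, not the real competition address.

```python
import os


def download_competition(url, data_dir="."):
    """Download a Kaggle dataset or competition into data_dir (hypothetical helper)."""
    # Imported lazily so the rest of this sketch works without the package.
    import opendatasets as od

    # Prompts for Kaggle credentials if they are not already available.
    od.download(url, data_dir=data_dir)


def list_csv_files(data_dir):
    """Return the sorted names of the CSV files found directly in data_dir."""
    return sorted(
        name for name in os.listdir(data_dir) if name.lower().endswith(".csv")
    )


# Example usage (placeholder URL; replace it with the competition page):
# download_competition("https://www.kaggle.com/c/<competition-slug>", data_dir="data")
# print(list_csv_files("data/<competition-slug>"))
```

Listing the CSV files after the download is a quick way to confirm that all five files arrived before loading them with pandas.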