Jovian

Walmart ML Project

Machine Learning for Forecasting Walmart Sales

Purpose of the project

Accurate forecasting is an essential insight tool for every business. It matters not only for sales but also for production planning, human resources and other departments. Machine learning is an advanced set of techniques that meets this need: it can model complex relationships in the data and help narrow forecasting errors.

This project aims to engage and motivate anyone interested in machine learning, and to encourage those who are intimidated by complex math, whatever their background.

Dataset

The data was taken from an old Kaggle competition. It contains 421,570 rows and 16 columns. The objective of the competition is to forecast Weekly Sales from the following inputs:

  • Store size
  • Store type
  • Store and department identifiers
  • Promotional info
  • Date
  • Holiday info
  • Fuel price
  • Unemployment
  • CPI (Consumer Price Index)
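The text above does not say how forecasts are scored. As far as we know, the original Kaggle competition ranked submissions by a weighted mean absolute error (WMAE) in which holiday weeks count five times as much as ordinary weeks. Below is a minimal NumPy sketch of that metric, assuming this weighting; the function name `wmae` and the sample numbers are illustrative only.

```python
import numpy as np

def wmae(y_true, y_pred, is_holiday):
    """Weighted mean absolute error: holiday weeks weigh 5, other weeks weigh 1 (assumed weighting)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.where(np.asarray(is_holiday, dtype=bool), 5.0, 1.0)
    return np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights)

# Illustrative numbers only, not taken from the dataset
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
holiday = [False, True, False]
print(round(float(wmae(actual, predicted, holiday)), 3))  # 12.857
```

Because the holiday week's error of 10 is counted five times, it pulls the score up more than the larger non-holiday error would on its own, which is why holiday weeks deserve special attention when modelling.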

Structure

The data is divided into 5 CSV files:

  1. Features: This file contains additional data related to the store, department, and regional activity for the given dates
  2. Test: The same information as the training data, with the Weekly Sales withheld
  3. Train: This is the historical training data, which covers 2010-02-05 to 2012-11-01.
  4. Stores: This file contains anonymized information about the 45 stores, indicating the type and size of store.
  5. Sample: A sample submission file, used as the template for writing the final predictions
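A minimal pandas sketch of how these files relate: each training row is joined to Stores on `Store` and to Features on `Store`, `Date` and `IsHoliday`. The column names follow the Kaggle files as we understand them, and the rows are made-up values, so treat this as an assumption-laden illustration rather than the project's actual loading code. In practice each frame would come from `pd.read_csv` on the downloaded files.

```python
import pandas as pd

# Made-up rows; column names are assumed to match train.csv, stores.csv and features.csv
train = pd.DataFrame({
    "Store": [1, 1, 2],
    "Dept": [1, 2, 1],
    "Date": ["2010-02-05", "2010-02-05", "2010-02-12"],
    "Weekly_Sales": [1000.0, 2000.0, 1500.0],
    "IsHoliday": [False, False, True],
})
stores = pd.DataFrame({"Store": [1, 2], "Type": ["A", "B"], "Size": [150000, 120000]})
features = pd.DataFrame({
    "Store": [1, 2],
    "Date": ["2010-02-05", "2010-02-12"],
    "Temperature": [42.0, 38.5],
    "Fuel_Price": [2.57, 2.55],
    # Five promotional markdown columns, left empty in this toy example
    **{f"MarkDown{i}": [float("nan"), float("nan")] for i in range(1, 6)},
    "CPI": [211.1, 210.5],
    "Unemployment": [8.1, 7.9],
    "IsHoliday": [False, True],
})

# Parse dates so the join keys have the same dtype in both frames
for df in (train, features):
    df["Date"] = pd.to_datetime(df["Date"])

# Left joins keep every training row and attach store and regional info
merged = (
    train.merge(stores, on="Store", how="left")
         .merge(features, on=["Store", "Date", "IsHoliday"], how="left")
)
print(merged.shape)  # (3, 16)
```

With these columns the merged frame has 16 columns, matching the count given in the Dataset section.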

Essential libraries

Scikit-learn

A machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means and DBSCAN.
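A short sketch of the fit/predict workflow that scikit-learn shares across its estimators, here with a random forest regressor on synthetic data. The three random features are hypothetical stand-ins for inputs such as store size, fuel price and CPI; this is not the project's model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 200 rows, 3 numeric features, target depends on the first two
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Hold out 25% of the rows to check the model on data it has not seen
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)   # learn from the training rows
preds = model.predict(X_val)  # one prediction per validation row

print(preds.shape)  # (50,)
print(round(model.score(X_val, y_val), 3))  # R^2 on the held-out rows
```

Swapping in another regressor (for example gradient boosting) only changes the line that builds `model`; `fit`, `predict` and `score` stay the same.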

Pandas

A library that offers data structures and operations for manipulating numerical tables and time series.
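A small pandas sketch of a time-series operation relevant to this project: summing weekly sales across stores and computing the week-over-week change. The frame and its values are made up for illustration.

```python
import pandas as pd

# Made-up weekly sales for two stores (illustrative values only)
sales = pd.DataFrame({
    "Store": [1, 2, 1, 2, 1, 2],
    "Date": pd.to_datetime(["2010-02-05", "2010-02-05",
                            "2010-02-12", "2010-02-12",
                            "2010-02-19", "2010-02-19"]),
    "Weekly_Sales": [100.0, 80.0, 120.0, 90.0, 110.0, 70.0],
})

# Total sales across all stores for each week, indexed by date
weekly_total = sales.groupby("Date")["Weekly_Sales"].sum()

# Week-over-week relative change (the first week has no previous value, so it is NaN)
print(weekly_total.pct_change())
```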

NumPy

A library that supports large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on them.
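A NumPy sketch of operating on a whole 2-D array at once, with stores as rows and weeks as columns. The values are illustrative, not taken from the dataset.

```python
import numpy as np

# Rows = stores, columns = weeks (made-up sales values)
sales = np.array([[100.0, 120.0, 110.0],
                  [80.0, 90.0, 70.0]])

per_store_mean = sales.mean(axis=1)  # average over weeks, one value per store
per_week_total = sales.sum(axis=0)   # total over stores, one value per week
share = sales / per_week_total       # broadcasting: each store's share of each week's total

print(per_store_mean)  # [110.  80.]
print(per_week_total)  # [180. 210. 180.]
```

No Python loop is needed: the `axis` argument picks the direction of the reduction, and broadcasting divides every row by the same vector of weekly totals.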

First, let's import the necessary libraries and configure the display settings to make the output easier to read.

# Modelling, plotting and data-processing libraries
from sklearn.preprocessing import StandardScaler
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import numpy as np
import pandas as pd
import opendatasets as od  # Jovian's helper for downloading Kaggle datasets
import os
%matplotlib inline

# Show floats with thousands separators and two decimals
pd.options.display.float_format = '{:,.2f}'.format
# Display every column and up to 150 rows
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 150)

# Plot styling: seaborn theme, larger font, default figure size, transparent background
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (10, 6)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
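To see what the `float_format` setting does, here is a small check using `pd.option_context`, which applies an option only inside a `with` block. The DataFrame is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({"Weekly_Sales": [1234567.891, 42.5]})

# Apply the same float format as above, but only temporarily
with pd.option_context("display.float_format", "{:,.2f}".format):
    text = df.to_string()

print(text)
```

Inside the block, large sales figures are shown as `1,234,567.89` instead of in scientific or unseparated form; outside it, pandas falls back to whatever the global setting is.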

With the opendatasets library, created by Jovian, we can download any Kaggle dataset directly. You need your Kaggle username and API key.
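A hedged sketch of the credential setup. Kaggle's API token file, `kaggle.json`, holds a `username` and a `key`; we assume that `opendatasets` picks this file up when it sits in the current working directory and otherwise prompts for both values. The helper names are our own, hypothetical choices. The download itself needs network access (and, for a competition, accepted rules on Kaggle), so it is wrapped in a function that is not called here.

```python
import json
from pathlib import Path

def kaggle_credentials_json(username: str, key: str) -> str:
    """Return the JSON body of a kaggle.json API-token file."""
    return json.dumps({"username": username, "key": key})

def save_kaggle_credentials(username: str, key: str, folder: str = ".") -> Path:
    """Write kaggle.json into `folder` (assumption: opendatasets looks in the current directory)."""
    path = Path(folder) / "kaggle.json"
    path.write_text(kaggle_credentials_json(username, key))
    return path

def download_dataset(url: str) -> None:
    """Download a Kaggle dataset or competition with opendatasets (network and credentials required)."""
    import opendatasets as od  # deferred import: only needed when actually downloading
    od.download(url)
```

For example, calling `save_kaggle_credentials("your_username", "your_api_key")` and then `download_dataset("https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting")` should, under these assumptions, fetch the competition files; the URL is our assumption of the competition page, not taken from the text above.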