
Forecasting Walmart Weekly Sales using Machine Learning



Walmart is an American retail, wholesale and e-commerce business. Sam Walton founded Walmart in 1962 in Rogers, Arkansas. His goal was to help people "Save Money and Live Better", which continues to be Walmart's guiding mission through "Every Day Low Prices" (EDLP) and great service.

  • Ranked number 1 on the Fortune 500, with total revenue of $559 billion for the fiscal year ended January 31, 2021
  • Publicly listed on the New York Stock Exchange (NYSE) under the ticker WMT
  • Each week, 220 million customers visit 10,500 stores and clubs under 48 banners in 24 countries, as well as its eCommerce websites
  • Largest private employer in the world, with more than 2.3 million employees worldwide

The company's revenue and operations are categorised into three key business segments: Walmart US, Walmart International and Sam's Club.

Walmart operates three store formats:

  • Supercenters (general merchandise and grocery; average size 178,000 square feet)
  • Discount stores (general merchandise and limited grocery; average size 106,000 square feet)
  • Neighbourhood stores (grocery; average size 42,000 square feet)

The business problem is to forecast weekly store sales for Walmart.

Business problem statement

  1. Predict department-wide weekly sales for each Walmart store
  2. Predict which departments are affected by holiday markdowns, and to what extent, based on limited history

Evaluation criteria and loss functions
WMAE - weighted mean absolute error

\[ \mathrm{WMAE} = \frac{1}{\sum_{i=1}^{n} w_i} \sum_{i=1}^{n} w_i \, \lvert y_i - \hat{y}_i \rvert \]


where
\( n \) is the number of rows
\( \hat{y}_i \) is the predicted sales for row \( i \)
\( y_i \) is the actual sales for row \( i \)
\( w_i \) is the weight for row \( i \): \( w_i = 5 \) if the week is a holiday week, 1 otherwise
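The formula translates directly into a few lines of NumPy. The sketch below is a minimal illustration, assuming the holiday flag for each row is available as a boolean array; the function name `wmae` and the sample numbers are hypothetical, not taken from the dataset or any library.

```python
import numpy as np


def wmae(y_true, y_pred, is_holiday):
    """Weighted mean absolute error: weight 5 for holiday weeks, 1 otherwise."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # w_i = 5 if row i falls in a holiday week, else 1
    weights = np.where(np.asarray(is_holiday, dtype=bool), 5.0, 1.0)
    # sum_i w_i * |y_i - yhat_i|, normalised by sum_i w_i
    return float(np.sum(weights * np.abs(y_true - y_pred)) / np.sum(weights))


# Hypothetical example: three weeks, the second of which is a holiday week
actual = [100.0, 200.0, 150.0]
predicted = [110.0, 180.0, 150.0]
holiday = [False, True, False]
print(wmae(actual, predicted, holiday))  # 110 / 7, roughly 15.71
```

In this example the unweighted MAE would be 30 / 3 = 10, but the 20-unit miss on the holiday week is counted five times, so WMAE = (10 + 5 * 20 + 0) / 7 = 110 / 7. The same value can be obtained with scikit-learn's `mean_absolute_error(y_true, y_pred, sample_weight=weights)`, which computes a weighted average of the absolute errors.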

In this notebook we will explore supervised machine learning methods. Regression models such as linear regression and decision trees, and ensemble models such as random forest, XGBoost and LightGBM, will be trained to predict weekly sales using scikit-learn, LightGBM and XGBoost. We will use Pandas, NumPy, Matplotlib, Seaborn and Plotly to perform exploratory data analysis and gather insights for machine learning. We will do the following:

  • Install and import libraries
  • Explore the dataset and merge different files as required
  • Translate the business problem into a machine learning problem
  • Exploratory data analysis (EDA)
  • Feature engineering
  • Data preparation: train/validation split, encoding, imputation and scaling
  • Select input features
  • Define evaluation metrics
  • Define a baseline model
  • Select the best model without hyperparameter tuning
  • Tune hyperparameters for selected models
  • Make predictions
  • Save the best model
  • Summarise insights and learnings
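The steps above mention a train/validation split and a baseline model without fixing either one here. As a rough sketch only, the snippet below assumes a time-ordered split (earlier weeks for training, later weeks for validation) and a naive baseline that predicts each store-department pair's mean training sales, scored with WMAE. The column names `Store`, `Dept`, `Date`, `Weekly_Sales` and `IsHoliday` and the toy values are assumptions for illustration, not taken from this section.

```python
import pandas as pd

# Hypothetical toy data; column names are assumed, values are made up
df = pd.DataFrame({
    "Store": [1, 1, 1, 1, 1, 1],
    "Dept": [1, 1, 1, 2, 2, 2],
    "Date": pd.to_datetime(["2012-01-06", "2012-01-13", "2012-01-20"] * 2),
    "Weekly_Sales": [100.0, 120.0, 130.0, 50.0, 70.0, 40.0],
    "IsHoliday": [False, False, True, False, False, True],
})

# Time-ordered split: weeks before the cutoff train, the rest validate (no shuffling)
cutoff = pd.Timestamp("2012-01-20")
train = df[df["Date"] < cutoff]
val = df[df["Date"] >= cutoff]

# Naive baseline: mean weekly sales per (Store, Dept) over the training weeks
dept_means = train.groupby(["Store", "Dept"])["Weekly_Sales"].mean().rename("Pred")
val = val.merge(dept_means.reset_index(), on=["Store", "Dept"], how="left")
# Fall back to the global training mean for pairs never seen in training
val["Pred"] = val["Pred"].fillna(train["Weekly_Sales"].mean())

# WMAE: weight 5 for holiday weeks, 1 otherwise
weights = val["IsHoliday"].map({True: 5, False: 1})
baseline_wmae = (weights * (val["Weekly_Sales"] - val["Pred"]).abs()).sum() / weights.sum()
print(baseline_wmae)  # both validation rows miss by 20, so WMAE is 20
```

Splitting by date rather than at random keeps future weeks out of the training data, which mirrors how the model would be used to forecast unseen weeks; any trained model should at least beat this kind of per-department average.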