
Forecasting Monthly Future Sales using Machine Learning


About the project

Sales prediction is essential for a business: it allows the business to produce the required quantity at the right time, and it supports overall business planning, risk management, and budgeting. Machine learning algorithms can help achieve this efficiently.

The aim of the project is to predict total sales for every product and store in the next month, using a challenging time-series dataset of daily sales data kindly provided by one of the largest Russian software firms, 1C Company.


The dataset consists of daily historical sales data. The task is to forecast the total number of products sold in every shop for the test set. Note that the list of shops and products changes slightly every month.


The dataset contains 6 csv files:

  1. sales_train.csv - the training set. Daily historical data from January 2013 to October 2015.
  2. test.csv - the test set. You need to forecast the sales for these shops and products for November 2015.
  3. sample_submission.csv - a sample submission file in the correct format.
  4. items.csv - supplemental information about the items/products.
  5. item_categories.csv - supplemental information about the item categories.
  6. shops.csv - supplemental information about the shops.

Data Fields:

  • ID - an Id that represents a (Shop, Item) tuple within the test set
  • shop_id - unique identifier of a shop
  • item_id - unique identifier of a product
  • item_category_id - unique identifier of item category
  • item_cnt_day - number of products sold. You are predicting a monthly amount of this measure
  • item_price - current price of an item
  • date - date in format dd/mm/yyyy
  • date_block_num - a consecutive month number, used for convenience. January 2013 is 0, February 2013 is 1, ..., October 2015 is 33
  • item_name - name of item
  • shop_name - name of shop
  • item_category_name - name of item category
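
The relationship between the daily fields and the monthly target can be sketched with pandas. This is a minimal illustration, not necessarily the approach used later in the project: the sample rows are made up, the column name `item_cnt_month` is a hypothetical label for the monthly total, and the date format string assumes the dd/mm/yyyy layout stated above (adjust it if the file uses a different separator).

```python
import pandas as pd

# Made-up rows in the shape of sales_train.csv (values are illustrative only).
daily = pd.DataFrame(
    {
        "date": ["02/01/2013", "15/01/2013", "03/02/2013", "10/10/2015"],
        "shop_id": [59, 59, 59, 25],
        "item_id": [22154, 22154, 22154, 2552],
        "item_price": [999.0, 999.0, 999.0, 899.0],
        "item_cnt_day": [1.0, 2.0, 1.0, 3.0],
    }
)

# date_block_num counts months starting from January 2013 (= 0).
parsed = pd.to_datetime(daily["date"], format="%d/%m/%Y")
daily["date_block_num"] = (parsed.dt.year - 2013) * 12 + (parsed.dt.month - 1)

# Sum the daily counts per (month, shop, item) to get the monthly amount.
# "item_cnt_month" is an assumed name for the aggregated column.
monthly = (
    daily.groupby(["date_block_num", "shop_id", "item_id"], as_index=False)
    .agg(item_cnt_month=("item_cnt_day", "sum"))
)

print(monthly)
```

With these sample rows, the two January 2013 sales of the same item in the same shop collapse into a single monthly row, which is the level at which the forecast is made.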

Here's an outline of the project:

  1. Download the dataset
  2. Explore & analyze the dataset
  3. Prepare the dataset for ML training
  4. Train & evaluate different models
  5. Make predictions & submit to Kaggle
  6. Document & publish the project online

!pip install jovian --upgrade --quiet
import jovian
# Execute this to save new versions of the notebook
[jovian] Detected Colab notebook...
[jovian] Please enter your API key ( from https://jovian.ai/ ):
API KEY: ··········
[jovian] Uploading colab notebook to Jovian...
[jovian] Committed successfully! https://jovian.ai/saini-9/final-ml-project

1. Download the dataset

  1. Install the required libraries
  2. Download data from Kaggle
  3. Load data files
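
The last step, loading the data files, might look roughly like the sketch below. It assumes the six CSV files listed earlier have already been downloaded into a single local directory. The helper name `load_tables` and its `data_dir` argument are hypothetical and introduced only for illustration. The download step itself is not shown.

```python
from pathlib import Path

import pandas as pd

# File names as listed in the dataset description.
FILE_NAMES = [
    "sales_train.csv",
    "test.csv",
    "sample_submission.csv",
    "items.csv",
    "item_categories.csv",
    "shops.csv",
]


def load_tables(data_dir):
    """Read each expected CSV from data_dir into a dict keyed by file stem.

    Hypothetical helper: data_dir is wherever the files were downloaded.
    """
    data_dir = Path(data_dir)

    # Fail early with a clear message if the download was incomplete.
    missing = [name for name in FILE_NAMES if not (data_dir / name).exists()]
    if missing:
        raise FileNotFoundError(f"Missing files in {data_dir}: {missing}")

    return {Path(name).stem: pd.read_csv(data_dir / name) for name in FILE_NAMES}
```

A call such as `tables = load_tables("data")` would then give access to `tables["sales_train"]`, `tables["shops"]`, and so on, where `"data"` stands in for the actual download directory.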