Weather Based Stock Prediction with Pycaret

Soumi Bardhan
Published in Analytics Vidhya, May 27, 2020

The goal is to predict out-of-stock situations in retail stores, in real time or even in advance and in a cost-effective manner, by forecasting the quantity of each product sold per month.

As competition among retailers grows, companies are increasingly turning to predictive analytics to reduce costs and increase productivity and profit.

Problems:

  • Excess stock (overstock) and running out of stock (stockouts) are very serious problems for retailers.
  • Lost sales and lost customers are critical problems for retailers. Given the competition and financial constraints in the retail industry, an accurate demand forecasting and inventory control system is crucial for running operations effectively.

This is where predictive analytics comes in: it makes out-of-stock predictions more accurate.

Predictive analytics is the use of data, statistical algorithms and machine learning techniques to estimate the likelihood of future outcomes based on historical data.

  • Weather plays a crucial role in predicting stock requirements accurately.
  • Studies show that weather has a significant effect on store traffic and on the sales of many product categories and store types.

We searched Kaggle for datasets that include location information, so that we could retrieve the weather for that location on the corresponding dates.

Our dataset contains the sales of various FMCG (fast-moving consumer goods) products from a supermarket in a Polish city with a population of 30,000.

  • The store is located in a prime area of the city and offers products such as general food, basic chemistry, hygiene articles, fresh bread, sweets, local vegetables and dairy.
  • Its nearest competitor is a small grocery store that sells the same products. The shop has an area of 120 m² and opened in 2009.
  • The data covers products sold over a 12-month period under the varying weather conditions of each month. It has 13,000 entries with fields such as price, product number and quantity sold.

The dataset, SELL_1.csv, is available on Kaggle.

Exploratory Data Analysis of the Dataset

First, we analyse the data to understand its structure. My notebook shows one way to do this, but there are many others!

We add a separate feature that records whether each day is a weekday or a holiday, because this significantly affects the sales predictions.
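A minimal sketch of such a day-type feature, using only the standard library. The holiday set is an assumption for illustration: it lists only Poland's fixed-date public holidays and leaves out movable ones such as Easter Monday and Corpus Christi, so a real pipeline would need a complete calendar.

```python
from datetime import date

# Assumed holiday calendar: Poland's fixed-date public holidays as (month, day).
# Movable holidays (Easter Monday, Corpus Christi, ...) are deliberately left out of this sketch.
FIXED_HOLIDAYS = {
    (1, 1), (1, 6), (5, 1), (5, 3), (8, 15),
    (11, 1), (11, 11), (12, 25), (12, 26),
}


def day_type(d: date) -> str:
    """Classify a date as 'holiday', 'weekend' or 'weekday'."""
    if (d.month, d.day) in FIXED_HOLIDAYS:
        return "holiday"
    if d.weekday() >= 5:  # Monday is 0, so 5 and 6 are Saturday and Sunday
        return "weekend"
    return "weekday"


print(day_type(date(2019, 12, 25)))  # Christmas Day
print(day_type(date(2019, 6, 15)))   # a Saturday
print(day_type(date(2019, 6, 17)))   # a Monday
```

With pandas, the feature can be added to a dataframe by mapping this function over a date column, e.g. `pd.to_datetime(df["date"]).dt.date.map(day_type)`, where the column name `date` is hypothetical.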

https://www.worldweatheronline.com/

Weather Data API: wwo-hist

  • We use wwo-hist to retrieve historical weather data.
  • It is a Python wrapper around World Weather Online's historical weather API and returns the weather data for a specific location from a start date to an end date.
  • The API call downloads a CSV of dates and their weather conditions, such as temperature, precipitation, cloud cover and much more.
  • We merge this CSV into our dataset by date and visualise the data again to analyse the correlations between weather and sales.

Weather data CSV merged with the sales dataset by date
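The date-wise merge can be sketched with pandas as below. The toy rows are made up, and the column names (`date`, `product`, `quantity`, and the weather fields `date_time`, `tempC`, `precipMM`, `cloudcover`) are assumptions modelled on a wwo-hist style export, not the article's actual schema.

```python
import pandas as pd

# Toy stand-ins for the sales data and the downloaded weather CSV (assumed column names).
sales = pd.DataFrame({
    "date": ["2019-01-02", "2019-01-02", "2019-01-03"],
    "product": ["RYE BREAD", "GOUDA", "RYE BREAD"],
    "quantity": [12, 5, 9],
})
weather = pd.DataFrame({
    "date_time": ["2019-01-02", "2019-01-03"],
    "tempC": [-3, 1],
    "precipMM": [0.0, 2.4],
    "cloudcover": [80, 95],
})


def merge_weather(sales: pd.DataFrame, weather: pd.DataFrame) -> pd.DataFrame:
    """Attach each day's weather to every sale on that day (left join on the date)."""
    sales = sales.assign(date=pd.to_datetime(sales["date"]))
    weather = (
        weather.assign(date=pd.to_datetime(weather["date_time"]))
        .drop(columns="date_time")
    )
    return sales.merge(weather, on="date", how="left")


merged = merge_weather(sales, weather)
# Pairwise correlations between sales and weather variables, as a first look.
print(merged[["quantity", "tempC", "precipMM", "cloudcover"]].corr())
```

A left join keeps every sale even when a day's weather is missing; such rows get NaN weather values that would need filling or dropping before training.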

Next, the weather data is averaged for each month. This creates Final_Weather_Data.csv.
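A minimal pandas sketch of the monthly averaging, assuming the daily weather sits in a dataframe with a `date` column plus numeric weather columns (the column names and toy values are hypothetical). Months are keyed as `YYYY-MM` strings so they survive a round trip through CSV.

```python
import pandas as pd


def monthly_weather_means(weather: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Average every numeric weather column per calendar month, keyed as 'YYYY-MM'."""
    months = pd.to_datetime(weather[date_col]).dt.strftime("%Y-%m").rename("month")
    numeric = weather.select_dtypes("number")  # skips the date and any text columns
    return numeric.groupby(months).mean().reset_index()


daily = pd.DataFrame({
    "date": ["2019-01-02", "2019-01-03", "2019-02-01"],
    "tempC": [-3, 1, 4],
    "precipMM": [0.0, 2.4, 1.0],
})
monthly = monthly_weather_means(daily)
# monthly.to_csv("Final_Weather_Data.csv", index=False) would persist the result.
print(monthly)
```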

Next, we analyse final2.csv. All product groups with a quantity under 200 are combined into a single group, which reduces the number of product groups from 34 to 22.
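One way to express this regrouping in pandas, as a sketch. It assumes "quantity under 200" means the total units sold per group across the whole dataset; the column names, the `OTHER` label and the toy group names are hypothetical.

```python
import pandas as pd


def merge_small_groups(
    df: pd.DataFrame,
    group_col: str = "group",
    qty_col: str = "quantity",
    threshold: float = 200,
    other_label: str = "OTHER",
) -> pd.DataFrame:
    """Relabel every group whose total quantity is below `threshold` as one combined group."""
    totals = df.groupby(group_col)[qty_col].sum()
    small = totals[totals < threshold].index
    out = df.copy()
    out.loc[out[group_col].isin(small), group_col] = other_label
    return out


toy = pd.DataFrame({
    "group": ["GROUP_A", "GROUP_B", "GROUP_B", "GROUP_C"],
    "quantity": [150, 180, 170, 40],
})
regrouped = merge_small_groups(toy)
print(regrouped["group"].value_counts())
```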

Next, we remove the weather data from final2.csv, read Final_Weather_Data.csv into a new dataframe, weather_df, and merge the two dataframes to create MERGED_FINAL_DATASET.csv.
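A sketch of this step with pandas, assuming final2.csv has a `date` column and that Final_Weather_Data.csv is keyed by a `month` column in `YYYY-MM` form. Those column names, and the toy frames standing in for the two files, are assumptions.

```python
import pandas as pd


def attach_monthly_weather(
    sales: pd.DataFrame, weather_df: pd.DataFrame, daily_weather_cols: list
) -> pd.DataFrame:
    """Swap the daily weather columns for the month's averaged weather."""
    out = sales.drop(columns=daily_weather_cols)
    out["month"] = pd.to_datetime(out["date"]).dt.strftime("%Y-%m")
    return out.merge(weather_df, on="month", how="left")


final2 = pd.DataFrame({
    "date": ["2019-01-02", "2019-02-01"],
    "group": ["BREAD", "BREAD"],
    "quantity": [12, 7],
    "tempC": [-3, 4],  # daily value, to be replaced by the monthly mean
})
# In the pipeline this would be pd.read_csv("Final_Weather_Data.csv").
weather_df = pd.DataFrame({"month": ["2019-01", "2019-02"], "tempC": [-1.0, 4.0]})
merged = attach_monthly_weather(final2, weather_df, ["tempC"])
# merged.to_csv("MERGED_FINAL_DATASET.csv", index=False)
print(merged)
```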

We then split the whole dataset into 22 datasets, one per product group.

Starting from MERGED_FINAL_DATASET.csv, we created a dictionary with one dataframe per group, keyed by group name. Each dataframe was then saved in a Jupyter notebook as a CSV file named after its group (GROUP.csv). This creates the folder GROUP_OF_DATASETS, which contains 22 CSV files.
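The dictionary-then-CSV step might look like the following sketch. The `group` column name and the toy rows are assumptions, and a temporary directory stands in for GROUP_OF_DATASETS so the example leaves no files behind.

```python
import os
import tempfile

import pandas as pd


def split_by_group(df: pd.DataFrame, group_col: str = "group", out_dir=None) -> dict:
    """Build {group name: that group's rows} and optionally write each as <group>.csv."""
    groups = {name: part.reset_index(drop=True) for name, part in df.groupby(group_col)}
    if out_dir is not None:
        os.makedirs(out_dir, exist_ok=True)
        for name, part in groups.items():
            part.to_csv(os.path.join(out_dir, f"{name}.csv"), index=False)
    return groups


merged = pd.DataFrame({
    "group": ["BREAD", "SPICES", "BREAD"],
    "month": ["2019-01", "2019-01", "2019-02"],
    "quantity": [12, 3, 7],
})
# A temporary folder stands in for GROUP_OF_DATASETS.
with tempfile.TemporaryDirectory() as tmp:
    groups = split_by_group(merged, out_dir=tmp)
    written = sorted(os.listdir(tmp))
print(written)
```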

PYCARET TO TRAIN SEVERAL REGRESSION MODELS

PyCaret makes training models very easy: after setting up the data, a single function, compare_models(), trains and compares several regression models.

They include:

  • Random Forest
  • Extreme Gradient Boosting
  • CatBoost Regressor
  • Ridge Regression
  • Bayesian Ridge
  • Linear Regression
  • Random Sample Consensus
  • Orthogonal Matching Pursuit
  • Lasso Regression
  • Extra Trees Regressor
  • Elastic Net
  • K Neighbors Regressor
  • Support Vector Machine
  • Decision Tree
  • AdaBoost Regressor
  • Lasso Least Angle Regression
  • Gradient Boosting Regressor
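A sketch of the per-group comparison, written against the PyCaret 3.x regression API; the article used an earlier PyCaret release, so the original notebook's calls may differ. The target column name `quantity`, the `session_id` value and ranking by MAE are assumptions. PyCaret is imported inside the function so the module loads without it; actually running the function needs `pip install pycaret`.

```python
import pandas as pd


def best_model_for_group(csv_path: str, target: str = "quantity", session_id: int = 42):
    """Compare PyCaret's regressors on one product-group CSV; return the winner and the scores."""
    # Lazy import: PyCaret is a heavy dependency and is only needed when the function runs.
    from pycaret.regression import compare_models, pull, setup

    data = pd.read_csv(csv_path)
    # setup() infers column types, preprocesses the data and creates the train/hold-out split.
    setup(data=data, target=target, session_id=session_id, verbose=False)
    # compare_models() cross-validates the available regressors and returns the best by `sort`.
    best = compare_models(sort="MAE")
    leaderboard = pull()  # the score grid produced by compare_models, as a DataFrame
    return best, leaderboard


# Example (requires PyCaret and a file from GROUP_OF_DATASETS):
# model, scores = best_model_for_group("GROUP_OF_DATASETS/BREAD.csv")
```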

Next, the per-group datasets are combined according to the model that performs best on them.

Here are the best-performing algorithms, their combined datasets in GROUP_OF_MERGED_DATASETS, and the product groups each one contains:

  • Extreme Gradient Boosting (EXTREME_GB.csv): COFFEE TEA, CIGARETTES, CHIPS_FLAKES, ICE_CREAMS_FROZEN, POULTRY, SWEETS
  • Gradient Boosting (GB.csv): CHEWING_GUM_LOLIPOPS, BREAD, GENERAL, GENERAL_FOOD, KETCH_CONCETRATE_MUSTARD_MAJO_HORSERADISH, SPICES, ALCOHOL
  • AdaBoost (ADAB.csv): GROATS_RICE_PASTA, OCCASIONAL
  • Random Forest (RF.csv): CHEMISTRY, DAIRY_CHESSE, GENERAL_ITEMS, VEGETABLES
  • CatBoost (CATB.csv): DAIRY_CHEESE
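Collecting groups under their winning model amounts to inverting a {group: best model} mapping. In this sketch the file names come from the list above, while pairing them with PyCaret's model IDs (`xgboost`, `gbr`, `ada`, `rf`, `catboost`) and the sample `winners` dictionary are assumptions.

```python
from collections import defaultdict

# PyCaret model IDs mapped to the article's combined file names (the pairing is assumed).
MODEL_FILES = {
    "xgboost": "EXTREME_GB.csv",
    "gbr": "GB.csv",
    "ada": "ADAB.csv",
    "rf": "RF.csv",
    "catboost": "CATB.csv",
}


def group_by_best_model(best_model_per_group: dict) -> dict:
    """Invert {product group: best model ID} into {output file: sorted product groups}."""
    by_file = defaultdict(list)
    for group, model_id in best_model_per_group.items():
        by_file[MODEL_FILES[model_id]].append(group)
    return {name: sorted(groups) for name, groups in by_file.items()}


winners = {"BREAD": "gbr", "SPICES": "gbr", "POULTRY": "xgboost", "VEGETABLES": "rf"}
print(group_by_best_model(winners))
```

Each output file could then be built by stacking the listed groups' dataframes with `pd.concat` before cleaning.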

These groups are separated according to the model that best suits each one. The collected groups are then cleaned and saved in GROUP_OF_MERGED_DATASETS, the folder that holds the data needed to train the models.

The model_training.ipynb notebook shows how we trained the models on the grouped data.

General item units cogs.ipynb creates GROUP_OF_ITEMS_FINAL, which is used in the inference file.

For example, GB.csv is the dataset for one of the five models. Tuning and boosting are done in model_training.ipynb.
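A sketch of tuning, boosting and saving one model with the PyCaret 3.x regression API. The `gbr` model ID matches Gradient Boosting; the target name, `session_id` and output file name are assumptions, and the article's notebook may use different settings. As before, PyCaret is imported lazily and must be installed to run the function.

```python
import pandas as pd


def tune_boost_and_save(csv_path: str, model_id: str = "gbr", target: str = "quantity",
                        out_name: str = "GB_model"):
    """Train, tune and boost one model on a combined dataset, then save it for inference."""
    from pycaret.regression import (
        create_model, ensemble_model, finalize_model, save_model, setup, tune_model,
    )

    data = pd.read_csv(csv_path)
    setup(data=data, target=target, session_id=42, verbose=False)
    model = create_model(model_id)                      # 'gbr' = Gradient Boosting Regressor
    tuned = tune_model(model)                           # random search over PyCaret's default grid
    boosted = ensemble_model(tuned, method="Boosting")  # wraps the tuned model in AdaBoost
    final = finalize_model(boosted)                     # refit on all data, including the hold-out
    save_model(final, out_name)                         # writes out_name + ".pkl"
    return final
```

At inference time, `load_model(out_name)` reloads the saved pipeline and `predict_model` scores new rows.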

Inference file

The inference file tests the output on a single set of input data.

  • First, the user is asked to enter the month and year.
  • The month is converted to the format required by wwo-hist.
  • The start date is the first day of the entered month and the end date is its last day. To compute the end date, we check whether the year is a leap year and how many days the month has.
  • The month's weather data, stored in Mlawa.csv, is processed by the finalweather function, which calculates the mean of each weather variable.
  • The appropriate model is loaded based on the group name entered.
  • Quantity is predicted for each item in the group.
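The date handling in the steps above can be sketched with the standard library. The `DD-MON-YYYY` output format is an assumption based on the date strings shown in wwo-hist's own examples; `calendar.monthrange` takes care of the leap-year check.

```python
import calendar

MONTH_ABBR = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
              "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def wwo_month_range(month_year: str) -> tuple:
    """Turn 'MM-YYYY' into start and end dates covering that whole month."""
    month, year = (int(part) for part in month_year.split("-"))
    # monthrange() returns (weekday of the 1st, number of days) and accounts for leap years.
    days_in_month = calendar.monthrange(year, month)[1]
    abbr = MONTH_ABBR[month - 1]
    return f"01-{abbr}-{year}", f"{days_in_month:02d}-{abbr}-{year}"


print(wwo_month_range("02-2020"))  # ('01-FEB-2020', '29-FEB-2020'), 2020 is a leap year
print(wwo_month_range("02-2019"))  # ('01-FEB-2019', '28-FEB-2019')
```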

Next, loading.py adapts this logic for app.py: instead of printing the item names and their quantities, it stores them in a single string separated by newline characters, which is passed as prediction_text when rendering the index.html template.
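Building that string is a one-liner. In this sketch, rounding predictions to whole units and the sample item names are assumptions.

```python
def format_predictions(items, quantities) -> str:
    """Join item names and predicted quantities into one newline-separated string."""
    return "\n".join(f"{name}: {round(qty)}" for name, qty in zip(items, quantities))


prediction_text = format_predictions(["RYE BREAD", "WHITE ROLL"], [12.4, 7.6])
print(prediction_text)
```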

Flask app for API Interface:

  • The app.py file uses the logic from loading.py. It returns a string as the prediction text.
  • The index.html file is used to visualise the result of the API. We enter the date and group.
  • It returns the product name and respective quantity predicted for each item in that group.
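A minimal Flask sketch of this interface. An inline template stands in for index.html, and `predict_group` is a hypothetical placeholder for the loading.py logic; the form field names `date` and `group` are also assumptions.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-in for index.html. <pre> keeps the newline-separated prediction lines apart,
# since HTML otherwise collapses "\n" into a single space.
PAGE = """
<form method="post" action="/predict">
  <input name="date" placeholder="MM-YYYY">
  <input name="group" placeholder="BREAD">
  <button type="submit">Predict</button>
</form>
<pre>{{ prediction_text }}</pre>
"""


def predict_group(month_year: str, group: str) -> str:
    """Hypothetical placeholder for the loading.py logic (weather lookup, model load, predict)."""
    return f"{group}: predictions for {month_year} would be listed here"


@app.route("/")
def home():
    return render_template_string(PAGE, prediction_text="")


@app.route("/predict", methods=["POST"])
def predict():
    text = predict_group(request.form["date"], request.form["group"])
    return render_template_string(PAGE, prediction_text=text)
```

Saved as app.py, it can be started with `flask --app app run` (Flask 2.2 or later), which serves on localhost:5000 by default.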

Heroku Deployment:

https://www.fullstackpython.com/heroku.html
  • Once the app runs successfully on localhost:5000, we navigate to the Heroku Dashboard.
  • An app called yogatf is created.
  • Next, the GitHub repo is connected to the app and the master branch is deployed. It's important to keep the slug size under 500 MB; otherwise, the build cache has to be cleared using the heroku-repo plugin.

REAL WORLD SCENARIO

Benefits

  • Real-time information can be used to move stock to where it is needed before it is too late.
  • Predictive analytics can also guide what to stock and where, based on regional differences in preferences, weather, and so on.
  • Retail has become as much about anticipating customers' needs as about simply stocking good products.

TEST IT LIVE!

  • Our web app is live.
  • Enter the date in MM-YYYY format; for example, February 2020 is entered as 02-2020.
  • Enter one of the product groups listed above.
  • The web application outputs the names and predicted quantities of every product in that group.


Please leave a star if you found it useful! Thanks!
