How to cover a new retail chain seamlessly?

What business problem is solved by the service under consideration?

OSA Hybrid Platform is a proactive service that addresses the On-Shelf Availability (OSA) problem, i.e., the problem that products which ideally must be available to consumers are temporarily unavailable to them at particular stores. Such cases can have numerous causes:
- In-store staff forgot to move a new batch of products from the storeroom to the shelves;
- In-store staff were too busy with other products;
- The only item on display is damaged or expired;
- There is no price tag near the product, so consumers cannot decide whether to buy it;
- The supply chain failed and no items are left either on the shelves or in the staff-only rooms of the store.

Since a typical supermarket offers several thousand (or even tens of thousands of) different SKUs (Stock Keeping Units), it is infeasible to monitor their availability manually. Simple software that just counts how many items were put on shelves and how many were sold is useless too. The root of the problem is that merchandisers' time is not free, so it is unprofitable to spend it on recording that a particular quantity of a particular SKU has been added to the shelves. As a result, the quantity on the shelf is unknown and simple arithmetic is not applicable.

More complex solutions are needed. Computer vision and object detection can be used to recognize items in shelf recordings, but this may be too expensive. Another solution is to analyze real-time data (such as transaction data) and extract indirect signs of out-of-shelf (OOS) cases from it. A classification model can then decide whether the absence of sales for a triple of a particular time interval, a particular product, and a particular store is caused by an OOS case or not. If it is, alerts can be sent to in-store staff. This is a brief explanation of how OSA Hybrid Platform works; below, the experience gained with this platform is used to illustrate some general concepts.

Why you cannot just reuse an existing solution when a new retailer comes
Suppose that we have a solution that successfully detects out-of-shelf cases and triggers effective shelf replenishment in several stores of a retail network. Does this necessarily mean that the solution is applicable to other retail networks as well? Alas, the answer is no.

To see why, let us look at the differences between retail networks. Some factors that affect the indirect signs of availability issues are as follows:
- Intraday and intraweek patterns of consumer traffic. In a network of small shops located near residential buildings, these patterns are not the same as in a network of large supermarkets visited by people arriving by car.
- Size of the assortment matrix. Naturally, the number of absent SKUs depends on how varied the assortment is.
- Average time spent by a consumer during a visit to a store. This matters because a person can put the last item of a particular SKU into a cart, continue shopping, and go to the checkout only an hour or two later. In that case, a transaction with the SKU occurs within an OOS interval, which leads to underestimating the length of the current interval without any sales of the product at the store.
- Share of, say, fresh and/or delicatessen products and share of, say, detergents. For products that cannot be kept long, it is better to have short OOS cases every evening than to dispose of moderate amounts of rotten or damaged items every day. Conversely, for products that expire within several months or more, there is almost no harm if stock exceeds cumulative demand for a couple of weeks. Thus, stock management varies from category to category, and so do shelf replenishment policies.
- Share of SKUs sold under the retail network's own brand. The supply chain for such SKUs differs from that of SKUs provided by external vendors and suppliers.

How do the subtleties listed above affect the performance of a model developed and tuned for one environment and applied to another? It is easy to see from the following example. Suppose that in the initial environment shelves are replenished every morning and the assortment is narrow enough to guarantee that no products are forgotten. Also assume that all stores are located on busy streets. If so, even the most obvious OOS case cannot last more than 24 hours. Further, suppose that in the new environment stores are located outside of cities, in places that are visited on weekends. As a consequence, about 80% of turnover is made on Saturdays and Sundays. Now it is obvious why the model cannot work well for the second retail chain, isn't it? Numerous false alerts are unavoidable on regular weekdays, because the length of an interval without any sales can easily look abnormal with respect to the patterns the model learned in the initial environment.

To put it more formally, the univariate distributions of important features can differ between environments. There are simple methods that can adjust a model to such differences. However, things become more complicated when multivariate distributions are involved. It is possible to have two fundamentally different bivariate distributions whose univariate projections are identical. For an example, just consider the following two sets of points on a 2D plane:
- (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
- (1, 5), (2, 4), (3, 3), (4, 2), (5, 1).
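
A quick numeric check makes the point concrete. The snippet below is a minimal sketch in plain Python: it verifies that both point sets have the same x values and the same y values, while the correlation between the coordinates is opposite. The helper name `pearson` is introduced here only for illustration.

```python
import math

set_a = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)]
set_b = [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]

def pearson(points):
    """Pearson correlation between the x and y coordinates of 2D points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Univariate projections: the sorted x values and the sorted y values coincide.
same_x = sorted(x for x, _ in set_a) == sorted(x for x, _ in set_b)
same_y = sorted(y for _, y in set_a) == sorted(y for _, y in set_b)

print(same_x, same_y)                  # True True
print(pearson(set_a), pearson(set_b))  # 1.0 -1.0
```

Any method that compares features one at a time would consider these two sets identical, although a model relying on the relation between the two coordinates would behave very differently on them.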

The last thing that has not been discussed explicitly is that there is a kind of vicious circle. When a new retail network wants to integrate OSA Hybrid Platform into its processes, a new model that classifies intervals without any sales either as caused by unavailability or as regular intervals caused by other reasons cannot be created from scratch. Labeled data are required. Gathering such data takes time and workforce, and reducing both the waiting time and the money spent requires a model. The solution is to use temporary models until a sufficiently powerful model is developed.

Overview of solutions
Potential solutions can be divided into two groups:
- Those based on transferring models trained on existing old data;
- Those based on effective use of scarce new data.

How to extract from data something that can be generalized to an unseen environment?
This section describes exactly one approach to transferring a model to a different environment (in the case under consideration, rolling out a model to a new retail network). The key idea is as follows. If the multivariate distribution of some set of features has changed, then objects from the initial environment and objects from the new environment can be separated from each other by a decision hypersurface. Let us train a classifier to do so.

A formal description of the approach looks like this:
- Vertically concatenate the feature matrices from both environments. Note that features of objects from the new environment can easily be extracted from historical data, because only the target variable is not measured. In the case of OSA Hybrid Platform, this means that transactional data, loyalty card data, and so on are available, because almost all retailers collect them. Only the intervals without any sales are unlabeled, because there were no previous attempts to solve the OOS problem with machine learning.
- Add a target variable that is equal to 0 if an object comes from the initial environment and to 1 if it comes from the new environment.
- Train a classifier on the dataset described above. The classifier should be quite similar to the model under consideration.
- Look at the feature importances for the original problem and for the problem described above. Exclude the top features of the new problem.
- Select a subset of features such that performance in the original problem is as high as possible, while performance in the problem described above is as low as possible. The optimal balance between these two metrics can be found based on expert judgment.
- Train a model for the original problem using only features from the selected subset. This model can be used in the new environment, because it relies only on similarly distributed features.
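
The steps above can be sketched in plain Python. This is only an illustration under loud assumptions: the data are synthetic, the feature names are hypothetical, a small logistic regression stands in for "a classifier similar to the model under consideration", absolute weights on standardized features stand in for feature importances, and dropping exactly one feature replaces the expert-driven balance described above.

```python
import math
import random

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def fit_logistic(rows, labels, epochs=200, lr=0.5):
    """Logistic regression fitted by batch gradient descent."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, y in zip(rows, labels):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * row[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

rng = random.Random(0)
feature_names = ["hour_of_day", "hours_since_last_sale", "price"]  # hypothetical

def sample(n, gap_mean):
    """Synthetic rows; only the second feature depends on the environment."""
    return [[rng.gauss(12, 3), rng.gauss(gap_mean, 1), rng.gauss(5, 2)]
            for _ in range(n)]

old_env = sample(300, gap_mean=2.0)  # initial retail network
new_env = sample(300, gap_mean=5.0)  # new network with longer gaps between sales

# Stack both feature matrices and label every row by its environment.
X = standardize(old_env + new_env)
domain = [0] * len(old_env) + [1] * len(new_env)

# Train the environment classifier and measure how well it separates the two.
w, b = fit_logistic(X, domain)
predicted = [1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0 for row in X]
accuracy = sum(p == y for p, y in zip(predicted, domain)) / len(domain)

# Rank features by |weight| and exclude the one that best reveals the environment.
ranking = sorted(feature_names, key=lambda name: -abs(w[feature_names.index(name)]))
kept = ranking[1:]

print(round(accuracy, 2), ranking[0], kept)
```

An accuracy close to 0.5 would mean that the environments are hard to tell apart on these features. Here the classifier separates them easily, and the feature it relies on most is the one to exclude before retraining the original model on `kept`.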

How to cope with a small amount of labeled data?
There are several approaches to this problem:
- Heuristics and human-created rules, used until more training examples are collected.
- Semi-supervised learning, i.e., attempts to extract useful information from the features of objects whose target variable is unknown.
- Recalibration. If predicted probabilities are crucial for a service (say, an alert is sent if the predicted probability is higher than a threshold), data from the new environment can be used to calibrate the probabilities returned by the model trained in the initial environment. Platt calibration or isotonic regression can be used.
- Active learning, i.e., collecting data near the decision boundary of a model and updating the model every time new examples are labeled. Exploration, i.e., labeling random objects, objects from previously unstudied regions of the feature space, and/or outliers, can also be included in this scheme.
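
To make the recalibration option more tangible, here is a minimal sketch of isotonic regression implemented with the pool-adjacent-violators procedure in plain Python. The scores and audit labels are made up for illustration, the function names `fit_isotonic` and `calibrate` are hypothetical, and prediction is a simple step-function lookup (library implementations typically interpolate between points).

```python
from bisect import bisect_right

def fit_isotonic(scores, labels):
    """Pool-adjacent-violators: returns sorted scores and non-decreasing fitted rates."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each block is [sum of labels, number of points]
    for _, label in pairs:
        blocks.append([float(label), 1])
        # Merge backwards while an earlier block has a higher mean than the later one.
        while len(blocks) > 1 and (blocks[-2][0] / blocks[-2][1]
                                   > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return [score for score, _ in pairs], fitted

def calibrate(score, xs, fitted):
    """Step-function lookup: fitted value of the largest training score not above `score`."""
    idx = max(bisect_right(xs, score) - 1, 0)
    return fitted[idx]

# Hypothetical audit results from the new chain: old-model scores and true OOS labels.
raw_scores = [0.55, 0.60, 0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
is_oos = [0, 0, 0, 1, 0, 1, 1, 1]

xs, fitted = fit_isotonic(raw_scores, is_oos)
print(fitted)                       # [0.0, 0.0, 0.0, 0.5, 0.5, 1.0, 1.0, 1.0]
print(calibrate(0.78, xs, fitted))  # 0.5
```

With an alert threshold of, say, 0.7, a raw score of 0.78 would trigger an alert, whereas the recalibrated value of 0.5 would not. The old model itself is untouched; only the mapping from its scores to probabilities is refitted on the few labels available in the new environment.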

The active learning approach seems powerful and flexible. Let us look at it more closely.

Is a training set something immutable and non-expandable? In the active learning paradigm, it is not. The training set can be enlarged, but, of course, labeling new examples is not free. The pool-based setup of active learning assumes that, given a model and a training set, there is also a fixed and known n-element set of initially unlabeled examples, and the goal is to select k of them, k < n, such that revealing their target variables has the largest impact on model quality. There are other setups of active learning as well (e.g., synthesizing feature representations of the objects to be studied), but they are not suitable for many business problems. For instance, it is impossible to figure out whether a synthetic interval without any sales is caused by OOS or not, because no such interval exists and so there is nothing to audit physically.
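
One selection step of the pool-based setup, combined with a small share of random exploration, might look like the sketch below. It assumes a binary problem where the model returns a predicted OOS probability for every unlabeled interval in the pool; the function name `select_for_audit` and the `explore_share` parameter are hypothetical.

```python
import random

def select_for_audit(pool_probs, k, explore_share=0.25, seed=0):
    """Pick k pool indices: mostly the least certain ones, plus a few at random."""
    rng = random.Random(seed)
    n_explore = int(k * explore_share)
    n_exploit = k - n_explore
    # Exploitation: predicted OOS probability closest to 0.5, i.e. near the boundary.
    by_uncertainty = sorted(range(len(pool_probs)),
                            key=lambda i: abs(pool_probs[i] - 0.5))
    chosen = by_uncertainty[:n_exploit]
    # Exploration: random examples from the rest of the pool.
    rest = by_uncertainty[n_exploit:]
    chosen += rng.sample(rest, n_explore)
    return chosen

# Predicted OOS probabilities for eight unlabeled intervals (made-up numbers).
pool_probs = [0.97, 0.51, 0.08, 0.44, 0.63, 0.02, 0.88, 0.35]
chosen = select_for_audit(pool_probs, k=4)
print(chosen[:3])  # [1, 3, 4]
```

The first three indices are the intervals the model is least sure about; the fourth is drawn at random from the remaining pool. After the selected intervals are audited in stores, their labels are added to the training set, the model is retrained, and the step is repeated.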

To pick from a pool the examples that are located near the decision boundary of a model, candidates can be ranked by a specially designed score. For a classification problem, the following options can be used:
- Classifier confidence, i.e., the predicted probability of the predicted class;
- Margin of the predicted class, i.e., the difference between the predicted probability of the predicted class and that of the second best class;
- Shannon entropy of the predicted class label distribution;
- Sum over all folds of the Kullback-Leibler divergence between the class label distribution predicted by a classifier trained on all folds except the current one and the class label distribution averaged over all such classifiers.
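
The four scores listed above can be written down in a few lines. The following is a minimal sketch in plain Python; the function names are chosen for illustration, and the probability vectors are made up.

```python
import math

def confidence(probs):
    """Predicted probability of the predicted class (lower = more uncertain)."""
    return max(probs)

def margin(probs):
    """Gap between the best and the second best class (lower = more uncertain)."""
    top, second = sorted(probs, reverse=True)[:2]
    return top - second

def entropy(probs):
    """Shannon entropy in nats (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def committee_kl(member_probs):
    """Sum of KL(member || consensus) over fold models (higher = more disagreement)."""
    n_classes = len(member_probs[0])
    consensus = [sum(m[c] for m in member_probs) / len(member_probs)
                 for c in range(n_classes)]
    return sum(p * math.log(p / q)
               for m in member_probs
               for p, q in zip(m, consensus) if p > 0)

# Predicted [P(regular), P(OOS)] for three hypothetical unlabeled intervals.
pool = [[0.95, 0.05], [0.52, 0.48], [0.30, 0.70]]
most_uncertain = min(range(len(pool)), key=lambda i: margin(pool[i]))
print(most_uncertain)  # 1

# Fold models that agree give zero disagreement; diverging ones give a positive score.
print(committee_kl([[0.5, 0.5], [0.5, 0.5]]))      # 0.0
print(committee_kl([[0.9, 0.1], [0.1, 0.9]]) > 0)  # True
```

For a binary problem, confidence, margin, and entropy produce the same ranking of candidates; they start to differ only when there are three or more classes. The committee-based score is different in nature, since it measures disagreement between several models rather than the uncertainty of a single one.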

Conclusion
It has been demonstrated that a straightforward transfer of a machine learning model from one environment to another can yield poor results. It has also been shown that there are numerous techniques for overcoming this obstacle. Some of these techniques were discussed in detail, whereas others were only mentioned.
