How adidas and AWS optimised supply chain management to satisfy customer demand through Machine Learning and MLOps on Amazon SageMaker

Jana van Leuven
Published in adidoescode
Oct 17, 2023 · 11 min read

Authors: Jana van Leuven, Oliver Franke, Viktor Malesevic, Daniel Krumpholz, Sidney Machill, Eugene Teoh

Abstract

adidas’ Demand Forecasting team is currently building a forecasting engine to accurately predict customer demand for eCom, with the purpose of increasing customer satisfaction and maximising sales through optimal supply chain management. To achieve this goal, a team of data scientists was set up to build a machine learning (ML) use-case at scale.

In collaboration with AWS Professional Services, we created a state-of-the-art Demand Forecasting system that leverages Amazon SageMaker Pipelines and many other AWS Services. In this blog, we describe the various challenges we faced and how we solved them. We provide details on the solution architecture and discuss how AWS platform capabilities such as SageMaker enabled us to innovate and scale machine learning models. With the new Demand Forecasting system in production, we can now produce weekly forecasts for thousands of articles in an automated manner. These forecasts enable our stakeholders to optimise stock levels across various article groups and sales channels. Moreover, the automated pipeline is easy to scale to other use-cases and can be leveraged for other (forecasting) initiatives within adidas.

Introduction

Every fashion season, adidas launches a plethora of different articles. Managing such a large offering comes with challenges and tough decision-making. To name a few: how do we decide on optimal stock levels? How can we reduce shelf life? How do we avoid over-stocking? What if an article’s sales suddenly pick up and stock is running low? To tackle such demand and supply challenges, an accurate and efficient demand forecasting engine is essential. So, the goal of the demand forecasting project is to build exactly that: a model that produces accurate article-level forecasts across adidas’ full article base, meeting customer demand without overshooting stock levels in the warehouses. Running this model in an automated way is essential, as the forecasts need to be delivered frequently and at a large scale. In addition, the machine learning model should be capable of capturing the complex dynamics of demand signals for many different articles. By leveraging AWS capabilities, combined with a deep learning forecasting model, we are able to tackle these challenges.

In-season demand forecasting engine

The goal of in-season demand forecasting is to provide article-level, weekly demand forecasts for the season that is currently happening (hence the name “in-season” forecasting). Forecasts should be available for all in-range articles and should be updated every week to incorporate the latest information on customer demand and stock levels. A large part of adidas’ article base consists of brand-new articles that have little or no historical sales data. This is known as the cold start problem. Neural network-based algorithms can overcome it by learning the typical sales behaviour of new articles from the sales patterns of other articles when they were first released. By learning relationships across multiple related time series, more accurate forecasts can be produced. We therefore decided that a deep learning-based forecasting model is the best candidate for our use-case.

Deep learning for large scale demand forecasting

Deep learning is a sub-field of machine learning that involves the study of deep neural networks. Deep learning-based forecasting, or neural forecasting, leverages deep neural networks to predict time series into the future. Neural forecasting models require less feature engineering compared to classical models such as ARIMA and ETS. There are two types of neural forecasting models: local and global models. Local models are trained for each individual time series, whereas global models are trained across multiple related time series. In our use-case, each article has its own time series of historical sales data, and we can use global models to learn non-linear patterns across these time series. Global models can also address the cold start problem, since they use learned patterns, also known as embeddings, to produce forecasts. In addition, neural models can handle complex data that is non-normal and multimodal. They also scale well to big data and have proven state-of-the-art performance in highly complex systems.

DeepAR, an autoregressive model based on a recurrent neural network, demonstrated the best performance for our use-case. DeepAR is a neural model for forecasting univariate time series. It trains a global model that accepts thousands of time series as input during training. This allows the model to use contextual information from related time series to perform forecasting, notably for cold-start articles. On the technical side, DeepAR supports fast GPU training through the PyTorch Forecasting framework, which speeds up training significantly.
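
To make this more concrete, below is a minimal sketch of how a global DeepAR model can be trained with the PyTorch Forecasting library on a long-format sales table with one row per article and week. The column names, hyperparameters, and trainer settings are illustrative and not our production configuration.

```python
# Illustrative training of a global DeepAR model with PyTorch Forecasting.
# `sales_df` is assumed to hold one row per article and week with the columns
# article_id, time_idx (0, 1, 2, ...) and demand.
import lightning.pytorch as pl
from pytorch_forecasting import DeepAR, TimeSeriesDataSet

max_encoder_length = 52    # weeks of history fed to the model
max_prediction_length = 8  # weeks to forecast

training = TimeSeriesDataSet(
    sales_df,
    time_idx="time_idx",
    target="demand",
    group_ids=["article_id"],             # one series per article -> global model
    time_varying_unknown_reals=["demand"],
    max_encoder_length=max_encoder_length,
    max_prediction_length=max_prediction_length,
)
train_dataloader = training.to_dataloader(train=True, batch_size=256)

# from_dataset infers input sizes and embeddings from the dataset definition
model = DeepAR.from_dataset(training, learning_rate=1e-3, hidden_size=40, rnn_layers=2)

trainer = pl.Trainer(max_epochs=30, accelerator="gpu", devices=1)
trainer.fit(model, train_dataloaders=train_dataloader)
```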

Another benefit is that DeepAR produces probabilistic forecasts, which assists the end-user in optimising future stock levels. Instead of pinpointing one outcome as “the best” forecast, DeepAR produces a set of possible outcomes, each with a certain probability. One can select higher quantiles of the model’s predictive distribution to deliberately over-predict. Over-prediction may be useful when the margins of the articles you are predicting for are high: if margins are high, one would rather make the sale at the cost of holding slightly more stock than risk a missed sale. By selecting a higher forecasted demand value from the probability distribution, the likelihood of missing a sale is decreased. In this way, over-prediction serves as a safety buffer to meet future demand.
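
As a simple illustration of this idea, one can draw sample paths from the predictive distribution and take a per-week value at a higher quantile. The array shape and quantile level below are hypothetical, not our production settings.

```python
# Illustrative only: pick a higher quantile of sampled demand paths as a
# safety buffer against missed sales.
import numpy as np

def demand_plan(samples: np.ndarray, quantile: float = 0.8) -> np.ndarray:
    """Return the per-week demand at the chosen quantile.

    `samples` is a hypothetical array of shape (n_samples, horizon) drawn from
    the model's predictive distribution for one article. quantile=0.5 gives the
    median forecast; higher values trade a little extra stock for a lower
    probability of missing a sale.
    """
    return np.quantile(samples, quantile, axis=0)
```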

In our use-case, we decided to go for the DeepAR model from the PyTorch Forecasting library. After many model iterations and improvements, we landed on a configured model that met our performance expectations. Let’s look at this model’s performance by comparing actual demand values to the model’s forecasts for a few sample articles (Figures 1–3). Note that for confidentiality reasons, the axis ticks and article names are hidden. The graphs below show the weekly ground truth demand values (yellow line) and DeepAR’s forecasted demand values (blue line) for three different articles. One can see that even though article-level demand fluctuates considerably week by week, the model is able to capture these dynamics rather well.

Figure 1 — Ground truth vs. DeepAR predictions for article A
Figure 2 — Ground truth vs. DeepAR predictions for article B
Figure 3 — Ground truth vs. DeepAR predictions for article C

Across the full article base, the improved DeepAR model reduced the over-prediction error by 40 percentage points compared to the baseline model that our stakeholders had been using for buying and inventory planning up to that point. Switching to the more sophisticated DeepAR model means that unnecessary excess inventory can be drastically reduced, thanks to more accurate forecasts.

MLOps and cloud architecture

One of the key requirements of this project was that the forecasting solution could run frequently without human interaction. The tremendous amount of sales data flowing in every day makes it essential for the model to be re-trained regularly. The process of producing forecasts includes: (1) fetching and pre-processing a large set of data, (2) hyperparameter tuning, (3) back-testing, and (4) post-processing. This entire workflow can run for up to 15 hours. Because of its long runtime, it is important that the pipeline is robust, decoupled, and easy to maintain. This is where machine learning operations (MLOps) comes in.

First, it is important that the process is separated into logical, smaller steps. The individual steps are run by Amazon SageMaker, while Amazon SageMaker Pipelines is used to orchestrate them. SageMaker Pipelines is a purpose-built, easy-to-use continuous integration and continuous delivery (CI/CD) service for machine learning. Amazon SageMaker Pipelines has the advantage of direct integration with Amazon SageMaker and can be built using the open-source Amazon SageMaker Python SDK. We built a SageMaker Pipeline containing four main blocks, described in the architecture diagram below.

Figure 4 — Architecture Diagram of the MLOps Architecture

Pre-processing pipeline: Because we are dealing with large amounts of data (weekly historical sales data, as well as various article metadata), parallel processing with Apache Spark is used for data pre-processing. We used the managed SageMaker PySpark containers inside SageMaker Processing jobs to run the Spark code.
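
As an illustration, such a Spark pre-processing step can be launched roughly as follows; the script name, instance settings, and S3 paths are placeholders rather than our actual values.

```python
# Illustrative Spark pre-processing job using the managed SageMaker PySpark
# container. Script name, instance settings, and S3 paths are placeholders.
from sagemaker.spark.processing import PySparkProcessor

spark_processor = PySparkProcessor(
    base_job_name="demand-preprocessing",
    framework_version="3.1",
    role=role,                        # an IAM role with SageMaker permissions
    instance_type="ml.m5.4xlarge",
    instance_count=4,                 # Spark distributes the work across nodes
)

spark_processor.run(
    submit_app="src/preprocessing.py",    # the PySpark script
    arguments=[
        "--input", "s3://forecasting-bucket/raw/",
        "--output", "s3://forecasting-bucket/processed/",
    ],
)
```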

Model Training pipeline: The training pipeline starts with a step that runs a SageMaker Hyperparameter Tuning Job to identify the best hyperparameters, keeping the most recently observed weeks as the validation set. To avoid wasting data and to increase model performance, the training pipeline ends with a final training job on the entire dataset, using the best hyperparameters selected in the previous step. The model artefacts of the final model are stored in the SageMaker Model Registry.
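
The tuning-then-retraining idea can be sketched as follows. The estimator settings, metric definition, and hyperparameter ranges are illustrative and not the configuration we run in production.

```python
# Illustrative tuning setup: a Hyperparameter Tuning Job searches the space on
# a train/validation split, then the best configuration is reused for a final
# training job on the full dataset. Settings and ranges are placeholders.
from sagemaker.pytorch import PyTorch
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = PyTorch(
    entry_point="train.py",
    source_dir="src",
    role=role,
    framework_version="1.13",
    py_version="py39",
    instance_type="ml.g4dn.xlarge",   # GPU instance for faster DeepAR training
    instance_count=1,
)

tuner = HyperparameterTuner(
    estimator,
    objective_metric_name="validation:loss",
    objective_type="Minimize",
    metric_definitions=[{"Name": "validation:loss", "Regex": "val_loss=([0-9\\.]+)"}],
    hyperparameter_ranges={
        "hidden_size": IntegerParameter(20, 80),
        "learning_rate": ContinuousParameter(1e-4, 1e-2),
    },
    max_jobs=20,
    max_parallel_jobs=4,
)
tuner.fit({"train": "s3://forecasting-bucket/processed/train/"})

# The best hyperparameters are then passed to a final training job on the full
# dataset before the model is registered in the SageMaker Model Registry.
best_hyperparameters = tuner.best_estimator().hyperparameters()
```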

Model Inference pipeline: In the inference pipeline, SageMaker uses the registered model artefacts from the Model Registry to run a SageMaker Batch Transform Job on a chosen inference set and create forecasts for the upcoming weeks. These are the forecasts used by our stakeholders for buying and demand planning purposes, but not before we adjust and format them in the post-processing pipeline.
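
A Batch Transform job over the inference set can be expressed roughly like this; the container image, model artefact location, instance settings, and content type are placeholders.

```python
# Illustrative batch inference: the registered model artefact is scored
# offline with a Batch Transform job and the raw forecasts land in S3 for
# post-processing. Image, paths, and instance settings are placeholders.
from sagemaker.model import Model

model = Model(
    image_uri=inference_image_uri,    # assumed inference container image
    model_data="s3://forecasting-bucket/models/deepar/model.tar.gz",
    role=role,
)

transformer = model.transformer(
    instance_count=2,
    instance_type="ml.m5.2xlarge",
    output_path="s3://forecasting-bucket/forecasts/raw/",
)
transformer.transform(
    data="s3://forecasting-bucket/processed/inference/",
    content_type="application/jsonlines",
)
transformer.wait()
```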

Post-processing pipeline: The Post-processing pipeline includes additional logic to be run on the forecasts before sharing with end-users (e.g. rounding). Other post-processing steps include the classification of observed sales trends (does the forecast show an increasing, decreasing, or flat trend), as well as steps for measuring the model’s performance in the previous model run (model monitoring). Figure 5 below displays the graph of the post-processing pipeline in SageMaker Studio. You can see that some steps are run in parallel (e.g. inference on different datasets), while others are run consecutively. SageMaker understands which steps in the pipeline are independent and can be run at the same time, while other steps should be triggered only after certain steps are finished. It is also possible to explicitly tell SageMaker which steps are dependent on one another. This leads to higher efficiency and speed of the pipeline.

Figure 5 — Graph of the post-processing SageMaker Pipeline
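
Dependencies between steps can be made explicit with depends_on, while steps without a dependency are free to run in parallel. The step names and processor in the sketch below are hypothetical.

```python
# Hypothetical post-processing steps: rounding and trend classification run in
# parallel, while model monitoring only starts once both have finished.
from sagemaker.workflow.steps import ProcessingStep

rounding_step = ProcessingStep(
    name="RoundForecasts",
    processor=sklearn_processor,      # an assumed, pre-configured processor
    code="src/postprocessing/round.py",
)
trend_step = ProcessingStep(
    name="ClassifyTrends",
    processor=sklearn_processor,
    code="src/postprocessing/trends.py",
)
monitoring_step = ProcessingStep(
    name="ModelMonitoring",
    processor=sklearn_processor,
    code="src/postprocessing/monitoring.py",
    depends_on=[rounding_step, trend_step],   # explicit ordering
)
```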

By leveraging the SageMaker Python SDK, SageMaker Studio, and SageMaker Pipelines for creating and managing ML workflows, our team of data scientists is able to define and maintain the required infrastructure without needing complex infrastructure code or specialised engineering skills.
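
As a minimal sketch, assuming the step objects defined for the blocks above, stitching everything into a single pipeline and deploying it with the SDK looks roughly like this:

```python
# A sketch of assembling the steps into one SageMaker Pipeline. The step
# variables (preprocessing_step, tuning_step, ...) are assumed to be defined
# as in the sketches above; names are illustrative.
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="demand-forecasting-pipeline",
    steps=[
        preprocessing_step,
        tuning_step,
        final_training_step,
        batch_transform_step,
        postprocessing_step,
    ],
)

pipeline.upsert(role_arn=role)   # create the pipeline, or update it if it already exists
execution = pipeline.start()     # in our setup, executions are triggered on a weekly schedule
```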

CI/CD, way-of-working, and separation of ownership

A mature team differentiates itself by having clear guidelines on its way-of-working. These guidelines may cover technical aspects such as coding standards and package versions, as well as the team’s preferences on structuring the repository, a schedule for releasing to production, or testing standards. In this section, we call out a few way-of-working guidelines we set up during the project: the repository structure, the experimentation setup, and the CI/CD implementation.

Repository structure

The main callout with respect to our repository structure is that (static) infrastructure code is kept separate from machine learning-related code. There are two separate repositories. One repository controls the base infrastructure of the overall architecture. This includes the definition and creation of Amazon S3 buckets, IAM roles, container images, and static AWS Lambda functions. This repository rarely changes and is therefore treated as static infrastructure. The second repository is owned and maintained by the data scientists. It contains, among other things, all SageMaker Pipelines definitions, configuration settings for running SageMaker Jobs, and the scripts that are run inside the jobs. This setup ensures ownership and autonomy for the data scientists: they are able to manage and improve the forecasting solution without the need for a dedicated DevOps engineer.

Experimentation setup

A solid experimentation setup is important when improving your product, especially in the MVP phase when numerous experiments are run, sometimes even in parallel. We therefore invested in a well-structured experimentation workflow. In our case this means that experimentation is done in an environment that is completely isolated from other experiments and from the production environment, yet replicates production end-to-end. The data scientist starts by creating a branch from the main branch in dev. This automatically triggers the creation of a new SageMaker Pipeline and S3 subfolders that are linked to the branch ID, so each experiment has its own isolated environment (see the sketch below). Scripts, pipeline steps, configurations, and input data can be changed in the experiment branch and tested as if they were running in production. Once the data scientist is done testing their “new feature”, they create a pull request to the main branch in dev. Every week, there is one release from dev to prod. The main dev branch and the prod branch have separate AWS environments, and their CI/CD is managed through Bitbucket and Jenkins. Jenkins updates the SageMaker pipeline definitions in the respective environments and runs them on a weekly basis with the new changes. As a rule of thumb, the SageMaker pipeline should run end-to-end and produce improved forecasts before a release to the production environment takes place.
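
The branch-isolation mechanism can be illustrated with a few lines of pipeline-definition code. The environment variable, bucket, and naming scheme below illustrate the idea rather than our exact implementation.

```python
# Illustrative sketch of branch-based isolation: pipeline name and S3 prefix
# are derived from the Git branch so each experiment gets its own pipeline
# and data location. Environment variable and bucket name are hypothetical.
import os
import re

from sagemaker.workflow.pipeline import Pipeline

branch = os.environ.get("GIT_BRANCH", "main")
branch_id = re.sub(r"[^a-zA-Z0-9]", "-", branch).lower()   # SageMaker-safe identifier

pipeline_name = f"demand-forecasting-{branch_id}"
s3_prefix = f"s3://experiments-bucket/{branch_id}/"        # isolated S3 subfolder,
                                                           # passed into the step configs (not shown)

# `steps` and `role` are assumed to be defined as in the production pipeline.
pipeline = Pipeline(name=pipeline_name, steps=steps)
pipeline.upsert(role_arn=role)
```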

Integrating these MLOps concepts into the data science way-of-working has increased the autonomy, flexibility, and ownership of the data scientists. The separation of static infrastructure and the SageMaker Pipelines infrastructure makes the development and integration of new features into production easier. Moreover, the described experimentation setup drastically reduces the gap between experimentation and production. All of this leads to a much faster development pace and helps the team bring value to the business more quickly, increasing the business impact of the product. These foundational concepts are being adopted in other projects as well, such as pre-season demand forecasting and price elasticity.

Conclusion

The goal of this use-case was to build a forecasting engine that produces accurate article-level demand forecasts, with the purpose of increasing customer satisfaction and maximising sales through a streamlined buying and stock planning process. A team of data scientists and AWS experts was set up to work on this new use-case. This collaboration resulted in a fully automated process that produces forecasts for thousands of articles every single week. Combining a state-of-the-art deep learning algorithm with MLOps best practices and AWS’ powerful suite of cloud infrastructure resulted in a robust, scalable, and reliable forecasting process. Having this fully automated process in place not only reduced over-prediction errors by 40 percentage points across the full article base, but also brought a great efficiency gain in the way this process is run. The new, robust pipeline allows for fast, frequent, and reliable delivery of forecasts at a large scale, making it an integral part of eCom supply chain management.

About the authors

Jana van Leuven is a Senior Data Scientist at adidas
Oliver Franke is a Senior Product Manager at adidas
Viktor Malesevic is a Machine Learning Engineer at Amazon Web Services
Daniel Krumpholz is a Senior Engagement Manager at Amazon Web Services
Sidney Machill is a Data Scientist at Amazon Web Services
Eugene Teoh is a Data Scientist at Amazon Web Services

Other team members that contributed to this project:
João Pereira is a Senior Data Scientist at adidas
Amine Bellahsen is a Data Scientist at adidas

The views, thoughts, and opinions expressed in the text belong solely to the author, and do not represent the opinion, strategy or goals of the author’s employer, organization, committee or any other group or individual.
