The Telegraph has been a pioneer of the digital news world since introducing Europe’s first daily web-based newspaper, “Electric Telegraph”, on 15 November 1994. Since then, print and digital editions have gone hand in hand in support of quality journalism. Recent years have seen sales fall for many traditional print titles, due to the digital revolution and a shift in news consumption habits. As a result, optimising newspaper circulation to reflect this shift seems inevitable.
The Problems & KPIs
To optimise circulation, you need to determine each retailer’s optimal print supply on a daily basis, so that demand for the newspaper is met without oversupplying. When it comes to supply and demand, the newspaper industry works differently from many other industries in the UK. Publishers are obliged to buy back unsold newspapers from retailers at the end of each day. Hence, oversupplying not only damages the environment but also hurts the publishing industry financially, as there are costs associated with printing and distributing the newspaper.
The following main KPIs come to mind initially to tackle the issue of returns for the daily and Sunday Telegraph editions:
- Maximise sales
- Minimise unsold
However, to give the machine learning agent one objective it could pursue, we needed a single KPI that covers both of the others. That unifying KPI is maximising profit. It aims to maintain a balance between daily sales and returns, taking into account the revenue and costs of a given product.
Also, due to the irregular availability of the most recent sales data, forecasting one day ahead was not feasible. Retailers’ sales records could arrive up to 10 days after the actual sale, depending on the type of retailer. Hence a multi-step-ahead approach with a variable step size was needed. In addition, as our weekend editions include magazines (which require more time to print), an advance supply notice is required. From the forecasting point of view, each of these points adds to the uncertainty of the forecasts.
Finding the Ultimate Solution
As a cross-functional team of Data Scientists, Data Engineers and Technologists, we were determined to benefit from the most advanced technologies to support this business decision. We decided to use machine learning (specifically deep learning) to build a “digital” agent to optimise our “physical” daily newspaper supply at the retailer level. This agent could benefit from the vast amount of unique data available to it and gain efficient-allocation knowledge. This could be derived from uncovering complex patterns in our historical newspaper sales, the connection between sales figures, retailer characteristics and our news calendar, etc. To do this, we started simple and gradually added new features and tried more complex methods.
We began with traditional time-series decomposition and forecasting techniques. We built a model per retailer from its historical sales and used it to forecast multiple steps ahead. These were temporary models, discarded after each forecast. As we needed fast and efficient forecasting methods to scale up this approach, we used an additive regression time-series forecasting technique.
The temporary models built by this technique predicted sales. However, to make sure we didn’t lose any sales, we decided to use the upper band of the confidence interval of our forecasts as a safeguard. Hence, our final supplies were also influenced by the uncertainty in retailers’ historical sales. This enabled us to deliver a proof-of-concept that reduced our unsold copies while maintaining sales, and hence increased our profit indirectly. The proof-of-concept was then automated and live tests started with a small proportion of our retailers.
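The idea of supplying at the upper band rather than at the point forecast can be sketched as follows. This is only an illustration: it swaps the additive regression model for a plain mean-plus-spread estimate over a retailer's recent sales, and the function name `upper_band_supply` and the multiplier `z` are assumptions, not part of the original system.

```python
import math
import statistics


def upper_band_supply(history, z=1.64):
    """Return a supply at the upper band of a simple forecast.

    history: past sales for one retailer (e.g. the same weekday).
    z: assumed width of the band in standard deviations.
    """
    point_forecast = statistics.fmean(history)
    # A noisier sales history widens the band and so raises the supply.
    spread = statistics.stdev(history) if len(history) > 1 else 0.0
    return max(0, math.ceil(point_forecast + z * spread))


steady = upper_band_supply([10, 10, 10])
noisy = upper_band_supply([8, 12])
```

Both retailers above average 10 sales, but the one with the more variable history receives a larger supply, which is how uncertainty in historical sales feeds into the allocation.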
After the success of the proof-of-concept, we looked into using machine learning (ML) — specifically Artificial Neural Networks (ANNs) — to build scalable models that could predict sales more accurately. Since the training stage of machine learning techniques (especially ANNs) could be processing-intensive, time-consuming and costly, we were aiming to build a model that was robust and did not require very frequent retraining. The ML approach enabled us to benefit from diverse data sources and newly-introduced features whether numeric or categorical, time-variant or time-invariant.
Although predicting sales more accurately than the traditional rule-based allocation systems can result in higher profit indirectly, it is not the optimal solution. To maximise business profit, we optimised directly on reducing all costs and let the system decide the “profitability threshold” and the “profitable supply” per retailer. This threshold varies from retailer to retailer and is determined by their sales, the edition of the newspaper, the day of the week, retailer characteristics and certain events.
As with any other optimisation problem, the initial step was to convert our goal, i.e. profit maximisation, into a mathematical form. We defined a cost (loss) function for our ML agent that calculates the overall cost of supplying papers to our retailers. When its supply results in unsold papers, the agent is penalised by the cost of printing and transport, which varies per product and day. In this way we could unify our model across all our products throughout the week, even though they have different print costs and sales revenue. This enabled the agent to pursue only sales that are profitable, which is a shift from the traditional availability-based, sales-at-any-cost approach.
Finally, this approach directly predicts the profitable supply that would result in a profitable number of sales, without any manually added safety layer.
As a result, our live test has continued to expand across the country.
Artificial Neural Network Architecture
To achieve the goal of our unified modelling, we designed our custom ANN architecture in-house. The unified model architecture is a stack of layers and networks. This includes multiple embedding layers, regularisation layers, and a deep feedforward network. The purpose of the embedding layers is to ingest features such as the calendar of events and holidays, product types, etc. The feedforward network has four hidden layers and an input layer, which is the concatenation of embeddings and the historical sales inputs.
The historical sales input for each retailer is a fixed-size window of sales that slides forward with a stride of one. The network has one output node with identity activation, which determines the number of copies to be sent to a retailer.
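A minimal NumPy sketch of the forward pass through such a network is shown below. It is not the in-house implementation: the window length, embedding sizes, layer widths, ReLU hidden activations and random weights are all assumptions made for illustration, and the regularisation layers are left out because they typically only act during training.

```python
import numpy as np

rng = np.random.default_rng(0)

WINDOW = 14                   # assumed sliding-window length
N_EVENTS, N_PRODUCTS = 5, 3   # assumed vocabulary sizes
EMB_DIM = 4                   # assumed embedding width
HIDDEN = [32, 32, 16, 8]      # four hidden layers, widths assumed

# One embedding table per categorical feature.
event_emb = rng.normal(size=(N_EVENTS, EMB_DIM))
product_emb = rng.normal(size=(N_PRODUCTS, EMB_DIM))

# Input layer is the concatenation of embeddings and the sales window.
sizes = [2 * EMB_DIM + WINDOW] + HIDDEN + [1]
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]


def forward(sales_window, event_id, product_id):
    """sales_window: (batch, WINDOW); ids: (batch,) integer arrays."""
    x = np.concatenate(
        [event_emb[event_id], product_emb[product_id], sales_window], axis=1)
    for w, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ w + b, 0.0)           # ReLU (assumed)
    return (x @ weights[-1] + biases[-1])[:, 0]  # identity output node


out = forward(np.ones((2, WINDOW)), np.array([0, 1]), np.array([2, 0]))
```

Each row of the batch yields a single number, matching the one output node that sets the copies for a retailer.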
A generalised version of this architecture is illustrated in the following figure:
The preprocessing stage involves the standardisation of numeric inputs and encoding categorical inputs. The time series are also converted into multiple arrays of sliding windows.
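The three preprocessing steps can be sketched as below. The helper names are hypothetical, and integer encoding of categories (suitable for feeding embedding layers) is an assumption about how the categorical inputs are encoded.

```python
import numpy as np


def standardise(values):
    """Scale a numeric input to zero mean and unit variance."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    return (values - values.mean()) / (std if std > 0 else 1.0)


def encode_categories(values):
    """Map each category to an integer index (assumed encoding)."""
    vocab = {v: i for i, v in enumerate(sorted(set(values)))}
    return np.array([vocab[v] for v in values]), vocab


def make_windows(sales, window):
    """Turn one sales series into (window, next-day target) pairs."""
    sales = np.asarray(sales, dtype=float)
    # Drop the last window: it has no following day to use as target.
    windows = np.lib.stride_tricks.sliding_window_view(sales, window)[:-1]
    targets = sales[window:]
    return windows, targets


windows, targets = make_windows([1, 2, 3, 4, 5], window=3)
codes, vocab = encode_categories(["sat", "mon", "sat"])
scaled = standardise([1, 2, 3])
```

With a window of three, the five-day series above becomes two training rows, `[1, 2, 3]` with target `4` and `[2, 3, 4]` with target `5`.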
As mentioned before, the cost function for our optimisation problem is the sum of two penalties: the profit per sale for each lost sale, and the cost of printing for each unsold copy. Both vary per product and day of the week. We call this cost function the profit maximisation function. Please note that this network in its current form can only predict one step ahead, i.e. one day ahead.
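One way to write this cost function down is sketched below in NumPy. The function and argument names are hypothetical, and treating recorded sales as the true demand is a simplifying assumption (demand is not observed on days a retailer sells out). Training a network on it would also need a framework with automatic differentiation, which this sketch does not cover.

```python
import numpy as np


def profit_loss(supply, demand, profit_per_sale, unsold_cost):
    """Total cost of an allocation.

    supply, demand: copies per retailer-day.
    profit_per_sale, unsold_cost: scalars or per-row arrays, since
    both vary per product and day of the week.
    """
    supply = np.asarray(supply, dtype=float)
    demand = np.asarray(demand, dtype=float)
    lost_sales = np.maximum(demand - supply, 0.0)  # undersupply
    unsold = np.maximum(supply - demand, 0.0)      # oversupply
    return float(np.sum(profit_per_sale * lost_sales + unsold_cost * unsold))


# Retailer 1 is oversupplied by 2 copies, retailer 2 undersupplied by 2.
cost = profit_loss([10, 5], [8, 7], profit_per_sale=1.0, unsold_cost=0.5)
```

Because the two penalties are weighted differently, the supply that minimises this cost sits above or below expected sales depending on how the margin compares with the cost of an unsold copy.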
Multi-Step Ahead Strategy
The unified architecture has one output node and can only predict one day ahead. To enable it to forecast multiple steps ahead, we decided to use a recursive strategy. In this strategy, one step ahead is predicted first. This prediction is then used as the latest available data point in our sliding window of recent sales. Then we predict another step ahead using the updated sliding window values.
This process continues until we reach the desired day we would like to supply in the future. The following figure illustrates this strategy.
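In code, the recursive strategy amounts to a short loop. The sketch below assumes a hypothetical `predict_one_step` callable standing in for the trained one-step model; the toy predictor in the usage lines exists only to make the example runnable.

```python
def recursive_forecast(predict_one_step, window, steps):
    """Forecast `steps` days ahead with a one-step-ahead model.

    predict_one_step: callable taking the current window of recent
    sales and returning the next value (stand-in for the model).
    """
    window = list(window)
    predictions = []
    for _ in range(steps):
        next_value = predict_one_step(window)
        predictions.append(next_value)
        # Slide the window: drop the oldest day, append the prediction.
        window = window[1:] + [next_value]
    return predictions


# Toy predictor for illustration: yesterday's value plus one.
forecast = recursive_forecast(lambda w: w[-1] + 1, [1, 2, 3], steps=3)
```

The step count can differ per retailer, which suits the variable delay in sales data described earlier.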
This approach has proven to be the most feasible, as it allows us to predict multiple steps ahead flexibly. However, our model is optimised to predict a profitable supply rather than the exact sales, and the supply is normally higher than the actual sales. Feeding predicted supplies back into the sliding window as predictors can therefore inflate each subsequent prediction, because the model interprets them as an increase in demand.
To resolve this, we trained a second agent using the same architecture to predict sales rather than supply. We use the sales prediction model for step ahead prediction until we reach the desired step. However, for the desired (final) step we use our profitable supply predictor agent. The profitable supply is then allocated to the retailers. The following figure illustrates this strategy using both agents:
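A sketch of how the two agents combine is given below, assuming hypothetical `predict_sales` and `predict_supply` callables that stand in for the two trained models.

```python
def forecast_supply(predict_sales, predict_supply, window, steps):
    """Roll the window forward with the sales agent, then ask the
    supply agent for the final (desired) step only."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    window = list(window)
    for _ in range(steps - 1):
        # Intermediate steps use predicted sales, never predicted
        # supply, so the window is not inflated.
        window = window[1:] + [predict_sales(window)]
    return predict_supply(window)


# Toy agents for illustration only.
supply = forecast_supply(
    predict_sales=lambda w: w[-1] + 1,
    predict_supply=lambda w: w[-1] + 2,
    window=[5, 6, 7],
    steps=3,
)
```

Only the last call uses the supply agent, so its deliberate margin over sales is applied once rather than compounding at every step.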
Overrides & Alterations
Although the model-recommended supplies are the most profitable supplies for us, some retailers have certain requirements that need to be met. We use a Google Sheet to receive the retailer requirements through our distributors. We call these requirements retailer alterations. These alterations will override the system’s recommended supply. Additionally, manual alterations in the form of supply increases and boxouts were made possible for unforeseen news events. These are also captured through a Google Sheet and applied on top of the system’s recommended supplies.
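The override logic can be sketched as a small function over per-retailer dictionaries. The names are hypothetical, manual alterations are modelled only as additive supply increases (boxouts are not modelled), and letting retailer alterations take precedence over manual increases is an assumption about ordering.

```python
def apply_alterations(recommended, manual_increases, retailer_alterations):
    """Combine model output with manual and retailer alterations.

    recommended: {retailer_id: copies} from the model.
    manual_increases: {retailer_id: extra copies} for news events.
    retailer_alterations: {retailer_id: fixed copies} required by
    the retailer; these replace whatever was computed before.
    """
    final = dict(recommended)
    for retailer_id, extra in manual_increases.items():
        if retailer_id in final:
            final[retailer_id] += extra
    for retailer_id, copies in retailer_alterations.items():
        if retailer_id in final:
            final[retailer_id] = copies  # assumed precedence
    return final


final = apply_alterations(
    recommended={"r1": 10, "r2": 20, "r3": 5},
    manual_increases={"r1": 3, "r2": 5},
    retailer_alterations={"r2": 15},
)
```

Retailers with no alteration keep the model's recommendation unchanged.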
Cloud Infrastructure Architecture
The project infrastructure is based on Google Cloud Platform (GCP), and several GCP services are used to train, test, schedule and troubleshoot our ML forecasting tool. The main ones are Cloud Composer, Kubernetes Engine, AI Platform, Storage and Sheets. The following points summarise the end-to-end process on GCP.
- Distributors upload the retailers’ daily supply and return data into the FTP server.
- A Dockerised ETL pipeline hosted on Google Kubernetes Engine (GKE) gets the data from the FTP server and pushes it into a MySQL instance. A Composer DAG runs the pipeline on a GKE pod on an hourly basis.
- MySQL data is used for reporting on business objects.
- A Dockerised ETL pipeline written in Python and hosted on GKE gets data from MySQL and pushes it into BigQuery. A Composer DAG runs the pipeline on a GKE pod every morning.
- The Forecaster retrieves the predictors’ data for the desired retailers from BigQuery.
- A forecast request, along with the predictors vector, is sent to the ML model hosted on AI Platform. This vector includes the predictors for multiple retailers.
- The Forecaster receives the forecasts for all the requested retailers.
- The Forecaster pushes the ML forecasts and any manual alterations into BigQuery. A Composer DAG runs the Forecaster on a GKE pod.
- Configurations such as the forecast regions, schedule of run dates, etc. are stored in an external table which is connected to and controlled by a Google Sheet.
- The retailer alterations/constraints are then applied on top of the generated forecasts.
- The final allocations are then stored in BigQuery for the last time.
- When all forecasts are in BigQuery, the orchestrator runs the dockerised packager and prepares them to send to the distributors by email.
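The batched forecast request in the steps above can be sketched with the standard library alone. AI Platform online prediction accepts a JSON body with an `instances` list and returns a `predictions` list in the same order; the helper names and the shape of each predictor row are assumptions, and the actual call to the service is omitted.

```python
import json


def build_request(predictors_by_retailer):
    """Pack the predictors for many retailers into one request body."""
    retailer_ids = sorted(predictors_by_retailer)
    body = {"instances": [predictors_by_retailer[r] for r in retailer_ids]}
    return retailer_ids, json.dumps(body)


def map_response(retailer_ids, response_json):
    """Pair each returned prediction with its retailer, by position."""
    predictions = json.loads(response_json)["predictions"]
    if len(predictions) != len(retailer_ids):
        raise ValueError("response does not match the request size")
    return dict(zip(retailer_ids, predictions))


ids, payload = build_request({"r2": [7.0, 9.0], "r1": [3.0, 4.0]})
# A made-up response, standing in for what the service would return.
forecasts = map_response(ids, '{"predictions": [5, 10]}')
```

Keeping the retailer order fixed on the client side is what lets the Forecaster match each prediction back to the retailer it was requested for.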