Predicting Newspaper Sales with Amazon SageMaker DeepAR
A passion for print media
At Sales Impact, a wholly owned subsidiary of Axel Springer, we are all about sales of print media. We provide regional sales activities and wholesale communication for the supervision of retail sales, as well as the logistics involved in domestic and overseas delivery. We also plan and execute sales marketing measures, handle customer acquisition within the scope of direct sales, coordinate the German "Sunday market", and much more.
In my team, Market Analytics, we evaluate, advise on and monitor what happens in the German print media market in terms of sales, logistics and advertising. We do this at the international, national, regional, wholesale and shop level for print media such as WELT and BILD. My work as a Data Scientist mainly revolves around predicting the market and calculating its key figures.
Vast complexity, vast opportunities
Our shop-level sales data is among our most valuable assets. Without going too deep into detail, we know the sales of Axel Springer's print media in some 100,000 shops, but only with some delay. Making use of this data is hugely important for understanding our print media sales. Sometimes, however, a delay in shop-level sales data is unacceptable, for instance when the editorial department of BILD wants to know how well it performed last week in terms of sales. We can solve this and other related problems by predicting the sales for these 100,000 shops!
Your friend in the cloud
Using Amazon Web Services, we can leverage its machine learning service, Amazon SageMaker, to make such a prediction. But how do you predict some 100,000 shops without losing the information they share with each other? Fortunately, there is an algorithm that takes exactly this into account: the Amazon SageMaker DeepAR forecasting algorithm. The approach is not specific to shops; it carries over to any problem with at least several hundred concurrent time series, for example sales of many products.
The DeepAR forecasting algorithm is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN), and it is astonishingly sound. You can hand the algorithm tens of thousands of time series, optionally together with time-independent categories and additional time-dependent information for each series. It trains a single model on all of them, and that model can then predict the likely future of any time series you pass to it (again optionally together with its categories and time-dependent information).
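To make the shape of this input more concrete, here is a minimal sketch of how one time series per shop could be written in the JSON Lines format that DeepAR reads, and how an inference request could be assembled. The field names (start, target, cat, dynamic_feat, instances, configuration) are those of the DeepAR data format; the helper functions, the shop figures and the category encoding are hypothetical and only serve as illustration.

```python
import json


def make_record(start, sales, categories=None, dynamic_features=None):
    """Build one DeepAR time-series record as a plain dict.

    start: timestamp string of the first observation, "YYYY-MM-DD HH:MM:SS"
    sales: the observed values (the "target" series)
    categories: optional list of integer-encoded, time-independent categories
    dynamic_features: optional list of time-dependent feature series
    """
    record = {"start": start, "target": list(sales)}
    if categories is not None:
        record["cat"] = list(categories)
    if dynamic_features is not None:
        # For training, each feature series has the same length as the target.
        # For inference, it must also cover the prediction horizon.
        record["dynamic_feat"] = [list(f) for f in dynamic_features]
    return record


def to_json_lines(records):
    """Serialize records in JSON Lines format: one JSON object per line."""
    return "\n".join(json.dumps(r) for r in records)


def make_inference_request(records, quantiles=("0.1", "0.5", "0.9"), num_samples=100):
    """Wrap records into the request body expected by a DeepAR endpoint."""
    return {
        "instances": list(records),
        "configuration": {
            "num_samples": num_samples,
            "output_types": ["mean", "quantiles"],
            "quantiles": list(quantiles),
        },
    }


if __name__ == "__main__":
    # Two invented shops with daily sales; cat = [region id, shop type id].
    shops = [
        make_record("2019-01-01 00:00:00", [12, 15, 9, 11], categories=[0, 3]),
        make_record("2019-01-01 00:00:00", [40, 38, 45, 41], categories=[1, 0]),
    ]
    print(to_json_lines(shops))
    print(json.dumps(make_inference_request(shops[:1]), indent=2))
```

Because every shop is just one more line in the same file, the model sees all shops at once during training, which is what lets it pick up patterns that shops have in common.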
Predicting Germany's major newspapers' sales
So we adopted and automated this RNN-based algorithm as shown in Figure 1, and we saw major improvements in prediction quality compared to the single-series approaches, such as ARIMA or exponential smoothing, that were previously in place. The original paper suggests a general accuracy improvement of around 15% for the prediction of related time series compared to state-of-the-art methods. If you need a starting point for implementing DeepAR with SageMaker, I recommend this notebook from Amazon.
It is amazing how much you can automate with a little help from boto3, the AWS SDK for Python. For convenience, you can find the complete boto3 workflow below (though the data pre- and post-processing part is missing). Please note that we used .json files as the input data type.
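For contrast, a single-series method looks at one shop at a time and shares nothing between shops. The following is a textbook sketch of simple exponential smoothing, shown purely as an illustration of that idea; it is not the code that was previously in place, and the function name, the smoothing factor and the sales figures are assumptions.

```python
def simple_exponential_smoothing(series, alpha=0.3):
    """Return the one-step-ahead forecast for a single time series.

    The level is updated as: level = alpha * value + (1 - alpha) * level.
    The forecast for the next period is the final level.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for value in series[1:]:
        level = alpha * value + (1.0 - alpha) * level
    return level


if __name__ == "__main__":
    # Each shop gets its own, independent forecast (invented numbers).
    sales_by_shop = {
        "shop_a": [12, 15, 9, 11],
        "shop_b": [40, 38, 45, 41],
    }
    for shop, sales in sales_by_shop.items():
        print(shop, simple_exponential_smoothing(sales))
```

With 100,000 shops this means 100,000 separate fits, none of which can learn from the others. DeepAR instead trains one model across all series.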
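As a rough sketch of what the training step of such a boto3 workflow might look like, the following builds a request for the SageMaker create_training_job call and, in a separate function, submits it and waits for the job to finish. This is not our original code: the function names, job name, bucket paths, role ARN, instance type and hyperparameter values are placeholders, and the DeepAR training image URI is region-specific and has to be looked up for your region.

```python
def build_training_job_request(job_name, image_uri, role_arn, train_s3_uri,
                               output_s3_uri, prediction_length, context_length,
                               time_freq="D", epochs=100):
    """Assemble the keyword arguments for SageMaker's create_training_job."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,  # region-specific DeepAR image
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                # DeepAR reads its training data from a channel named "train".
                # The input files are assumed to end in .json so that the
                # format can be detected from the file extension.
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": "ml.c5.2xlarge",  # placeholder choice
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        # Hyperparameter values must be passed as strings.
        "HyperParameters": {
            "time_freq": time_freq,
            "context_length": str(context_length),
            "prediction_length": str(prediction_length),
            "epochs": str(epochs),
        },
    }


def submit_training_job(request, region_name):
    """Start the job, block until it ends, and return its final status.

    Requires boto3 and valid AWS credentials, so it is kept separate from
    the request builder above.
    """
    import boto3

    client = boto3.client("sagemaker", region_name=region_name)
    client.create_training_job(**request)
    waiter = client.get_waiter("training_job_completed_or_stopped")
    waiter.wait(TrainingJobName=request["TrainingJobName"])
    description = client.describe_training_job(
        TrainingJobName=request["TrainingJobName"]
    )
    return description["TrainingJobStatus"]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    request = build_training_job_request(
        job_name="deepar-shop-sales-example",
        image_uri="<deepar-image-uri-for-your-region>",
        role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
        train_s3_uri="s3://example-bucket/deepar/train/",
        output_s3_uri="s3://example-bucket/deepar/output/",
        prediction_length=7,
        context_length=28,
    )
    print(request["HyperParameters"])
```

Keeping the request builder free of AWS calls makes it easy to inspect or log the job configuration before anything is actually started.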
This is what a normal run can look like on my laptop:
Implications for our business
With more accurate sales predictions, we are able to give more accurate projections to the editorial departments. We are also working on using these sales predictions to improve the logistical key figures we provide to our business partners.
Summary
In this article, we write about predicting newspaper sales using Amazon SageMaker DeepAR. After a short introduction to the company and the team, we give a brief description of our shop-level sales data and the related problem. We then describe why DeepAR is a suitable algorithm for this problem, followed by an overview of our solution together with some sample code to reproduce it. Finally, we argue that such a prediction with DeepAR is beneficial to our business.
About the author: Justin Neumann is a Data Scientist who holds an MS in Predictive Analytics and helps transform companies into analytical competitors. He works at Sales Impact, a subsidiary of Axel Springer.