The Reader Who Went Empty-Handed: Survival Analysis For Sellout Estimation

🕵 How to model survivorship when time is negligible

Oliver Schreiber
Axel Springer Tech
5 min read · Jun 14, 2021

Joint work by Justin Neumann (Data Scientist, Marktanalyse @ Sales Impact) and Oliver Schreiber (Data Scientist, Marktanalyse @ Sales Impact).

Photo by Roman Kraft on Unsplash

At Axel Springer, Europe’s largest publishing house, and Sales Impact, its print- and sales-related subsidiary, we are always looking for opportunities to quantify our business further and to derive value from new insights. This is particularly important for the physical newspaper market, where demand is declining year on year and steady action is required to maximize sales. Today, we want to convince you of the value of a rather innovative approach: using Survival Analysis for the sellout estimation of Axel Springer’s print media in the German market. The question we try to answer is: how many more copies of a particular newspaper would have been sold at outlets where that newspaper went out of stock? The trick is that Survival Analysis does not strictly need a time dimension (which is not available for a walk-in purchase). Instead, we build our statistical model on any other suitable variable, such as sales or price, together with the events observed along it.

First, we illustrate our approach with an easy-to-grasp example based on Titanic fare prices and explain it in detail. Then we reflect briefly on our media business application and stress the approach’s importance for our market oversight. Let’s begin…

Fare well, time! Hello, unknown.

How much fare should you have paid in order to survive the infamous sinking of the Titanic (1912)? Hardly any timestamps are available from back then. Still, we can produce the following plot:

We see a clear relationship between survival and fare paid: the higher the fare, the higher the survivorship (which makes sense in this case). Usually, survivorship can only decrease as (negative) events accumulate, so we had to transform the calculated baseline (literally the “death quota”) to obtain the plot above. Also, we have used only very limited data in this example, namely one observation per individual.

What is Survival Analysis?

According to Wikipedia, survival analysis is a branch of statistics for analyzing the expected duration of time until one or more events happen, such as death in biological organisms and failure in mechanical systems. For further explanation, please refer to these introductory texts on the Kaplan-Meier estimator and the Cox Proportional Hazards model, which we apply here.
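To make the idea of a survival baseline on a non-time axis more concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is not the Cox model used in this article and not our production code. The function name `kaplan_meier` and the toy fares are made up for illustration. The only point is that the "duration" axis can be any ordered quantity, such as a fare.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Kaplan-Meier estimate as a list of (value, survival) pairs.

    durations: values on the 'time-like' axis (here: any ordered quantity,
               for example a fare); hypothetical input for this sketch.
    events:    1 if the event was observed at that value, 0 if the
               observation is censored (no event seen up to that value).
    """
    event_counts = Counter(d for d, e in zip(durations, events) if e)
    all_counts = Counter(durations)

    at_risk = len(durations)  # observations still "in the game"
    survival = 1.0
    curve = []
    for value in sorted(all_counts):
        d = event_counts.get(value, 0)
        if d:
            # Multiply by the share of at-risk observations without an event.
            survival *= 1.0 - d / at_risk
            curve.append((value, survival))
        # Events and censored observations both leave the risk set.
        at_risk -= all_counts[value]
    return curve


# Made-up toy data, NOT the Titanic dataset: fare paid and whether the
# passenger died (1) or survived (0, treated as censored).
fares = [7, 8, 8, 13, 26, 30, 52, 80]
died = [1, 1, 0, 1, 1, 0, 0, 0]

curve = kaplan_meier(fares, died)
for fare, surv in curve:
    print(f"fare {fare:>3}: S = {surv:.3f}")
```

The curve only steps down at values where an event was observed. Censored observations shrink the risk set without moving the curve, which is what lets the estimator use incomplete observations.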

But there is more to survival analysis than calculating these survival baselines. Specifically, we can use it to predict the remaining time until the event happens for an observed entity, given that the event has not happened yet. This is called conditional survival, and it leads us to interesting conclusions!
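Conditional survival can be sketched in a few lines once a survival curve is available. The example below assumes a curve given as sorted `(value, survival)` pairs; the curve values and the helper names `survival_at`, `conditional_survival` and `expected_remaining` are hypothetical and serve only as illustration. It uses the textbook relation P(T > t | T > s) = S(t) / S(s) and integrates the conditional curve up to a chosen horizon to get an expected remaining amount.

```python
def survival_at(curve, x):
    """Step-function lookup of S(x) for sorted (value, survival) pairs."""
    s = 1.0
    for value, surv in curve:
        if value > x:
            break
        s = surv
    return s


def conditional_survival(curve, s, t):
    """P(T > t | T > s) = S(t) / S(s) for t >= s."""
    base = survival_at(curve, s)
    if base == 0:
        raise ValueError("nothing survives beyond s, cannot condition on it")
    return survival_at(curve, t) / base


def expected_remaining(curve, s, horizon):
    """Mean of (T - s) given T > s, restricted to the interval [s, horizon].

    Computed as the area under the conditional survival curve.
    """
    if horizon <= s:
        raise ValueError("horizon must lie beyond s")
    base = survival_at(curve, s)
    if base == 0:
        raise ValueError("nothing survives beyond s, cannot condition on it")
    # Break points of the step function between s and the horizon.
    knots = [s] + [v for v, _ in curve if s < v < horizon] + [horizon]
    area = 0.0
    for left, right in zip(knots, knots[1:]):
        area += survival_at(curve, left) * (right - left)
    return area / base


# Made-up survival curve for illustration only.
toy_curve = [(1, 0.9), (2, 0.6), (4, 0.3), (6, 0.0)]

p = conditional_survival(toy_curve, 2, 4)
remaining = expected_remaining(toy_curve, 2, 6)
print(p, remaining)
```

The horizon is a design choice of this sketch: an empirical survival curve often does not reach zero, so the expected remaining amount is only defined up to some cut-off.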

How high was the premium to survive Titanic?

In this theoretical digression, and before turning to our actual business problem, we stay with the Titanic a little longer. What if we could state, for each individual, the additional fare needed to survive the disaster? Statistically speaking, and according to our model, it was about USD 44 per non-surviving individual. Of course, we ignore a lot of real-life restrictions such as rescue capacity. Still, we can go as far as to say that first-class passenger Charles Alexander Fortune (19 y.o.), who paid USD 263 and didn’t survive the catastrophe, would have had to pay an additional USD 429 to survive. The level of the premium is determined by gender, passenger class and the other properties included in our model. Statistically speaking, a fare of about USD 85.2 already gave a pretty high chance of survival of 95%, as you can see from the function approximation below.

The resulting second function approximation, shown below, gives us statistical insights into the premium to pay for someone to survive the Titanic disaster. Please note: We’d require much more data for this second approximation to work out well, as a locally calculated survival quota is used in order to connect the fare with the premium to survive.

Application to the print business

Similar to our Titanic example, our business application also has a variable and associated events that allow a survival analysis to be carried out. With this approach, we also crack a very hard nut. There are tens of thousands of newspaper retailers in Germany, some of which occasionally sell all their copies of one title or another, such as BILD or WELT. In these cases, we cannot observe the real demand, which we would like to know for further calculations. Therefore, we need an estimator that predicts, for every sold-out shop, the number of copies it could have sold. By modeling sales against sellout events across all other selling days, we can infer these remaining sales (i.e. lost sales) for a given sold-out retailer. With some more number crunching, we can then relate a specific sellout quota in the market to a number of lost sales in hard copies per retailer. This estimate is very handy when quantifying the print market and managing the distribution of newspapers. Forgive us for not explaining this business application in full depth and breadth.
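One common way to frame lost sales as a survival problem, and not necessarily the exact model described above, is to treat copies sold as the "duration" and a sellout as a censored observation: on a sold-out day, demand was at least as high as sales, but we never saw where it stopped. The sketch below follows that assumption with a plain Kaplan-Meier estimate and no covariates. The function names, the `max_demand` cut-off and the sales figures are all made up for illustration.

```python
from collections import Counter


def demand_survival(sales, sold_out):
    """Kaplan-Meier estimate of P(demand > x) as (x, survival) pairs.

    Assumption of this sketch: on days without a sellout, demand equals
    sales (event observed); on sold-out days, demand is censored at sales.
    """
    observed = Counter(s for s, so in zip(sales, sold_out) if not so)
    totals = Counter(sales)
    at_risk = len(sales)
    surv = 1.0
    curve = []
    for x in sorted(totals):
        d = observed.get(x, 0)
        if d:
            surv *= 1.0 - d / at_risk
            curve.append((x, surv))
        at_risk -= totals[x]
    return curve


def expected_lost_sales(curve, sold, max_demand):
    """Expected demand beyond `sold`, given that demand exceeded `sold`.

    The estimate is restricted to demand <= max_demand (hypothetical cap).
    """
    def s_at(x):
        s = 1.0
        for value, value_surv in curve:
            if value > x:
                break
            s = value_surv
        return s

    base = s_at(sold)
    if base == 0 or max_demand <= sold:
        return 0.0  # no evidence of demand beyond what was sold
    knots = [sold] + [x for x, _ in curve if sold < x < max_demand] + [max_demand]
    area = sum(s_at(left) * (right - left) for left, right in zip(knots, knots[1:]))
    return area / base


# Made-up daily sales of one title at comparable retailers.
sales = [3, 5, 5, 6, 8, 8, 10, 12]
sold_out = [False, False, True, False, True, False, False, False]

curve = demand_survival(sales, sold_out)
lost = expected_lost_sales(curve, sold=5, max_demand=12)
print(f"expected lost sales after selling out at 5 copies: {lost:.1f}")
```

For this toy data, a retailer that sold out at 5 copies is estimated to have missed about 4.4 further copies. A model with covariates, such as the Cox model mentioned earlier, would let this estimate vary by retailer and title.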

Summary

In this article, we presented our own approach of using Survival Analysis to estimate lost newspaper sales in copies, given a specific sellout quota in the German newspaper market. We also showed a minimalistic example of its application to the Titanic dataset to estimate the premium fare for survival.

We hope you liked this story and encourage you to give us a 👏 “high five” so that others can find this article, too.
