How We Forecasted the 2020 Black Friday at idealo

Luiz Davi
Published in idealo Tech Blog · 9 min read · Dec 17, 2020

Authors: Arjun Roy, Luiz Davi

For well over a thousand years, sharing expectations about the future and the reasoning behind them has been a defining trait of our species. We have developed complex systems of ideas and opinions to help us prepare for, understand, and cope with the challenges and surprises that life keeps bringing us. Any prediction of what the coming days, months, or eras will look like is also shaped by the judgment and point of view of the person making it.

Take weather forecasting, for example. Historical records indicate that the Babylonians were predicting the weather from cloud patterns and astrology as early as about 650 BCE, and Chinese and Indian astronomers had developed similar weather-forecasting methods by around 300 BCE. The wish to foresee the future has been with us since the time of our ancestors.

The Storm Prediction Center’s 20z tornado outlook on February 28, 2017

If you are interested, documentaries such as the BBC’s Storm Troopers: The Fight to Forecast the Weather give an enlightening account of how forecasters fought for and mastered this ability through great effort, ideation, and collaboration.

Closer to home, in the idealo world, we all know how trends, seasons, and consumer behavior in e-commerce affect the way we forecast our main business KPIs, such as marketplace orders or leadouts. We originally wrote this article for our internal newsletter, to share knowledge and to have a bit of fun with the topic of prediction. We are now sharing it with the Medium community through the idealo Tech Blog.

“Errors using inadequate data are much less than those using no data at all.” — Charles Babbage

As Black Friday drew near, we decided to explore what idealo’s results might look like, even though we had limited data to predict with. So we came up with a plan: predict the following three KPIs for the Black Friday peak.

1. Orders

Defined as the number of purchases idealo received directly from users who clicked the “Zum Kauf über idealo” (“Buy via idealo”) button.

2. Leadouts

Defined as the number of user clicks on offers from our partners.

3. Page Impressions

Defined as the number of impressions across all idealo pages.

To put together some interesting analysis, we pulled the available historical data from our Data Lake. We also collected the dates of past Black Fridays so that we could use them in the predictions. Here they are:

Black Friday dates

  • 25 November 2016
  • 24 November 2017
  • 23 November 2018
  • 29 November 2019
  • 27 November 2020
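These dates follow a simple rule: Black Friday is the day after US Thanksgiving, which falls on the fourth Thursday of November. A minimal sketch that reproduces the list above (the function name `black_friday` is ours, chosen for illustration):

```python
from datetime import date, timedelta


def black_friday(year):
    """Return the day after the fourth Thursday of November."""
    nov_first = date(year, 11, 1)
    # date.weekday(): Monday is 0, Thursday is 3.
    days_to_first_thursday = (3 - nov_first.weekday()) % 7
    fourth_thursday = nov_first + timedelta(days=days_to_first_thursday + 21)
    return fourth_thursday + timedelta(days=1)


black_fridays = {year: black_friday(year) for year in range(2016, 2021)}

for year, day in black_fridays.items():
    print(year, day.isoformat())
```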

In the following sections, we walk through each of the approaches we tried and give you a sneak peek at how we calculated them. Let’s start.

1. Forecasting Using Linear Series Approximation

The simplest approach to forecasting Black Friday this year was a so-called linear series approximation: fit a straight line through the values of previous years and extend it to the next one. It is easy to calculate with Excel’s built-in features.

For this approach, we decided to use 3 different time intervals as inputs (2016–2019, 2017–2019, 2018–2019) and predict the next value (for 2020) using a linear estimation. By using this approach, we were able to predict, for example, the idealo Leadouts count as you can see in the figure below. It shows the predictions (orange dot) considering the three different time ranges.

Leadouts prediction (orange dot) chart using past 4 years, using past 3 years, using past 2 years

Similarly, we predicted the other two KPIs (Orders & Page impressions) using the corresponding Black Friday day data from the past 4 years, 3 years, and 2 years. Even though this is a very simple approach, we still consider it to be quite useful as the trend over the past few years has been relatively linear.
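For readers who prefer code to spreadsheets, here is a minimal sketch of the same idea in plain Python: an ordinary least-squares line fitted over each of the three time windows and evaluated at 2020. The KPI values are hypothetical index numbers (2016 = 100), not idealo figures, and `linear_forecast` is a name we made up for this sketch.

```python
def linear_forecast(years, values, target_year):
    """Fit y = a + b * x by ordinary least squares and evaluate it at target_year."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    # Evaluate relative to the mean year to keep the arithmetic well conditioned.
    return mean_y + slope * (target_year - mean_x)


# Hypothetical Black Friday index values, NOT real idealo numbers.
history = {2016: 100.0, 2017: 120.0, 2018: 135.0, 2019: 160.0}

forecasts = {}
for start in (2016, 2017, 2018):
    years = [y for y in sorted(history) if y >= start]
    values = [history[y] for y in years]
    forecasts[f"{start}-2019"] = linear_forecast(years, values, 2020)

for window, value in forecasts.items():
    print(f"{window}: {value:.1f}")
```

With only two input years the fit passes exactly through both points, so the 2018–2019 window simply continues last year’s change for one more year.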

Now that we have this first batch of predictions, let’s explore some other models and solutions to meet the challenge. Let’s dive in!

2. Build-up From Weeks Before Black Friday

This approach is slightly more sophisticated than the previous one. The idea is to use the build-up of the KPIs in the weeks leading up to Black Friday. As the chart of daily orders shows, the number of orders starts to increase in mid-October and accelerates until Black Friday every year. The other two KPIs (leadouts and page impressions) follow a very similar pattern.

Orders over time chart showing pre-Black Friday build-up curve and orders peak

To keep things simple, we broke down this pre-Black Friday period into 4 phases:

  • Phase 1: mid-Sep to mid-Oct — the “normal” phase when the KPIs are stable
  • Phase 2: mid-Oct to mid-Nov — the start of the growth period
  • Phase 3: mid-Nov to Black Friday — accelerating growth period
  • Phase 4: Black Friday day itself
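As a rough sketch, the four phases can be expressed as date windows relative to Black Friday, and a daily KPI series can then be averaged per window. The post only says “mid-Sep”, “mid-Oct”, and “mid-Nov”, so treating “mid” as the 15th of the month is our assumption, as are the function names and the synthetic data.

```python
from datetime import date, timedelta
from statistics import mean


def phase_windows(black_friday):
    """Return {phase: (first_day, last_day)}, both ends inclusive.

    Assumption: "mid-month" means the 15th.
    """
    year = black_friday.year
    return {
        "phase_1": (date(year, 9, 15), date(year, 10, 14)),
        "phase_2": (date(year, 10, 15), date(year, 11, 14)),
        "phase_3": (date(year, 11, 15), black_friday - timedelta(days=1)),
        "phase_4": (black_friday, black_friday),
    }


def phase_averages(daily, black_friday):
    """Average a {date: value} series per phase; phases without data are skipped."""
    averages = {}
    for name, (start, end) in phase_windows(black_friday).items():
        values = [v for d, v in daily.items() if start <= d <= end]
        if values:
            averages[name] = mean(values)
    return averages


# Synthetic daily series for 2019 (made-up values, flat within each phase).
bf_2019 = date(2019, 11, 29)
daily = {}
day = date(2019, 9, 15)
while day <= bf_2019:
    if day < date(2019, 10, 15):
        daily[day] = 100
    elif day < date(2019, 11, 15):
        daily[day] = 120
    elif day < bf_2019:
        daily[day] = 150
    else:
        daily[day] = 300
    day += timedelta(days=1)

averages = phase_averages(daily, bf_2019)
print(averages)
```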

Then we calculated the average values of all 3 KPIs for each phase. This helps to identify the growth rate which then can be used to predict the current year’s numbers. As an example, let us predict the number of orders on Black Friday. The formula looks like this:

Formula we used to predict the number of orders on the Black Friday

It looks complicated, but it is actually rather simple: three components multiplied together. The first component is the previous year’s Black Friday orders. The second and third components are two different scaling factors that scale the 2019 Black Friday orders up (or down). The second component is the ratio of the average orders in Phase 2 of 2020 to the average orders in Phase 2 of 2019. This captures the increase in the number of orders this year compared with the previous year.

The third component is a ratio of two ratios. This is explained in the following charts. The first one shows a zoomed-in view of the weeks before Black Friday 2019. The dotted lines represent the average number of orders for the first 3 phases. As can be seen, the averages keep increasing over each phase till they reach Black Friday.

Orders Build-up chart including the first 3 phases

Similar to the chart above, the second chart depicts the number of orders for Phases 1 and 2 of 2020. At the time of writing, we did not yet have Phase 3 data for 2020, so Phase 1, Phase 2, and Black Friday data were used for the prediction.

Now, by dividing the average number of orders for Phase 2 of 2020 by that of Phase 1 of 2020, we get a general estimate of the growth rate in 2020. This is helpful, as we can use it to extrapolate this growth to Black Friday 2020. However, growth this year may be very different from growth last year, especially considering extraneous factors such as the pandemic and lockdowns. Therefore, we divided the 2020 growth rate by the 2019 growth rate, computed the same way from Phases 1 and 2 of 2019. If growth this year is like that of last year, this final ratio should be around 1. That is the third component in a nutshell!
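Written out, our reading of the three components described above is the following. The notation is ours, not the original figure’s: \(\bar{O}_{p,y}\) is the average daily number of orders in Phase \(p\) of year \(y\), and \(O^{\mathrm{BF}}_{y}\) is the number of orders on Black Friday of year \(y\).

```latex
\[
\hat{O}^{\mathrm{BF}}_{2020}
  = \underbrace{O^{\mathrm{BF}}_{2019}}_{\text{last year's Black Friday}}
    \times
    \underbrace{\frac{\bar{O}_{2,2020}}{\bar{O}_{2,2019}}}_{\text{year-over-year level}}
    \times
    \underbrace{\frac{\bar{O}_{2,2020} / \bar{O}_{1,2020}}
                     {\bar{O}_{2,2019} / \bar{O}_{1,2019}}}_{\text{ratio of growth rates}}
\]
```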

After simplifying the equation, we get:

Formula we used to calculate the Orders for 2020

Plugging in the average values, we get our prediction of orders for Black Friday 2020. The same approach was used to predict the other two KPIs as well.
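To make the calculation concrete, here is a small sketch of the three-component product as we understand it from the description, using hypothetical phase averages rather than idealo numbers. The function and argument names are ours. Multiplying the components out gives `bf_prev * p2_curr**2 * p1_prev / (p2_prev**2 * p1_curr)`, which the sketch computes as a cross-check.

```python
def buildup_forecast(bf_prev, p1_prev, p2_prev, p1_curr, p2_curr):
    """Scale last year's Black Friday value by two factors.

    bf_prev: KPI value on last year's Black Friday
    p1_*, p2_*: average daily KPI in Phase 1 and Phase 2 (prev = 2019, curr = 2020)
    """
    level_ratio = p2_curr / p2_prev    # second component: year-over-year level
    growth_curr = p2_curr / p1_curr    # growth from Phase 1 to Phase 2 this year
    growth_prev = p2_prev / p1_prev    # the same growth last year
    growth_ratio = growth_curr / growth_prev  # third component: ratio of two ratios
    return bf_prev * level_ratio * growth_ratio


# Hypothetical averages, NOT real idealo numbers.
bf_prev, p1_prev, p2_prev = 300.0, 100.0, 120.0
p1_curr, p2_curr = 110.0, 138.0

forecast = buildup_forecast(bf_prev, p1_prev, p2_prev, p1_curr, p2_curr)
simplified = bf_prev * p2_curr**2 * p1_prev / (p2_prev**2 * p1_curr)

print(f"three-component form: {forecast:.2f}")
print(f"simplified form:      {simplified:.2f}")
```

If this year’s Phase 1 to Phase 2 growth matches last year’s, the third factor is 1 and the forecast reduces to last year’s Black Friday value scaled by the year-over-year level alone.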

3. Using fbprophet Python Library

The final approach we used was to try out a time series forecasting model, which would capture the trend & seasonality of each KPI over time.

fbprophet is an open-source time-series forecasting Python library developed by Facebook, designed to be easy to use without expert knowledge of statistics or time series forecasting. It fits an additive model, finding the best smooth curve that can be represented as the sum of the following components:

y(t) = g(t) + s(t) + h(t) + ϵ,

where,

g(t): Overall growth trend

s(t): Yearly & weekly seasonality

h(t): Holidays effects

ϵ: Random error

Here is the original paper in case anyone is interested in further details: https://peerj.com/preprints/3190/

We used the German holiday list and added some special days, like Black Friday, on which we expected the numbers to vary significantly. Apart from that, we did not tinker with the algorithm and used the default model and parameters. We had two reasons for this: (a) to keep the analysis as simple as possible, and (b) to test the effectiveness of fbprophet, which brands itself as an automatic, self-learning algorithm.

The graph below shows the actual daily leadouts (in blue) for the two years up to mid-March 2020, which were used to train the model. The values predicted by the trained model are shown in red. We chose this training window to keep the confounding impact of Covid and this year’s lockdowns out of the predictions. We know what you’re thinking: “But what if Covid and the lockdowns have a major impact on Black Friday?” For exactly this reason, we also performed a separate Covid impact analysis, which was then incorporated into the predictions (keep reading!).

As per this analysis, the Black Friday 2020 prediction is the peak of Leadouts as shown. A similar analysis was also performed for orders & page impressions.

Time Series Chart predicting Leadouts for this year
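The setup described in this section can be sketched roughly as follows. This is our illustration, not the code we ran: the function names, the exact training cut-off (the text only says “mid-March 2020”), and the one-year horizon are assumptions. The Black Friday dates are the ones listed earlier in this post. Heavy dependencies are imported inside the function so that the small helper can be used on its own.

```python
# Black Friday dates as listed earlier in this post.
BLACK_FRIDAYS = ("2016-11-25", "2017-11-24", "2018-11-23", "2019-11-29", "2020-11-27")
TRAIN_END = "2020-03-15"  # assumption: the post only says "mid-March 2020"


def special_day_rows(dates=BLACK_FRIDAYS, name="black_friday"):
    """Rows for Prophet's `holidays` frame: one row per special day."""
    return [
        {"holiday": name, "ds": d, "lower_window": 0, "upper_window": 0}
        for d in dates
    ]


def forecast_kpi(history, horizon_days=365):
    """Fit a default Prophet model and forecast `horizon_days` past the training data.

    history: pandas DataFrame with a datetime column `ds` and a daily KPI column `y`.
    """
    import pandas as pd
    try:
        from prophet import Prophet    # package name since version 1.0
    except ImportError:
        from fbprophet import Prophet  # package name when this post was written

    holidays = pd.DataFrame(special_day_rows())
    holidays["ds"] = pd.to_datetime(holidays["ds"])

    # Train only on data before the Covid period.
    train = history[history["ds"] <= pd.Timestamp(TRAIN_END)]

    model = Prophet(holidays=holidays)             # otherwise default parameters
    model.add_country_holidays(country_name="DE")  # German public holidays
    model.fit(train)

    future = model.make_future_dataframe(periods=horizon_days)
    forecast = model.predict(future)
    return forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]]
```

The Black Friday prediction is then the `yhat` value in the row whose `ds` equals 2020-11-27.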

Covid-19 Correction Multiplier

None of the analysis so far incorporated the impact of Covid-19 or the lockdowns. The predictions were based on previous years’ data, when conditions were very different from this year’s. It therefore made sense to us to calculate a Covid-19 Correction Multiplier (CCM), by which the original predictions can be multiplied to obtain an adjusted value. Of course, the assumption here is that the same effects also apply on Black Friday, which may or may not be correct.

As an example, comparing the fbprophet prediction with the actual number of leadouts this year revealed a big Covid-19 impact from March to June (a positive one for our KPIs!). We therefore used this period to calculate the average increase in leadouts due to Covid and the lockdowns, by averaging the ratio of actual to expected leadouts over the period.

Predicted vs. actual leadouts: Covid-19 impact

This ratio came to 1.23, which means there were 1.23 times as many leadouts as we expected, or 23% more. Corrected predictions are obtained by simply multiplying the original predictions by this factor.
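A minimal sketch of this calculation, with made-up numbers chosen so that the multiplier comes out at 1.23. The function names are ours, and the daily values are hypothetical, not idealo data.

```python
from statistics import mean


def covid_correction_multiplier(actual, expected):
    """Average of the per-day actual/expected ratios over the reference period."""
    if not actual or len(actual) != len(expected):
        raise ValueError("actual and expected must be non-empty and of equal length")
    return mean([a / e for a, e in zip(actual, expected)])


def apply_ccm(prediction, ccm):
    """Adjust an original prediction by the multiplier."""
    return prediction * ccm


# Hypothetical daily values for the reference period, NOT real idealo numbers.
actual = [120.0, 123.0, 126.0]
expected = [100.0, 100.0, 100.0]

ccm = covid_correction_multiplier(actual, expected)
adjusted = apply_ccm(200.0, ccm)

print(f"CCM: {ccm:.2f}")
print(f"adjusted prediction: {adjusted:.1f}")
```

Note that averaging the daily ratios is not the same as dividing total actual by total expected leadouts; the two agree only when the expected values are roughly constant over the period.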

What’s even more interesting is that in recent months, the actual leadouts were lower than expected. This suggests a sort of market correction following the excessive purchasing earlier in the year. So one of the interesting open questions was whether Covid-19 would actually have an impact on the Black Friday results. Let’s check it out in the final section below.

Final Results

With all that done, we had our predictions for Black Friday 2020! Unfortunately, we can’t share any confidential numbers, but we are happy to present the best-performing prediction approaches.

The overall winners are:

Gold: Linear approx 2 years w/ CCM
(with avg accuracy of 93% across the 3 KPIs)

Silver: Pre-Black Friday Build-up w/out CCM
(with avg accuracy of 91% across the 3 KPIs)

Bronze: fbprophet w/ CCM
(with avg accuracy of 88% across the 3 KPIs)

The winners by KPI are:

Orders: Linear approx 2 years w/ CCM

Leadouts: Linear approx 3 years w/out CCM

Page impressions: Pre-Black Friday Build-up w/out CCM
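A note on the accuracy figures: one common way to express accuracy for a single forecast is one minus the absolute percentage error, averaged across the KPIs. The sketch below illustrates that reading under that assumption, with made-up numbers; the function names are ours.

```python
def accuracy(predicted, actual):
    """1 minus the absolute percentage error of a single prediction."""
    return 1 - abs(predicted - actual) / actual


def average_accuracy(pairs):
    """Mean accuracy over (predicted, actual) pairs, e.g. one pair per KPI."""
    scores = [accuracy(p, a) for p, a in pairs]
    return sum(scores) / len(scores)


# Hypothetical (predicted, actual) pairs for three KPIs, NOT real idealo numbers.
pairs = [(93.0, 100.0), (210.0, 200.0), (880.0, 1000.0)]

for predicted, actual in pairs:
    print(f"{accuracy(predicted, actual):.0%}")
print(f"average: {average_accuracy(pairs):.1%}")
```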

Do you love creating complex machine learning solutions? Take a look at our open positions.
