Understanding the patterns of a demand question (part 2)

Applying the model I did not want to apply

19 min readDec 22, 2022

So, in a previous blog post I already showcased the data coming from a service center highlighting the number of telephone calls and e-mails received in a year. I also showed that this question is an intermittent demand question. Therefore, I will not rehash the data or the graphics I already produced but move quickly to the model I never wanted to apply but in the end ended up applying anyhow.

The data consists of traditional time-series data, but then in quarter hours. Look at all those zero’s. They made my life really really difficult.

df_tbl<-df%>%
  dplyr::mutate(newdate = with(., ymd(Date) + hms(QuarterHour)))%>%
  #dplyr::mutate(newdate = with(., ymd(Date)))%>%
  group_by(RecordType, newdate)%>%
  filter(!RecordType %in% c("AgentState", "Callback call"))%>%
  filter(CallDirection=="Incoming" & 
           DoneInIVR==0 & AgentSequence==1)%>%
  count()%>%
  group_by(RecordType) %>%
  imputeTS::na_kalman(.) %>% 
  select(RecordType, newdate, n)%>%
  as_tsibble(key = RecordType, index = newdate) %>%
  fill_gaps(n = 0, .full = end())
> df_tbl
# A tsibble: 101,999 x 3 [15m] <UTC>
# Key:       RecordType [3]
# Groups:    RecordType [3]
   RecordType newdate                 n
   <chr>      <dttm>              <dbl>
 1 Chat call  2022-11-28 09:30:00     8
 2 Chat call  2022-11-28 09:45:00     1
 3 Chat call  2022-11-28 10:00:00     1
 4 Chat call  2022-11-28 10:15:00     0
 5 Chat call  2022-11-28 10:30:00     0
 6 Chat call  2022-11-28 10:45:00     0
 7 Chat call  2022-11-28 11:00:00     0
 8 Chat call  2022-11-28 11:15:00     0
 9 Chat call  2022-11-28 11:30:00     0
10 Chat call  2022-11-28 11:45:00     0
# ... with 101,989 more rows

And here you see the model I never wanted to apply, but in the end did apply: the Prophet model. Much has been written about this model and so I will not rewrite what is already available. The reason for me not wanting to use it is the same reason for me not wanting to use XGboost — everybody is doing that. But then I moved deeper into the Prophet model and discovered how much tweaking I can do. There really is a LOT of tweaking possible, AND the model is a decomposition model by nature. And I am able to built in fourier terms. Which in the end provides me with the built-up necessary to deal with the cycles, events, and intermittent nature of the data.

Like most models, you can deploy them fairly simply by letting them auto-find the best possible solution.

fit_prophet <- df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  select(newdate, n)%>%
  rename(ds=newdate,
         y=n)%>%
  prophet::prophet()
> str(fit_prophet)
List of 32
 $ growth                 : chr "linear"
 $ changepoints           : POSIXct[1:25], format: "2021-07-24 09:45:00" "2021-08-10 04:15:00" "2021-08-26 22:45:00" "2021-09-12 17:15:00" ...
 $ n.changepoints         : num 25
 $ changepoint.range      : num 0.8
 $ yearly.seasonality     : chr "auto"
 $ weekly.seasonality     : chr "auto"
 $ daily.seasonality      : chr "auto"
 $ holidays               : NULL
 $ seasonality.mode       : chr "additive"
 $ seasonality.prior.scale: num 10
 $ changepoint.prior.scale: num 0.05
 $ holidays.prior.scale   : num 10
 $ mcmc.samples           : num 0
 $ interval.width         : num 0.8
 $ uncertainty.samples    : num 1000
 $ specified.changepoints : logi FALSE
 $ start                  : POSIXct[1:1], format: "2021-07-07 15:15:00"
 $ y.scale                : num 32
 $ logistic.floor         : logi FALSE
 $ t.scale                : num 45279900
 $ changepoints.t         : num [1:25] 0.032 0.064 0.096 0.128 0.16 ...
 $ seasonalities          :List of 2

future <- prophet::make_future_dataframe(fit_prophet, periods = 672)
> tail(future)
                       ds
50979 2024-10-10 17:00:00
50980 2024-10-11 17:00:00
50981 2024-10-12 17:00:00
50982 2024-10-13 17:00:00
50983 2024-10-14 17:00:00
50984 2024-10-15 17:00:00

forecast <- predict(fit_prophet, future)
> forecast
                    ds    trend additive_terms additive_terms_lower additive_terms_upper       daily daily_lower
1  2021-07-07 15:15:00 1.333013      4.7732225            4.7732225            4.7732225  3.68798132  3.68798132
2  2021-07-07 15:30:00 1.334092      4.4641217            4.4641217            4.4641217  3.37616930  3.37616930
3  2021-07-07 15:45:00 1.335171      4.0699476            4.0699476            4.0699476  2.97948387  2.97948387
4  2021-07-07 16:00:00 1.336250      3.5940486            3.5940486            3.5940486  2.50127591  2.50127591
5  2021-07-07 16:15:00 1.337329      3.0452312            3.0452312            3.0452312  1.95035390  1.95035390
6  2021-07-07 16:30:00 1.338408      2.4373052            2.4373052            2.4373052  1.34052959  1.34052959
7  2021-07-07 16:45:00 1.339487      1.7882858            1.7882858            1.7882858  0.68981959  0.68981959
8  2021-07-07 17:00:00 1.340566      1.1193054            1.1193054            1.1193054  0.01935782  0.01935782
9  2021-07-07 17:15:00 1.341645      0.4533124            0.4533124            0.4533124 -0.64790649 -0.64790649
10 2021-07-07 17:30:00 1.342724     -0.1863574           -0.1863574           -0.1863574 -1.28863655 -1.28863655
11 2021-07-07 17:45:00 1.343803     -0.7774364           -0.7774364           -0.7774364 -1.88056435 -1.88056435
12 2021-07-07 18:00:00 1.344882     -1.3001123           -1.3001123           -1.3001123 -2.40387714 -2.40387714
13 2021-07-07 18:15:00 1.345961     -1.7382506           -1.7382506           -1.7382506 -2.84244059 -2.84244059
14 2021-07-07 18:30:00 1.347040     -2.0803734           -2.0803734           -2.0803734 -3.18477684 -3.18477684
15 2021-07-07 18:45:00 1.348119     -2.3203270           -2.3203270           -2.3203270 -3.42473282 -3.42473282
16 2021-07-07 19:00:00 1.349198     -2.4575977           -2.4575977           -2.4575977 -3.56179544 -3.56179544
17 2021-07-07 19:15:00 1.350277     -2.4972545           -2.4972545           -2.4972545 -3.60103475 -3.60103475
18 2021-07-07 19:30:00 1.351356     -2.4495272           -2.4495272           -2.4495272 -3.55268160 -3.55268160
19 2021-07-07 19:45:00 1.352436     -2.3290493           -2.3290493           -2.3290493 -3.43137120 -3.43137120
20 2021-07-07 20:00:00 1.353515     -2.1538225           -2.1538225           -2.1538225 -3.25510674 -3.25510674
21 2021-07-07 20:15:00 1.354594     -1.9439726           -1.9439726           -1.9439726 -3.04401598 -3.04401598
22 2021-07-07 20:30:00 1.355673     -1.7203859           -1.7203859           -1.7203859 -2.81898742 -2.81898742
23 2021-07-07 20:45:00 1.356752     -1.5033194           -1.5033194           -1.5033194 -2.60028047 -2.60028047
24 2021-07-07 21:00:00 1.357831     -1.3110805           -1.3110805           -1.3110805 -2.40620508 -2.40620508
25 2021-07-07 21:15:00 1.358910     -1.1588659           -1.1588659           -1.1588659 -2.25196080 -2.25196080
26 2021-07-07 21:30:00 1.359989     -1.0578387           -1.0578387           -1.0578387 -2.14871382 -2.14871382
27 2021-07-07 21:45:00 1.361068     -1.0145046           -1.0145046           -1.0145046 -2.10297311 -2.10297311
28 2021-07-07 22:00:00 1.362147     -1.0304275           -1.0304275           -1.0304275 -2.11630588 -2.11630588
29 2021-07-07 22:15:00 1.363226     -1.1023000           -1.1023000           -1.1023000 -2.18540855 -2.18540855
30 2021-07-07 22:30:00 1.364305     -1.2223612           -1.2223612           -1.2223612 -2.30252414 -2.30252414
31 2021-07-07 22:45:00 1.365384     -1.3791279           -1.3791279           -1.3791279 -2.45617324 -2.45617324
32 2021-07-07 23:00:00 1.366463     -1.5583830           -1.5583830           -1.5583830 -2.63214321 -2.63214321
33 2021-07-07 23:15:00 1.367542     -1.7443506           -1.7443506           -1.7443506 -2.81466253 -2.81466253
34 2021-07-07 23:30:00 1.368621     -1.9209694           -1.9209694           -1.9209694 -2.98767445 -2.98767445
35 2021-07-07 23:45:00 1.369700     -2.0731725           -2.0731725           -2.0731725 -3.13611688 -3.13611688
36 2021-07-08 00:00:00 1.370779     -2.1880808           -2.1880808           -2.1880808 -3.24711549 -3.24711549
37 2021-07-08 00:15:00 1.371858     -2.2560218           -2.2560218           -2.2560218 -3.31100301 -3.31100301
38 2021-07-08 00:30:00 1.372937     -2.2713012           -2.2713012           -2.2713012 -3.32209028 -3.32209028
39 2021-07-08 00:45:00 1.374016     -2.2326683           -2.2326683           -2.2326683 -3.27913189 -3.27913189
40 2021-07-08 01:00:00 1.375095     -2.1434409           -2.1434409           -2.1434409 -3.18545134 -3.18545134
41 2021-07-08 01:15:00 1.376174     -2.0112795           -2.0112795           -2.0112795 -3.04871452 -3.04871452
42 2021-07-08 01:30:00 1.377253     -1.8476225           -1.8476225           -1.8476225 -2.88036564 -2.88036564
43 2021-07-08 01:45:00 1.378332     -1.6668231           -1.6668231           -1.6668231 -2.69476377 -2.69476377
44 2021-07-08 02:00:00 1.379411     -1.4850463           -1.4850463           -1.4850463 -2.50807993 -2.50807993
45 2021-07-08 02:15:00 1.380490     -1.3190045           -1.3190045           -1.3190045 -2.33703236 -2.33703236
46 2021-07-08 02:30:00 1.381570     -1.1846198           -1.1846198           -1.1846198 -2.19754947 -2.19754947
47 2021-07-08 02:45:00 1.382649     -1.0957113           -1.0957113           -1.0957113 -2.10345652 -2.10345652
48 2021-07-08 03:00:00 1.383728     -1.0628003           -1.0628003           -1.0628003 -2.06528115 -2.06528115
49 2021-07-08 03:15:00 1.384807     -1.0921230           -1.0921230           -1.0921230 -2.08926583 -2.08926583
50 2021-07-08 03:30:00 1.385886     -1.1849243           -1.1849243           -1.1849243 -2.17666186 -2.17666186
51 2021-07-08 03:45:00 1.386965     -1.3370892           -1.3370892           -1.3370892 -2.32336081 -2.32336081
52 2021-07-08 04:00:00 1.388044     -1.5391453           -1.5391453           -1.5391453 -2.51989662 -2.51989662

> str(forecast)
'data.frame': 50984 obs. of  19 variables:
 $ ds                        : POSIXct, format: "2021-07-07 15:15:00" "2021-07-07 15:30:00" "2021-07-07 15:45:00" "2021-07-07 16:00:00" ...
 $ trend                     : num  1.33 1.33 1.34 1.34 1.34 ...
 $ additive_terms            : num  4.77 4.46 4.07 3.59 3.05 ...
 $ additive_terms_lower      : num  4.77 4.46 4.07 3.59 3.05 ...
 $ additive_terms_upper      : num  4.77 4.46 4.07 3.59 3.05 ...
 $ daily                     : num  3.69 3.38 2.98 2.5 1.95 ...
 $ daily_lower               : num  3.69 3.38 2.98 2.5 1.95 ...
 $ daily_upper               : num  3.69 3.38 2.98 2.5 1.95 ...
 $ weekly                    : num  1.09 1.09 1.09 1.09 1.09 ...
 $ weekly_lower              : num  1.09 1.09 1.09 1.09 1.09 ...
 $ weekly_upper              : num  1.09 1.09 1.09 1.09 1.09 ...
 $ multiplicative_terms      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ multiplicative_terms_lower: num  0 0 0 0 0 0 0 0 0 0 ...
 $ multiplicative_terms_upper: num  0 0 0 0 0 0 0 0 0 0 ...
 $ yhat_lower                : num  1.9493 1.7188 1.4521 0.7926 0.0558 ...
 $ yhat_upper                : num  10.14 10.24 9.56 8.76 8.35 ...
 $ trend_lower               : num  1.33 1.33 1.34 1.34 1.34 ...
 $ trend_upper               : num  1.33 1.33 1.34 1.34 1.34 ...
 $ yhat                      : num  6.11 5.8 5.41 4.93 4.38 ...

plot(fit_prophet, forecast)
prophet::prophet_plot_components(fit_prophet, forecast)

Easy to make, but this is NOT a good model. In fact, this is a horrible fit and forecast. What a rocky start.

In this plot you can see the components applied in the model. Like I said, the Prophet model is a decomposition model, and here I used the most basic of components. Much more tweaking is needed for me to tackle this model.

I know that events, like holidays, have an effect on demand. If the service center is closed on one day, you can expect more calls the next day. Lets add the Dutch national holidays.

# Adding holidays 
holiday<-c("Nieuwjaarsdag", "Goedevrijdag", "Paasdag1", "Paasdag2",
           "Koningsdag","Bevrijdsdag","Hemelvaartsdag","Pinksteren1", "Pinkesteren2",
           "Kerstmis1", "Kerstmis2", "Oudjaarsdag")
ds<-c(as.Date('2022-01-01'),
      as.Date('2022-04-15'),
      as.Date('2022-04-17'),
      as.Date('2022-04-18'),
      as.Date('2022-04-27'),
      as.Date('2022-05-05'),
      as.Date('2022-05-26'),
      as.Date('2022-06-05'),
      as.Date('2022-06-06'),
      as.Date('2022-12-25'),
      as.Date('2022-12-26'),
      as.Date('2022-12-31'))
lower_window<-rep(0,12)
upper_window<-rep(1,12)
holidays <- dplyr::bind_cols(holiday, ds, lower_window, upper_window)
colnames(holidays)<-c("holiday", "ds", "lower_window", "upper_window")
> holidays
# A tibble: 12 x 4
   holiday        ds         lower_window upper_window
   <chr>          <date>            <dbl>        <dbl>
 1 Nieuwjaarsdag  2022-01-01            0            1
 2 Goedevrijdag   2022-04-15            0            1
 3 Paasdag1       2022-04-17            0            1
 4 Paasdag2       2022-04-18            0            1
 5 Koningsdag     2022-04-27            0            1
 6 Bevrijdsdag    2022-05-05            0            1
 7 Hemelvaartsdag 2022-05-26            0            1
 8 Pinksteren1    2022-06-05            0            1
 9 Pinkesteren2   2022-06-06            0            1
10 Kerstmis1      2022-12-25            0            1
11 Kerstmis2      2022-12-26            0            1
12 Oudjaarsdag    2022-12-31            0            1

fit_prophet <- df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  select(newdate, n)%>%
  rename(ds=newdate,
         y=n)%>%
  prophet::prophet(., holidays=holidays)
future <- prophet::make_future_dataframe(fit_prophet, 
                                         periods = 672, 
                                         freq = 900)
forecast <- predict(fit_prophet, future)
plot(fit_prophet, forecast)
prophet::prophet_plot_components(fit_prophet, forecast)

The holidays are now a component in the decomposed model.

I want to see what kind of forecasts I am actually getting, per hour and per day. Do they even make sense?

forecast_week<-forecast%>%
  filter(ds>max(df_tbl$newdate))%>%
  select(ds, yhat)%>%
  mutate(yhat=round(yhat))
forecast_week%>%
  mutate(yhat = if_else(yhat<0, 0, yhat), 
         day = factor(weekdays(ds), 
                      level=c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)%>%
  ggplot(., 
         aes(x=ds, y=yhat))+
  geom_bar(stat="identity")+
  theme_bw()+
  facet_wrap(~day, 
             scales="free", 
             ncol=2)+
  labs(x="Date & Time", 
       y="Incoming calls per quarterhour")

Well, it does make sense. I am not predicting any calls in the evening or at night.. The pyramid form also seems to be okay to me.

Lets summarize the forecasts into hourly data.

forecast_week%>%
  timetk::summarise_by_time(
    ds, .by="hour",
    n=sum(yhat))%>%
  mutate(n = if_else(n<0, 0, n), 
         day = factor(weekdays(ds), 
                      level=c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)%>%
  ggplot(., 
       aes(x=ds, y=n))+
  geom_bar(stat="identity")+
  theme_bw()+
  facet_wrap(~day, 
             scales="free_x", 
             ncol=2)+
  labs(x="Date & Time", 
       y="Incoming calls per hour")

Looking okay to me again — Monday has the highest amount of calls which makes sense after the weekend. We see the pyramid form again and each day we have periods with no expected calls.

How does it look if I summarize per day so I have a full week?

forecast_week%>%
  timetk::summarise_by_time(
    ds, .by="day",
    n=sum(yhat))%>%
  mutate(n = if_else(n<0, 0, n), 
         day = factor(weekdays(ds), 
                      level=c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)
# A tibble: 8 x 3
  ds                      n day      
  <dttm>              <dbl> <fct>    
1 2022-12-19 00:00:00   403 Monday   
2 2022-12-13 00:00:00    17 Tuesday  
3 2022-12-20 00:00:00   337 Tuesday  
4 2022-12-14 00:00:00   344 Wednesday
5 2022-12-15 00:00:00   334 Thursday 
6 2022-12-16 00:00:00   287 Friday   
7 2022-12-17 00:00:00     4 Saturday 
8 2022-12-18 00:00:00    11 Sunday

Looks good, except Tuesday the 13th. Not sure what happened there, but it could be that its not a full time range. That would explain the strange graph for Tuesday above. But all in all, it looks good to me overall. Once again, the peak for the week is on Monday and then we have a decline until Friday and almost no calls predicted for the weekend.

What would the prediction look like if I applied an ARIMA model? Just for fun sake, and also because I need to compare the Prophet model with another model. The auto.arima is a straightforward powerful modelling function. I will throw in the seasonal naive model to boot.

df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  auto.arima %>% 
  forecast(h=672) %>% autoplot+theme_bw() 
df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  auto.arima %>% 
  checkresiduals(test=FALSE)+theme_bw()

arima_model<-df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  auto.arima %>% 
  forecast(h=672)

> arima_model
         Point Forecast       Lo 80     Hi 80       Lo 95     Hi 95
2545.597              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.608              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.618              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.629              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.639              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.649              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.660              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.670              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.681              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.691              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.702              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.712              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.722              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.733              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.743              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.754              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.764              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.774              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.785              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.795              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.806              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.816              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.827              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.837              0  -5.9371424  5.937142  -9.0800758  9.080076
2545.847              0  -5.9371424  5.937142  -9.0800758  9.080076


fit_snaive <- df_tbl%>%
  filter(RecordType =="VOIP call")%>%
  snaive(.,h=672) 
plot(fit_snaive)
fit_snaive<-fit_snaive%>%
  as.data.frame()%>%
  slice_tail(., n=672)

They look okay, these models, but in fact are not. They are not picking up the pyramid form the Prophet model did pick up.

The residuals from the ARIMA show a period and quite some long tails — which is equivalent to the peaks and valleys of the underlying data.

So, what if we compare the three models? Which one looks like it is best at targeting the data?

forecast%>%
  filter(ds>max(df_tbl$newdate))%>%
  select(ds, yhat)%>%
  mutate(n_prophet = round(yhat))%>%
  mutate(n_arima   = as.numeric(round(arima_model$mean)))%>%
  mutate(n_snaive  = as.numeric(round(fit_snaive$`Point Forecast`)))%>%
  select(-yhat)%>%
  mutate(n_prophet = if_else(n_prophet<0,0, n_prophet),
         n_arima   = if_else(n_arima<0,  0, n_arima),
         n_snaive  = if_else(n_snaive<0, 0, n_snaive),
         day = factor(weekdays(ds), 
                     level=c("Monday", "Tuesday", "Wednesday",
                             "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)%>%
  ggplot(.)+
  geom_line(aes(x=ds, y=n_prophet, color="Prophet Model"))+
  geom_line(aes(x=ds, y=n_arima, color="ARIMA Model"))+
  geom_line(aes(x=ds, y=n_snaive, color="Seasonal NAIVE Model"))+
  theme_bw()+
  facet_wrap(~day, 
             scales="free", 
             ncol=2)+
  labs(x="Date & Time", 
       y="Incoming calls per quarterhour")

The ARIMA seems lost, or it is completely in agreement with the seasonal naïve model. I am not really sure at this point. Also, I find the seasonal naïve model to be very unwieldy, going up and down. The Prophet model seems more stable in terms of steps.

Of course, the above plot is not really helpful in deciding which model to use, as you would like to have the underlying data in the plot as well. What if we zoom in?


forecast%>%
  filter(ds>max(df_tbl$newdate))%>%
  select(ds, yhat)%>%
  mutate(n_prophet=round(yhat))%>%
  mutate(n_arima = as.numeric(round(arima_model$mean)))%>%
  select(-yhat)%>%
  timetk::summarise_by_time(
    ds, .by="hour",
    n_prophet=sum(n_prophet),
    n_arima=sum(n_arima))%>%
  mutate(hour=hour(ds))%>%
  filter(hour>=8 & hour<=17)%>%
  mutate(n_prophet = if_else(n_prophet<0, 0, n_prophet),
         n_arima = if_else(n_arima<0, 0, n_arima),
         day = factor(weekdays(ds), 
                      level=c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)%>%
  ggplot(.)+
  geom_line(aes(x=ds, y=n_prophet, color="Prophet Model"))+
  geom_line(aes(x=ds, y=n_arima, color="ARIMA Model"))+
  theme_bw()+
  facet_wrap(~day, 
             scales="free", 
             ncol=2)+
  labs(x="Date & Time", 
       y="Incoming calls per hour")

At the very least, the ARIMA model did provide forecasts. Once again, you cannot really see which model is best using plots like these, but you can see how they stack up. The problem with demand forecasting is that the room for error is very limited. And here the ARIMA model is continuously predicting higher than the Prophet model.

This becomes clear if you summarize the data per day for an entire week. In fact, what happens is that the ARIMA model is giving the exact same demand forecast for each day.

forecast%>%
  filter(ds>max(df_tbl$newdate))%>%
  select(ds, yhat)%>%
  mutate(n_prophet=round(yhat))%>%
  mutate(n_arima = as.numeric(round(arima_model$mean)))%>%
  select(-yhat)%>%
  timetk::summarise_by_time(
    ds, .by="hour",
    n_prophet=sum(n_prophet),
    n_arima=sum(n_arima))%>%
  mutate(hour=hour(ds))%>%
  filter(hour>=8 & hour<=17)%>%
  mutate(n_prophet = if_else(n_prophet<0, 0, n_prophet),
         n_arima = if_else(n_arima<0, 0, n_arima),
         day = factor(weekdays(ds), 
                      level=c("Monday", "Tuesday", "Wednesday",
                              "Thursday", "Friday", "Saturday", "Sunday")))%>%
  arrange(day, ds)%>%
  timetk::summarise_by_time(
    ds, .by="day",
    n_prophet=sum(n_prophet),
    n_arima=sum(n_arima))

# A tibble: 8 x 3
  ds                  n_prophet n_arima
  <dttm>                  <dbl>   <dbl>
1 2022-12-13 00:00:00         6       0
2 2022-12-14 00:00:00       298     389
3 2022-12-15 00:00:00       290     389
4 2022-12-16 00:00:00       275     389
5 2022-12-17 00:00:00       150     389
6 2022-12-18 00:00:00       155     389
7 2022-12-19 00:00:00       335     389
8 2022-12-20 00:00:00       288     389

From this moment on I focused on the Prophet model, but NOT because the ARIMA did not perform well which could very well because of how I treated it. Because I could have easily tweaked the ARIMA model, even including the Fourier terms I will enter into the Prophet model.

The real underlying reason for going with the Prophet model is because I can also add boundaries, which allows me to deal with the intermittent part of the data by adding events and fully decomposing it. Let me show you how.

TC_df<-df%>%
  #dplyr::mutate(newdate = with(., ymd(Date) + hms(QuarterHour)),
                #RecordTpe = factor(RecordType))%>%
  dplyr::mutate(newdate = with(., ymd(Date) + hours(Hour)),
                RecordTpe = factor(RecordType))%>%
  group_by(RecordType, newdate)%>%
  filter(!RecordType %in% c("AgentState", "Callback call"))%>%
  filter(CallDirection=="Incoming" & 
           DoneInIVR==0 & AgentSequence==1)%>%
  count()%>%
  group_by(RecordType) %>%
  imputeTS::na_kalman(.) %>% 
  select(RecordType, newdate, n)%>%
  as_tsibble(key = RecordType, index = newdate) %>%
  fill_gaps(n = 0, .full = end())%>%
  as_tibble()%>%
  arrange(RecordType, newdate)%>%
  ungroup()%>%
  select(RecordType, newdate, n)
TC_df%>%
  mutate(dayname = weekdays(newdate, abbreviate = TRUE))%>%
  mutate(dayname = factor(dayname,levels=c("Mon", "Tue", "Wed",
                                           "Thu","Fri","Sat","Sun")))%>%
  dplyr::filter(newdate>"2022-10-10 00:00:00" & newdate<"2022-10-23 23:00:00")%>%
  ggplot(., aes(x=newdate, y=n, fill=factor(dayname)), alpha=0.5)+
  geom_bar(stat="identity")+
  theme_bw()+
  facet_grid(rows=vars(RecordType), scales="free_y")+
  labs(x="Date", 
       y="Incoming need", 
       fill="Dayname",
       title="Actual calls between 8am and 5 pm for each day of the week")

Here you see the actual incoming telephone calls and number of e-mails received between 8 am and 5 pm for each day of the week in a range of 2 weeks.

I will again add the holidays.

holiday<-c("Kerstmis1", "Kerstmis2", "Oudjaarsdag", 
           "Nieuwjaarsdag", "Goedevrijdag", "Paasdag1", "Paasdag2",
           "Koningsdag","Bevrijdsdag","Hemelvaartsdag","Pinksteren1", "Pinkesteren2",
           "Kerstmis1", "Kerstmis2", "Oudjaarsdag",
           "Nieuwjaarsdag", "Goedevrijdag", "Paasdag1", "Paasdag2",
           "Koningsdag","Bevrijdsdag","Hemelvaartsdag","Pinksteren1", "Pinkesteren2",
           "Kerstmis1", "Kerstmis2", "Oudjaarsdag")
ds<-c(as.Date('2021-12-25'),
      as.Date('2021-12-26'),
      as.Date('2021-12-31'),
      as.Date('2022-01-01'),
      as.Date('2022-04-15'),
      as.Date('2022-04-17'),
      as.Date('2022-04-18'),
      as.Date('2022-04-27'),
      as.Date('2022-05-05'),
      as.Date('2022-05-26'),
      as.Date('2022-06-05'),
      as.Date('2022-06-06'),
      as.Date('2022-12-25'),
      as.Date('2022-12-26'),
      as.Date('2022-12-31'),
      as.Date('2023-01-01'),
      as.Date('2023-04-07'),
      as.Date('2023-04-09'),
      as.Date('2023-04-10'),
      as.Date('2023-04-27'),
      as.Date('2023-05-05'),
      as.Date('2023-05-18'),
      as.Date('2023-05-28'),
      as.Date('2023-05-29'),
      as.Date('2023-12-25'),
      as.Date('2023-12-26'),
      as.Date('2023-12-31'))
lower_window<-rep(0,length(ds))
upper_window<-rep(1,length(ds))
df_holiday <- dplyr::bind_cols(holiday, ds, lower_window, upper_window)
colnames(df_holiday)<-c("holiday", "ds", "lower_window", "upper_window")
df_holiday
# A tibble: 27 x 4
   holiday        ds         lower_window upper_window
   <chr>          <date>            <dbl>        <dbl>
 1 Kerstmis1      2021-12-25            0            1
 2 Kerstmis2      2021-12-26            0            1
 3 Oudjaarsdag    2021-12-31            0            1
 4 Nieuwjaarsdag  2022-01-01            0            1
 5 Goedevrijdag   2022-04-15            0            1
 6 Paasdag1       2022-04-17            0            1
 7 Paasdag2       2022-04-18            0            1
 8 Koningsdag     2022-04-27            0            1
 9 Bevrijdsdag    2022-05-05            0            1
10 Hemelvaartsdag 2022-05-26            0            1
11 Pinksteren1    2022-06-05            0            1
12 Pinkesteren2   2022-06-06            0            1
13 Kerstmis1      2022-12-25            0            1
14 Kerstmis2      2022-12-26            0            1
15 Oudjaarsdag    2022-12-31            0            1
16 Nieuwjaarsdag  2023-01-01            0            1
17 Goedevrijdag   2023-04-07            0            1
18 Paasdag1       2023-04-09            0            1
19 Paasdag2       2023-04-10            0            1
20 Koningsdag     2023-04-27            0            1
21 Bevrijdsdag    2023-05-05            0            1
22 Hemelvaartsdag 2023-05-18            0            1
23 Pinksteren1    2023-05-28            0            1
24 Pinkesteren2   2023-05-29            0            1
25 Kerstmis1      2023-12-25            0            1
26 Kerstmis2      2023-12-26            0            1
27 Oudjaarsdag    2023-12-31            0            1

And now, on to the real fun stuff. What you will see me do next is delete some of the standard parts of the model and replace them. As you can see I will delete the automatic daily, weekly and yearly seasonality, set the seasonality at multiplicative, set the growth at logistic, and enter the holidays.

Changing the growth from linear to logistic enables me to work with boundaries much more efficiently. By deleting the automatic components I can enter my own which I am doing by including Fourier terms — hourly, daily, weekly, monthly, quarterly, and yearly. Lets see where I end up.

m1 <- prophet::prophet(weekly.seasonality=FALSE,
                       yearly.seasonality=FALSE,
                       daily.seasonality=FALSE,
                       growth="logistic",
                       seasonality.mode = 'multiplicative',
                       holidays=df_holiday,
                       changepoint.prior.scale = 0.5)
                       #mcmc.samples=100)
m1 <- prophet::add_seasonality(m1, name='monthly', 
                               period=30.5, 
                               fourier.order=20)
m1 <- prophet::add_seasonality(m1, name='hourly', 
                               period=1/24, 
                               fourier.order=20)
m1 <- prophet::add_seasonality(m1, name='daily', 
                               period=1, 
                               fourier.order=20)
m1 <- prophet::add_seasonality(m1, name='weekly', 
                               period=7, 
                               fourier.order=20)
m1 <- prophet::add_seasonality(m1, name='yearly', 
                               period=365.25, 
                               fourier.order=20)
m1 <- prophet::add_seasonality(m1, name='quarterly', 
                               period=(365.25/4), 
                               fourier.order=20)

Next up I will set the floor and ceiling levels and add them to the model. Then I will run the model on the VOIP call time-series, and fetch my predictions from the forecasting window I have set up.

TC_df$cap <- 100
TC_df$floor <- 0
fit <- TC_df%>%
  filter(RecordType=="VOIP call")%>%
  select(newdate,n, cap, floor)%>%
  rename(ds=newdate, y=n)%>%
  select(ds,y, cap, floor)%>%
  prophet::fit.prophet(m1,.)
saveRDS(fit, file="ProphetVOIPmodel.RDS")  # Save model
fitVOIP <- readRDS(file="ProphetVOIPmodel.RDS")           # Load model
future <- prophet::make_future_dataframe(fit,periods=7*24*4, 
                                         freq= 3600, 
                                         include_history = TRUE)
future$cap <- 100
future$floor <- 0
forecast <- predict(fit, future)
plot(fit, forecast)

This looking much much better. Okay, the bottom is being neglected at all it seems, but I am seeing the patterns I want to see.

Lets look at the components.

prophet::prophet_plot_components(fit, forecast)

Quite some components included. Good to remember that all these components make up the model predictions. Hence, the time-series is first decomposed and then assessed for its ability to reproduce what it has observed. Maximum Likelihood in a nutshell.

How do my predictions look like?

forecast%>%
  dplyr::select(ds, 
         yhat, 
         yhat_lower, 
         yhat_upper)%>%
  dplyr::filter(ds>"2022-12-11")%>%
  mutate(hour = hour(ds), 
         day=day(ds))%>%
  dplyr::filter(hour>=8 & hour<=17)%>%
  ggplot(., aes(x=ds, y=yhat))+
  geom_point()+
  geom_ribbon(aes(x=ds, y=yhat, ymin=yhat_lower, ymax=yhat_upper), 
              fill="black",alpha=0.2)+
  theme_bw()

forecast%>%
  dplyr::select(ds, 
                yhat, 
                yhat_lower, 
                yhat_upper)%>%
  dplyr::filter(ds>"2022-12-31")%>%
  mutate(hour = hour(ds), 
         day=day(ds), 
         date=date(ds))%>%
  dplyr::filter(hour>=8 & hour<=17)%>%
  mutate(dayname = weekdays(ds, abbreviate = TRUE))%>%
  mutate(dayname = factor(dayname,levels=c("Mon", "Tue", "Wed",
                                           "Thu","Fri","Sat","Sun")))%>%
  
  filter(!dayname %in% c("Sat", "Sun"))%>%
  ggplot(., aes(x=hour, y=yhat, label=round(yhat)))+
  geom_point()+
  geom_ribbon(aes(x=hour, y=yhat, ymin=yhat_lower, ymax=yhat_upper, 
              fill=dayname),alpha=0.2)+
  geom_text(check_overlap = TRUE,hjust = 1, vjust=-1,nudge_x = 0.05, size=4)+
  theme_bw()+
  facet_grid(~date, scale="free_x")+
  scale_x_continuous(breaks = seq(8,17, by = 1))

And the predictions shown as a function of time and date. What I am looking for is an early steep rise in calls and then a soft decline towards the end of the day. I am also looking for that spike on Monday.

Lets zoom out a bit to see if I am truly getting that pattern.

forecast%>%
  dplyr::select(ds, 
                yhat, 
                yhat_lower, 
                yhat_upper)%>%
  dplyr::filter(ds>"2022-12-11")%>%
  mutate(hour = hour(ds))%>%
  dplyr::filter(hour>=8 & hour<=17)%>%
  summarise_by_time(ds, 
                    .by = "day", 
                    .type = "round", 
                    value = sum(yhat))%>%
  mutate(dayname = weekdays(ds, abbreviate = TRUE))%>%
  mutate(dayname = factor(dayname,levels=c("Mon", "Tue", "Wed",
                                           "Thu","Fri","Sat","Sun")))%>%
  filter(!dayname %in% c("Sat", "Sun"))%>%
  ggplot(., aes(x=ds, y=value, fill=factor(dayname)), alpha=0.5)+
  geom_bar(stat="identity")+
  theme_bw()+
  labs(x="Date", 
       y="Telephone calls", 
       fill="Dayname",
       title="Predicted telephone calls between 8am and 5 pm for each day of the week")

Lets take a look at the probabilities.

forecast%>%
  dplyr::select(ds, 
                yhat, 
                yhat_lower, 
                yhat_upper)%>%
  dplyr::filter(ds>"2022-12-12")%>%
  mutate(hour = hour(ds), 
         day=day(ds), 
         date=date(ds))%>%
  dplyr::filter(hour>=8 & hour<=17)%>%
  mutate(dayname = weekdays(ds, abbreviate = TRUE))%>%
  mutate(dayname = factor(dayname,levels=c("Mon", "Tue", "Wed",
                                           "Thu","Fri","Sat","Sun")))%>%
  filter(!dayname %in% c("Sat", "Sun"))%>%
  select(ds, yhat, yhat_lower, yhat_upper)%>%
  mutate(ds=round(ds),
         yhat=round(yhat),
         yhat_lower=round(yhat_lower), 
         yhat_upper=round(yhat_upper))%>%
  rename(date=ds, 
         VOIP_predicted=yhat, 
         VOIP_predicted_lower=yhat_lower, 
         VOIP_predicted_higher=yhat_upper)

                   date VOIP_predicted VOIP_predicted_lower VOIP_predicted_higher
1   2022-12-12 08:00:00             42                   32                    51
2   2022-12-12 09:00:00             64                   54                    75
3   2022-12-12 10:00:00             63                   53                    73
4   2022-12-12 11:00:00             59                   49                    68
5   2022-12-12 12:00:00             44                   34                    55
6   2022-12-12 13:00:00             46                   36                    55
7   2022-12-12 14:00:00             43                   34                    53
8   2022-12-12 15:00:00             42                   32                    52
9   2022-12-12 16:00:00             31                   21                    41
10  2022-12-12 17:00:00             -3                  -13                     6
11  2022-12-13 08:00:00             33                   23                    43
12  2022-12-13 09:00:00             48                   38                    58
13  2022-12-13 10:00:00             49                   39                    59

To me, the numbers look good. Since the model is probabilities, we will provide a range of numbers. But for some reason, despite a floor value of zero, the model does predict negative numbers. I will have to assess this further into the future.

To assess the validity of the model I can apply cross-validation. Here, I will ask to split the data in periods of 30 days.

TC_df.cv <- prophet::cross_validation(fit, initial = 30, period = 30, horizon = 30, units = 'days')
prophet::plot_cross_validation_metric(TC_df.cv, metric = 'mape')
TC_df.cv
# A tibble: 3,039 x 6
       y ds                     yhat yhat_lower yhat_upper cutoff             
   <dbl> <dttm>                <dbl>      <dbl>      <dbl> <dttm>             
 1    48 2021-08-23 08:00:00  14.4         6.75      21.9  2021-08-20 17:00:00
 2    77 2021-08-23 09:00:00  34.7        27.3       42.1  2021-08-20 17:00:00
 3    57 2021-08-23 10:00:00  30.0        22.8       37.0  2021-08-20 17:00:00
 4    62 2021-08-23 11:00:00  21.7        14.5       29.3  2021-08-20 17:00:00
 5    61 2021-08-23 12:00:00  -0.240      -7.48       6.93 2021-08-20 17:00:00
 6    62 2021-08-23 13:00:00  -1.50       -9.06       6.09 2021-08-20 17:00:00
 7    56 2021-08-23 14:00:00  -5.02      -12.1        2.76 2021-08-20 17:00:00
 8    58 2021-08-23 15:00:00 -14.0       -21.4       -6.63 2021-08-20 17:00:00
 9    42 2021-08-23 16:00:00 -29.9       -36.7      -22.3  2021-08-20 17:00:00
10    39 2021-08-24 08:00:00 -24.5       -32.1      -17.3  2021-08-20 17:00:00
# ... with 3,029 more rows

Looking good over time. The outliers are mostly events I have not modelled yet. Something for the future.

TC_df.p <- prophet::performance_metrics(TC_df.cv)
TC_df.p
      horizon        mse      rmse      mae      mape     mdape     smape  coverage
1    87 hours   827.3958  28.76449 20.35292 0.7142706 0.3594042 0.6124057 0.3663366
2    88 hours   884.7252  29.74433 21.15144 0.7301427 0.3838585 0.6372713 0.3557756
3    89 hours   943.8874  30.72275 21.97208 0.7452976 0.3965911 0.6628596 0.3392739
4    90 hours   991.3885  31.48632 22.62120 0.7573581 0.3992891 0.6833342 0.3372937
5    91 hours  1044.0739  32.31213 23.27287 0.7748281 0.4054236 0.7074297 0.3359736
6    92 hours  1085.8181  32.95175 23.90657 0.7899180 0.4126654 0.7288895 0.3267327
7    93 hours  1129.4854  33.60782 24.44831 0.8046231 0.4274603 0.7466365 0.3227723
8    94 hours  1169.2230  34.19390 25.04196 0.8151764 0.4335314 0.7568633 0.3156316
9    95 hours  1186.1815  34.44099 25.21167 0.8302762 0.4378025 0.7647722 0.3165317
10   96 hours  1183.3540  34.39991 25.19690 0.8937962 0.4378025 0.7689431 0.3156316
11  111 hours  1241.9208  35.24090 25.60396 0.9087479 0.4485662 0.7787953 0.3180318
12  112 hours  1298.6121  36.03626 26.08307 0.9197898 0.4485662 0.7870496 0.3162316
13  113 hours  1326.8614  36.42611 26.33218 0.9237472 0.4485662 0.7881559 0.3129313
14  114 hours  1373.9662  37.06705 26.73836 0.9302271 0.4417954 0.7913998 0.3096310
15  115 hours  1427.0944  37.77690 27.21232 0.9438990 0.4649363 0.7978453 0.3039304
16  116 hours  1473.2762  38.38328 27.65111 0.9576387 0.4724837 0.8039432 0.2955296
17  117 hours  1503.6625  38.77709 27.89451 0.9590604 0.4724837 0.8029143 0.2916292
18  118 hours  1526.2041  39.06666 27.97210 0.8396388 0.4511389 0.7897167 0.2948295
19  119 hours  1545.1412  39.30828 27.96781 0.8489682 0.4500697 0.7915213 0.2978548
20  120 hours  1577.3210  39.71550 28.28425 1.3285633 0.4500697 0.8007789 0.2962046
21  135 hours  1664.4476  40.79764 28.86563 1.3454675 0.4649363 0.8098595 0.2937294
22  136 hours  1722.3524  41.50123 29.33171 1.3693806 0.4516746 0.8138840 0.2931793
23  137 hours  1761.7773  41.97353 29.52384 1.3710233 0.4511389 0.8084394 0.2915292
24  138 hours  1837.7364  42.86883 29.98473 1.3796956 0.4385938 0.8036610 0.2838284
25  139 hours  1909.4537  43.69730 30.47415 1.3983215 0.4385938 0.8046136 0.2772277
26  140 hours  1973.7674  44.42710 30.87739 1.4144755 0.4306182 0.8027825 0.2717272
27  141 hours  2050.6867  45.28451 31.31982 1.4234553 0.4306182 0.8006392 0.2698020
28  142 hours  2104.4469  45.87425 31.62569 1.4238621 0.4258125 0.7939464 0.2706271
29  143 hours  2180.6383  46.69730 31.98080 1.3437806 0.4170105 0.7895738 0.2764026
30  144 hours  2205.8794  46.96679 32.14368 1.7070054 0.4211839 0.7964796 0.2780528
31  159 hours  2377.8927  48.76364 32.88574 1.7325492 0.4170105 0.7945434 0.2849285
32  160 hours  2532.7136  50.32607 33.66352 1.7564765 0.4170105 0.7919632 0.2945545
33  161 hours  2668.3920  51.65648 34.42589 1.8835151 0.4170105 0.7908198 0.3003300
34  162 hours  2802.9861  52.94323 35.24607 1.9811052 0.4170105 0.7886301 0.2926293
35  163 hours  2995.5727  54.73183 36.35064 2.0170405 0.4211839 0.7901189 0.2871287
36  164 hours  3224.9815  56.78892 37.56338 2.0377829 0.4258125 0.7919723 0.2882288
37  165 hours  3388.1239  58.20759 38.57397 2.1084925 0.4336581 0.7943699 0.2816282
38  166 hours  3631.8703  60.26500 39.84352 2.1232605 0.4322039 0.7962240 0.2698020
39  167 hours  3851.3188  62.05899 41.07109 2.0939748 0.4336581 0.7979080 0.2596260
40  183 hours  4076.1588  63.84480 42.18438 2.1616346 0.4322039 0.8018336 0.2541254
41  184 hours  4312.0095  65.66589 43.42222 2.1938510 0.4336581 0.8067529 0.2486249
42  185 hours  4633.1511  68.06725 45.10975 2.2234641 0.4487351 0.8186872 0.2355236
43  186 hours  4915.7172  70.11218 46.48087 2.2466009 0.4582281 0.8240190 0.2359736
44  187 hours  5224.6978  72.28207 47.85008 2.2728217 0.4582281 0.8288225 0.2343234

We can see that the error increases as the horizon increases. Now, the MAPE (mean absolute percentage error) is not really a good value to use for demand data, so in that sense we can go for RMSE (root mean squared error) or Mean Absolute Error (mae). A mae of 20 means that my forecast for a particular time-period is an absolute 20 points off. That is substantial, but depends mostly on how far ahead you wish to forecast which is 87 hours ahead here.

In essence, the Prophet model gives me a good basis for a first model. I did not compare it to a baseline model or anything like that which is future work once I start building the ML Ops part. It is here that the model will run automatically digesting data, producing forecasts and checking accuracy.

For now, I am happy to have build a model that is able to deal with the intermittent data adequately enough and can detect the patterns inherent in this time-series.

Mlearning.ai Submission Suggestions

How to become a writer on Mlearning.ai

medium.com

Understanding the patterns of a demand question (part 2)

Applying the model I did not want to apply

Mlearning.ai Submission Suggestions

How to become a writer on Mlearning.ai

Written by Dr. Marc Jacobs