Post-Pandemic Flu Forecasting

By Guzal Bulatova

Trusted Data Science @ Haleon · Nov 13, 2023

A graph representing the number of Influenza-Like-Illness (ILI) cases per 1000 outpatients in the US between 2012 and 2023. Until 2020 there are very prominent U-shaped seasonal patterns with peaks in the beginning of each year and lows in the middle. After 2020 we observe multiple peaks each year, present outside of the typical winter flu season.
Number of Influenza-Like Illness (ILI) cases per 1000 outpatients in the US between 2012 and 2023. Data source: WHO

Predicting flu patterns is essential for business planning of consumer healthcare companies, like Haleon. By accurately forecasting upcoming flu seasons, Haleon can ensure that we have enough over-the-counter flu medications, pain relievers, and cough medications in stock to meet demand and better serve our customers.

The grey swan that was the COVID-19 pandemic affected the whole planet: the way we meet, work and shop. This has had an impact on many different time series, including production, transportation and sales, to name a few.

Flu incidence patterns have been affected as well, showing significant deviations in trend, cycle and, most prominently, seasonal patterns during 2020–2022. Even though the patterns have now recovered to their pre-pandemic rhythm, this highly irregular historical data must be accounted for when forecasts are created. Dealing with sporadic outliers can be troublesome, and here we have an anomalous period that lasted approximately three years.

In this article we take a closer look at an affected time series: FluID’s Influenza-Like Illness (ILI) incidence in the US [1]. We assess different approaches for dealing with the COVID-19 period and compare them with a benchmark approach, which is to leave the pandemic-affected period as is. We’ll study their effects on forecast accuracy and summarise our observations.

Flu time series decomposition

A graph representing the number of ILI cases per 100 thousand people in the US between 2012 and 2023. Until 2020 there are very prominent U-shaped seasonal patterns with peaks in the beginning of each year and lows in the middle. After 2020 this regularity disappears.
Number of ILI cases per 100 thousand people in the US between 2012 and 2023.

The dataset in focus represents weekly reported ILI cases in the US (country-wide) from January 2012 to September 2023, population-adjusted (per 100 thousand people). Even without decomposition analysis the seasonal patterns are prominent, with clear peaks in the first two months of every year during the pre-pandemic period. The same peaks do not occur at the start of 2021; instead they appear at the end of 2021, following lockdown and travel policy updates.

Below is a seasonal plot of the same time series, where each year shares the same x-axis representing the week number within the year. The pre-pandemic years 2017–2019 (orange) follow a similar pattern, with peaks at the beginning and end of the year. The pandemic years (purple, blue and light blue) each follow a unique pattern unlike any other year. 2023 (red) also follows a pattern of its own, though one somewhat closer to the pre-pandemic years:

Graph with number of ILI cases. The orange lines — pre-pandemic 2017–2019 stay very close to each other, with peaks in the beginning and end of the year; purple, blue and light blue are pandemic years, each having a unique pattern that isn’t like any other year; and lastly 2023 in red is following again a unique pattern, however a bit more similar to pre-pandemic than other years.
Seasonal plot of ILI cases, period 2017–2023.

Let’s check the randomness in our dataset. For that we’ll use autocorrelation function (ACF) plots, which depict correlations at varying time lags. The correlation values lie between -1 and 1, where -1 is a strong negative correlation, 1 a strong positive correlation and 0 no correlation.

Below we have ACF plots, where autocorrelations are computed for the number of ILI cases at lags 0 to 52. The ACF for the whole time period looks quite promising: correlation values are significantly non-zero for many time-lag separations, signifying that the data is non-random. Lags 1–8 are > 0.5:
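As a sketch of how such values are obtained, the sample ACF can be computed directly with NumPy (in practice statsmodels’ `acf` function does the same and adds confidence intervals); the toy series below is purely seasonal with a 52-week period, standing in for the real ILI data:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation of a 1-D series at lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.sum(x * x)
    return np.array([np.sum(x[:len(x) - k] * x[k:]) / var
                     for k in range(max_lag + 1)])

# A toy yearly-seasonal weekly series: the ACF should peak again at lag 52
weeks = np.arange(520)
series = np.sin(2 * np.pi * weeks / 52)
acf = sample_acf(series, 52)
```

For a strongly seasonal series like this, the ACF is near 1 at lag 52 (one full year) and strongly negative at lag 26 (half a year), which is exactly the sinusoidal shape visible in the pre-pandemic plot below.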

A graph representing autocorrelations for number of ILI cases, where values for all lags are positive and 24 lags have value > 0.25.
Autocorrelation function (ACF) plot for number of ILI cases between 2012–2023.

However, if we look at the time periods separately, the ACF for the pre-pandemic period shows a more regular, sinusoidal pattern, with not only positive but also negative peaks repeating within a yearly period:

A graph representing autocorrelations for number of ILI cases pre-pandemic, where values for lags are > 0.5 for lags 1–6 and 48–52, and <0.25 for lags 19–32, showing both positive and negative autocorrelation
ACF plot for 2012–2019 (pre-pandemic).

Adding 2023 to the pre-pandemic period flattens the negative peak, but it is still present and the positive peaks stay largely unaffected:

A graph representing autocorrelations for number of ILI cases pre- and post-pandemic, showing both positive and negative autocorrelation, the negative autocorrelation values are dampened compared to pre-pandemic only graph.
ACF plot for 2012–2019 (pre-pandemic) + 2023 (post-pandemic).

The 2020–2022 period on its own shows a very different picture, with values near zero for 33 of the 52 time-lag separations, indicating significantly higher randomness than in the periods before and after:

A graph representing autocorrelations for number of ILI cases during pandemic, where values for lags are > 0.25 for lags 1–8 and stay close to zero for the rest of the lags, indicating high randomness.
ACF plot for 2020–2022 (pandemic)

Decomposing the 2020–2022 period with Seasonal and Trend decomposition using Loess (STL), we take a closer look at the key time series components, each representing an underlying pattern category:

  • Seasonal — representing effects of seasonal factors such as the time of the year,
  • Trend — long-term increase or decrease in the data and
  • Residuals — containing anything else in the time series.
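As a dependency-light sketch of the idea (the article uses STL proper, which is available as `statsmodels.tsa.seasonal.STL`), a classical additive decomposition produces the same three components:

```python
import numpy as np

def additive_decompose(x, period):
    """Classical additive decomposition into trend, seasonal and residual.
    A simplified stand-in for STL; edge effects of the moving average
    are ignored for brevity."""
    x = np.asarray(x, dtype=float)
    # Trend: moving average over one full period
    kernel = np.ones(period) / period
    trend = np.convolve(x, kernel, mode="same")
    detrended = x - trend
    # Seasonal: average detrended value at each position within the cycle
    one_cycle = np.array([detrended[i::period].mean() for i in range(period)])
    seasonal = np.tile(one_cycle, len(x) // period + 1)[:len(x)]
    seasonal -= seasonal.mean()  # centre the seasonal component around zero
    # Residual: whatever the trend and seasonal components don't explain
    resid = x - trend - seasonal
    return trend, seasonal, resid

# Toy weekly series: a rising trend plus a yearly cycle
weeks = np.arange(208)
x = 0.1 * weeks + 10 * np.sin(2 * np.pi * weeks / 52)
trend, seasonal, resid = additive_decompose(x, period=52)
```

By construction the three components add back up to the original series, which is the defining property of an additive decomposition.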

On the right graph, showing the decomposed pandemic period, the seasonal component is absent and the residuals fall within the range [-20, 8]; before 2020 (on the left) seasonality was quite strong, with residuals fluctuating within [-10, 10]:

Decomposed ILI cases time series, pre-pandemic period (left) and pandemic (right).

We are going to assess the forecasts for 2023 in three scenarios:

  1. using the data “as is” with basic normalisation only,
  2. excluding the 2020–2022 period,
  3. smoothing the pandemic period.

For our purposes of assessing general effects on forecast quality and for simplicity, we’re making point forecasts using “statistical” models: seasonal Naïve, Holt-Winters Exponential Smoothing and Prophet.
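Seasonal Naïve, the simplest of the three, just repeats the last observed seasonal cycle; a minimal sketch (the exact implementations used in this exercise are not shown here):

```python
def seasonal_naive(history, horizon, period=52):
    """Forecast by repeating the last full seasonal cycle of the history."""
    last_cycle = list(history)[-period:]
    return [last_cycle[i % period] for i in range(horizon)]

# With weekly data, a 39-step forecast repeats the first 39 weeks
# of the most recent year
forecast = seasonal_naive(range(104), horizon=39)
```

Despite its simplicity, this model turns out to be a surprisingly strong baseline for seasonal data, as the results below show.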

Approach 1: Include the pandemic period

Simply replacing outliers without thinking about why they have occurred is a dangerous practice. They may provide useful information about the process that produced the data, which should be taken into account when forecasting [2].

Another reason for including the pandemic period, besides using it as a benchmark, is that it is straightforward: the less we change the data, the truer a reflection of the real-world phenomenon it is.

All forecasts are for 39 steps (weeks) ahead, out-of-sample.

We split the data into training and testing sets using a sliding window approach. Say we have a set of 100 observations. An example of a sliding window with training size 10, testing size 5 and step = 1:

  • first split: 10 points [1, 10] as training set and 5 points [11, 15] as testing set
  • second split: we move one step forward, and use [2, 11] to train and [12, 16] to test,
  • and so on, until the last split, where [86, 95] is our training and [96, 100] is our test.
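A minimal generator for such splits might look like this (1-indexed, inclusive ranges as in the example above; the article’s actual tooling is not specified):

```python
def sliding_window_splits(n, train_size, test_size, step=1):
    """Yield (train, test) index ranges, 1-indexed and inclusive,
    sliding forward until the test window reaches the end of the data."""
    start = 1
    while start + train_size + test_size - 1 <= n:
        train = (start, start + train_size - 1)
        test = (start + train_size, start + train_size + test_size - 1)
        yield train, test
        start += step

splits = list(sliding_window_splits(100, train_size=10, test_size=5))
```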

Here’s a schematic representation:

A visualisation for splitting testing and training time series data with sliding window approach.
Splitting dataset using sliding window method.

To establish a baseline for the three models we did backtesting, checking how the models predict before 2020. We generated 75 sliding window splits with step = 1 to assess the models’ general performance on the non-random, highly seasonal period before the pandemic started:

Graph visualising the number of ILI cases over time, split into training set and testing set. The predictions by three models are plotted together with the testing period’s actual values. All forecasts are very close to the actual values.
Predictions for the last split of the backtesting period against actual values (purple).

We calculate the Mean Absolute Error (MAE) and Root-Mean-Square Error (RMSE) for each forecast and then average over the 75 splits. Holt-Winters Exponential Smoothing (Holt-Winters ETS) performs best with an MAE of 3.7, followed by Seasonal Naïve (sNaïve) and Prophet. Given that the values of the forecasted variable lie within [0, 112], absolute errors of 3.7–6.1 are quite low:
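For reference, the two metrics are straightforward to compute:

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error: average magnitude of the forecast errors."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    """Root-Mean-Square Error: like MAE, but penalises large errors more."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```

Both are in the units of the forecast variable (cases per 100 thousand people), which is what makes the 3.7–6.1 range easy to judge against the [0, 112] value range.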

A table with model names as row index and the mean errors over the backtesting period (MAE and RMSE) as columns.
Model comparison using MAE and RMSE.

For testing our three approaches, our focus is on the last 39 weeks of the whole set, i.e. the weeks of 2023.

Now we use the whole dataset. To test the first approach, leaving the pandemic period in, we again generated 75 splits with step = 1 and a forecasting horizon of 39.

Graph visualising the number of ILI cases over time, split into training set and testing set. The predictions for 2023 by three models are plotted together with the testing period’s actual values. Forecasts are fairly close to the actual values, the closest being the Holt-Winters ETS model’s predictions.
Predicted 2023 against actual values (purple), training set without any adjustments for pandemic period.

Considering the observed values of ILI cases during the preceding period, we are able to generate rather accurate forecasts for 2023 without any additional adjustments:

A table with model names as row index and the mean errors over the 2023 test period (MAE and RMSE) as columns.
Models’ accuracy in MAE and RMSE over the 2023 period, averaged over the 75 splits.

We observe a slight increase in errors compared to the backtesting period. sNaïve performs best at 6.8 MAE / 9.25 RMSE on the 2023 period, followed by Holt-Winters ETS and Prophet.

Approach 2: Exclude the pandemic period

On the other hand, if we assume that the outliers genuinely are errors and will not occur in the future, then we can modify the data to account for that, the most radical option being to exclude the 2020–2022 period altogether. This assumes a “the world is back to pre-pandemic normal” scenario.

Graph visualising the number of ILI cases over time, split into training set and testing set. The predictions for 2023 by three models are plotted together with the testing period’s actual values. Forecasts are not very close to the actual values, the least deviating being the sNaive model’s predictions.
Predicted 2023 against actual values (purple), training set excluding pandemic period completely.

Excluding the pandemic, however, doesn’t improve the accuracy of the predictions. Comparing the forecast plots, we can observe the very prominent peak in the first months of the year that the models learned from the pre-pandemic training set. See February 2023 on the plot above: all three models predict a peak, whereas the actual values (purple) show only a small peak there and generally follow a declining trend.

A table with model names as row index and the mean errors over the 2023 test period (MAE and RMSE) as columns.
Model comparison using MAE and RMSE.

The best model here is again seasonal Naïve, with an MAE of 11.7, nearly twice the 6.8 observed when the pandemic period was included in the training set. It doesn’t seem like a good idea to simply ignore the COVID-19 period.

Approach 3: Adjust the data

Given the extent of the period, we apply variance reduction and MinMax normalisation to the values. The variance is reduced by splitting the set into yearly subsets and rescaling each element by the subset mean: x * 100 / mean(subset) for each x in the subset.
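A sketch of this adjustment, assuming per-year mean scaling exactly as described and a standard [0, 1] MinMax rescale (the exact MinMax range used here is not stated):

```python
import numpy as np

def reduce_yearly_variance(values, years):
    """Rescale each yearly subset so its mean becomes 100,
    evening out the amplitude differences between years."""
    values, years = np.asarray(values, dtype=float), np.asarray(years)
    out = values.copy()
    for year in np.unique(years):
        mask = years == year
        out[mask] = values[mask] * 100 / values[mask].mean()
    return out

def minmax_normalise(values):
    """Scale values linearly into [0, 1]."""
    values = np.asarray(values, dtype=float)
    return (values - values.min()) / (values.max() - values.min())

# Two toy "years" with very different levels end up on the same scale
adjusted = reduce_yearly_variance([10, 20, 30, 40],
                                  [2020, 2020, 2021, 2021])
```

After the per-year rescaling, every yearly subset has the same mean, so an unusually high or low year no longer dominates the series.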

Although the outlier years are still noticeable, their extent has been significantly reduced.

Graph with the ILI cases time series before and after variance reduction. The adjusted series values are within [0, 100] compared to the previous [0, 112] interval. The yearly peaks have a more even amplitude than initially.
Original ILI cases (green) and the same series after variance reduction and MinMax normalisation (blue).

This approach proves better than the complete exclusion of the 2020–2022 period, and for Holt-Winters ETS and Prophet it is their peak performance within our scope. However, these forecasts don’t beat the MAE of 6.8 set by sNaïve on the simply population-adjusted data:

Graph visualising the number of ILI cases over time, split into training set and testing set. The predictions for 2023 by three models are plotted together with the testing period’s actual values. Forecasts are not very close to the actual values, the least deviating being the Prophet model’s predictions.
Predicted 2023 against actual values (purple), training set is normalised and variance adjusted.
A table with model names as row index and the mean errors over the 2023 test period (MAE and RMSE) as columns.
Model comparison using MAE and RMSE.

Conclusion

In this exercise we looked into the influence of including, excluding and adjusting for the COVID-19-affected period when forecasting Influenza-Like Illness incidence post-pandemic. The hypothesis was that addressing the period would improve forecasting accuracy even with standard time series models like ETS, but we were proven wrong.

While the difference was not tremendous, the best forecasts we achieved were with the simplest approach and the simplest method.

Philosophically, when dealing with outliers, we first need to check our base assumption: whether the process(es) causing these deviations will recur and/or are still affecting the future. We tested the assumption that they do not affect the future, and, given our observations, we are more likely to be wrong than right. That is sensible in hindsight, as the way people interact with each other has changed: people work from home, choose to stay in when they feel sick, and socialise less. The COVID-19 pandemic may not be recurring now, but it is still affecting flu patterns.

When generalising this conclusion to similarly affected datasets, the best option is still, of course, to check the data.

Future work:

Within the scope of this exercise we haven’t studied the data using machine learning models, and we are not claiming that they would show the same results. It is possible that an ML model trained on pre-pandemic data alone could forecast the post-pandemic period better than one trained on the whole set. This could be an interesting experiment to conduct.

References:

  1. WHO Influenza Surveillance Report (FluID)
  2. R.J. Hyndman and G. Athanasopoulos, “Forecasting: Principles and Practice”
