Unlock the NeuralProphet potential: hyperparameter tuning

A guide to optimizing its hyperparameters for more accurate predictions: from the trend- and seasonality-related parameters to the AR-Net ones

Lavinia Guadagnolo
Eni digiTALKS
11 min read · Sep 12, 2023


Figure 1 — Image from Adobe Stock

Finding the optimal hyperparameters for your time series forecasting model can be a daunting task: it is like searching for an elusive pot of gold at the end of a rainbow. You need to know exactly what each parameter does, where to set the values, and how they interact with each other in order to make your model as accurate as possible. Even if you are well-versed in the ins and outs of machine learning, tuning a NeuralProphet model can be particularly tricky due to its complex network of hyperparameters. In this article we will walk you through the process of optimizing its hyperparameters, so you can reach the holy grail of accurate time series predictions.

If you are looking for a deeper guide on the theoretical aspects, have a look at our previous article.

WARNING: the library is evolving and there may be differences from version to version. In the following paragraphs we will show how the most important parameters affect the model and provide some guidelines about their use. We will focus on the experiments we carried out in version 0.3.2. Please read the official documentation [2] for an exhaustive and updated understanding of all of them.

Trend related parameters

We will start with the most well-known component: the trend, which represents the long-term variation in time series.

Figure 2 — Example of how the changing trend is built starting from the changepoints

The key concept to control how NeuralProphet models the trend is changepoints, that is, the dotted lines in Figure 2. There are two main parameters related to changepoints that can be tuned:

  • n_changepoints: number of changepoints selected along the series for the trend
  • changepoint_range: range of training data used to fit the trend

The default value for changepoint_range is 0.8 and the default value for n_changepoints is 5. This means that 5 changepoints will be distributed at equal distances in the first 80% of the training data set.
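This spacing can be made concrete with a small sketch. Note this is a conceptual illustration of "equally spaced changepoints in the first changepoint_range fraction of the data", not NeuralProphet's internal code; the function name is ours:

```python
def changepoint_indices(n_train, n_changepoints=5, changepoint_range=0.8):
    """Return n_changepoints positions spread evenly over the first
    changepoint_range fraction of the training indices."""
    limit = n_train * changepoint_range          # last index eligible for a changepoint
    step = limit / (n_changepoints + 1)          # equal spacing within that window
    return [round(step * (i + 1)) for i in range(n_changepoints)]

# With 1000 training rows and the defaults, the changepoints fall
# well inside the first 80% of the series:
print(changepoint_indices(1000))  # [133, 267, 400, 533, 667]
```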

changepoint_range

Increasing changepoint_range means that more data will be considered to model the trend, including data closer to the time at which we start forecasting. This increases trend variability and might lead to overfitting. On the other hand, a low value for changepoint_range might lead to underfitting, since the latest trend variations are likely to be missed by the model, and the trend will be fitted on older data (which may not mirror the more recent behavior).

n_changepoints

Similarly, increasing the number of changepoints increases trend variability, as there will be more trend variations, and this too might lead to overfitting. If the number of changepoints is too high it might also affect how seasonality is interpreted as a side effect, as the model may misinterpret periodic fluctuations as trend variations. On the other hand, setting a low value for n_changepoints might mean missing important variations and might lead to underfitting. Therefore, it is essential to decompose the series in advance and observe the trend, to get an idea of plausible values for n_changepoints.

Changepoints can also be set manually, specifying the dates in which they occur. This feature is particularly useful when there are known variations in the trend pattern.

trend_reg

In addition to these two main parameters, it is possible to vary the trend_reg parameter. This regularizes the changepoints' growth rates, that is, it penalizes trend rate changes.

The following pictures show the impact of the trend_reg parameter. In Figure 3, trend_reg is low in the top representation and high in the bottom one. As you can see, in the second picture the number of changepoints is reduced and the range of trend rate changes is smaller. To be precise, the changepoints do not disappear: the imposed penalty term shrinks them to 0, making their impact irrelevant.

Figure 3 — NeuralProphet’s trend component with different parameters. Top figures: representation of trend and changepoints when trend_reg = 0. Bottom figures: representation of trend and changepoints when trend_reg > 10.

After having played a bit with all the trend-related parameters, our final advice is to keep the number of changepoints fixed and let the model choose which ones to keep through the trend_reg parameter (which can itself be tuned to find the best value).

Seasonality related parameters

The idea behind seasonality modelling in NeuralProphet is the Fourier series. When training the model, it is possible to choose the periodicities, that is, which seasonal components have to be modelled, and the seasonality mode.

Figure 4 — Seasonality hyperparameter tuning.

Seasonality components can either be set to a boolean value or to the number of Fourier terms of the respective seasonalities. If the series does not include certain seasonalities, the corresponding parameter should be set to False, otherwise the model might be distorted by their presence.
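To see what the Fourier order actually controls, here is a minimal sketch of the sin/cos features behind a seasonal component (a conceptual illustration, not NeuralProphet's implementation; the function name is ours):

```python
import math

def fourier_terms(t, period, order):
    """Fourier features for one periodicity at time t:
    one sin/cos pair per harmonic, so 2 * order features in total.
    A higher order adds higher-frequency harmonics, i.e. a sharper,
    more flexible (and more overfitting-prone) seasonal shape."""
    return [f(2 * math.pi * (k + 1) * t / period)
            for k in range(order)
            for f in (math.sin, math.cos)]

# Yearly seasonality with the default order 6 yields 12 features per timestamp:
print(len(fourier_terms(t=100, period=365.25, order=6)))  # 12
```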

daily_seasonality, weekly_seasonality, yearly_seasonality

The default values are daily_seasonality=6, weekly_seasonality=4, yearly_seasonality=6. Sometimes these values do not reflect the seasonality of our series: our daily seasonality might be smoother and our yearly seasonality sharper. Fortunately, it is possible to change the Fourier order to meet specific needs.

In the following pictures we can see how reducing the order of the yearly seasonality by one unit impacts the seasonality interpretation.

Figure 5 — NeuralProphet’s yearly seasonality component. Fourier order = 6.
Figure 6 — NeuralProphet’s yearly seasonality component. Fourier order = 5.

The plot in Figure 5 represents the yearly seasonality with Fourier order 6, and the one in Figure 6 with Fourier order 5. The higher the Fourier order, the higher the model’s complexity.

seasonality_reg

As with the trend, a regularization parameter can also be set for seasonality: it is named seasonality_reg and its default value is 0. According to the documentation, “small values (0.1–1) allow to fit large seasonal fluctuations whereas large values in the range 1–100 impose a heavier penalty on the Fourier coefficients and thus dampens the seasonality”. Let’s find out its impact in an example:

Figure 7 — NeuralProphet’s yearly seasonality component. Fourier order = 6 and seasonality_reg = 0.
Figure 8 — NeuralProphet’s yearly seasonality component. Fourier order = 6 and seasonality_reg > 50.

In Figure 7, yearly_seasonality is set to 6 and seasonality_reg to 0, while in Figure 8 yearly_seasonality is set to 6 and seasonality_reg is greater than 50. As you can see, in the second case the amplitude is reduced (pay attention: in the first case the y-axis ranges from -2000 to 2000, while in the second one from -1000 to 1000). The impact of this parameter is quite strong and difficult to control, since it ranges from 0 to 100 with nonobvious dynamics. Therefore, our advice is to first try different values for the yearly, daily, and weekly seasonality orders, and then tune seasonality_reg, letting it vary in small ranges.
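The amplitude reduction seen between Figures 7 and 8 can be pictured as a shrinkage of the Fourier coefficients. The toy function below is only a hedged sketch of that damping effect, not the penalty NeuralProphet actually applies:

```python
def damp(coeffs, seasonality_reg):
    """Illustrative shrinkage: a larger regularization value pulls the
    seasonal Fourier coefficients (and hence the amplitude) toward 0."""
    factor = 1.0 / (1.0 + seasonality_reg)
    return [c * factor for c in coeffs]

# With a strong penalty, a +/-2000 seasonal swing is roughly halved,
# as in the Figure 7 vs Figure 8 comparison:
print(damp([2000.0, -1500.0], seasonality_reg=1.0))  # [1000.0, -750.0]
```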

AR related parameters

There are 4 main parameters concerning autoregression: n_lags, ar_reg, num_hidden_layers and d_hidden.

n_lags

The first one is called n_lags and simply indicates the order of the autoregression: the number of lags we want the model to take into account. As explained in the first part of the article, thanks to the model’s ability to learn sparse weights it is possible to choose a higher order and let the model learn which lags are not useful. However, NeuralProphet’s authors suggest choosing a value equal to or greater than the forecast horizon. This means that if we have daily data and we want to forecast the next 7 days, we need to look at least at the past 7 days.
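The relationship between n_lags and the forecast horizon becomes clear when you see how lagged training samples are built. This is a conceptual sketch (the function name is ours, not the library's): each training example pairs the last n_lags observations with the next n_forecasts targets.

```python
def make_lagged_samples(series, n_lags, n_forecasts):
    """Build (input, target) pairs: each input holds the last n_lags
    values, each target the following n_forecasts values."""
    samples = []
    for i in range(len(series) - n_lags - n_forecasts + 1):
        x = series[i:i + n_lags]                              # past window
        y = series[i + n_lags:i + n_lags + n_forecasts]       # future window
        samples.append((x, y))
    return samples

# Daily data, forecasting 7 days ahead with n_lags = 7:
pairs = make_lagged_samples(list(range(20)), n_lags=7, n_forecasts=7)
print(pairs[0])  # ([0, 1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13])
```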

ar_reg

Strictly related, the next parameter is called ar_reg and it adjusts the strength of the sparsity (Figure 9). If you have selected a high value for n_lags (because you are estimating long-range dependencies or because you are forecasting far ahead in the future), this parameter can help reduce the computational cost and remove unimportant lags. Similarly to other regularization parameters, you may choose to tune it rather than fix it.

Figure 9 — Example of Sparse AR-net: the regularization term reduces the order from 7 to 3
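The sparsifying effect illustrated in Figure 9 can be sketched as thresholding small AR weights toward zero. This is only a hedged illustration of the idea (the function and the weight values are ours, not taken from AR-Net):

```python
def shrink(coeffs, threshold):
    """Illustrative sparsification: coefficients whose magnitude falls
    below the threshold are zeroed out, reducing the effective AR order."""
    return [c if abs(c) > threshold else 0.0 for c in coeffs]

# An order-7 AR model where only the first 3 lags carry real signal:
weights = [0.9, 0.4, 0.3, 0.02, 0.01, 0.03, 0.01]
print(shrink(weights, threshold=0.05))
# [0.9, 0.4, 0.3, 0.0, 0.0, 0.0, 0.0] -> effective order reduced from 7 to 3
```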

num_hidden_layers, d_hidden

The last two parameters concern the deep AR configuration: num_hidden_layers and d_hidden. The first allows us to choose how many hidden layers we want, while the second defines how many neurons each layer has. It is possible to plug these parameters into a tuning pipeline to find the best values; however, the risk of over-complicated models is high. In Figure 10 it is possible to appreciate how the number of neurons in the hidden layer affects the dynamics learned by the model. It is advisable to implement a deep AR only if you know that the dynamics are not linear.

Figure 10 — Example of how the AR dynamics learned change when increasing the number of neurons in the hidden layer — Plots made by the authors.
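One way to gauge how quickly these two parameters inflate model complexity is to count the weights of the resulting fully connected network. The sketch below is a rough back-of-the-envelope count (biases omitted, function name ours), not the library's internals:

```python
def ar_net_param_count(n_lags, n_forecasts, num_hidden_layers, d_hidden):
    """Approximate weight count of a fully connected AR network:
    input layer of n_lags, num_hidden_layers layers of d_hidden
    neurons, output layer of n_forecasts (biases ignored)."""
    sizes = [n_lags] + [d_hidden] * num_hidden_layers + [n_forecasts]
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

# Linear AR (no hidden layers) vs a small deep AR, both 7-in / 7-out:
print(ar_net_param_count(7, 7, 0, 0))    # 49 weights
print(ar_net_param_count(7, 7, 2, 32))   # 1472 weights, ~30x more
```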

Model related parameters

There are five main parameters which relate to the model: loss_func, optimizer, batch_size, epochs, and learning_rate. Thanks to the work of the NeuralProphet authors, only the first two strictly need to be set.

With the loss_func argument you specify the loss you want to use: either a string referring to a default loss or a custom one, as shown at the beginning of the first part of the article. The optimizer can be a standard SGD (Stochastic Gradient Descent) or AdamW. The latter converges toward a minimum very effectively, but you should watch out for overfitting when choosing it.

The parameters batch_size and epochs are not essential because the authors implemented some heuristics to automatically select good values (Figure 11, left side). The automatic values are a good choice for a quick test of the model or as a benchmark. However, we suggest experimenting with them: for instance, we had to use a larger batch size than the one selected by the heuristic to cope with overfitting issues.

Concerning the learning_rate, the authors implemented a learning-rate range test which aims to find the best value (Figure 11, right side). In our experience it worked quite well; the only drawback is that it works for the default loss functions only, so if you are implementing a custom loss you cannot leverage it. In that case we took advantage of Bayesian optimization to find an appropriate value.

Figure 11 — Model parameters

Further parameters

There are plenty of arguments and parameters and, as stated at the beginning of the article, the library is evolving; therefore, again, you should refer to the official documentation.

The first set of parameters, newer_samples_weight and newer_samples_start, allows the user to skew the model toward more recent observations: with them we can choose when such newer observations begin and how much weight to assign to them.
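A hedged sketch of the idea behind this weighting scheme: observations after a chosen fraction of the series receive a larger loss weight. This is a conceptual illustration only (the function is ours, and the library's actual weighting may differ, e.g. by using a smooth ramp rather than a step):

```python
def sample_weights(n, newer_samples_start=0.5, newer_samples_weight=2.0):
    """Illustrative step weighting: samples after the given fraction of
    the series count newer_samples_weight times as much in the loss."""
    cutoff = int(n * newer_samples_start)
    return [1.0] * cutoff + [newer_samples_weight] * (n - cutoff)

# Six observations, second half weighted twice as heavily:
print(sample_weights(6))  # [1.0, 1.0, 1.0, 2.0, 2.0, 2.0]
```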

Then, there are some parameters about data processing (impute_missing, impute_linear, impute_rolling, drop_missing). These are meant to choose how to cope with missing data: whether to drop or impute them, and how. While this capability may seem particularly useful, we would like to warn you of the tradeoff in terms of control: if you opt for an imputation pipeline, you lose control over those missing data. In our case, missing data means a failure in the data pipeline, therefore we disabled imputation so that, in case of missing data, the algorithm stops and raises an error and we realize that something went wrong.

There is also a very useful parameter for normalization: normalize. NeuralProphet provides different normalization methods, from min-max to 0-centered. Furthermore, it is possible to normalize both the response variable and the regressors.

Last but not least, the global model parameters: global_normalization, global_time_normalization, unknown_data_normalization. One of the greatest advantages of NeuralProphet is the ability to implement a global model. This configuration allows modelling several similar time series with one common model. Each time series can have its own regressors, but generally trend and seasonality are shared; with the latest versions, however, it is also possible to model those components independently. Besides the clear computational and time benefit, such a configuration allows forecasting new or incomplete series thanks to parameters learned on the other ones. Furthermore, global models have been shown to provide better generalization and less overfitting compared to multiple local models. This holds even for complex models, which, if adopted locally, would overfit. Interestingly, it has been shown that there always exists a global model performing at least as well as the local models, regardless of the relatedness of the series [4]. Nonetheless, there are many other factors which can affect this statement, as shown in [5]. Finally, by considering all the time series at once, the training size increases drastically.

From the user's point of view, what needs to be specified is the normalization configuration, which can be local or global. With global normalization, the parameters are unique and learned across all the series; in the case of new or incomplete series, those shared parameters are applied. With local normalization, each time series has its own parameters. In this case it is still possible to forecast new series by setting the unknown_data_normalization argument to True: regardless of the local parameters, global normalization parameters will be computed and used for such series.
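The difference between the two configurations can be sketched with min-max scaling (a conceptual illustration, not NeuralProphet's code; names are ours): under global normalization a single (min, max) pair is learned across all series, so a new or unseen series can be scaled with the shared parameters.

```python
def minmax_params(series):
    """Learn min-max normalization parameters from data."""
    return min(series), max(series)

def normalize(series, lo, hi):
    """Scale values into [0, 1] using previously learned parameters."""
    return [(x - lo) / (hi - lo) for x in series]

# Global normalization: one (lo, hi) learned over both series, so the
# second series is scaled with parameters informed by the first as well.
series_a, series_b = [10, 20, 30], [5, 15]
lo, hi = minmax_params(series_a + series_b)
print(normalize(series_b, lo, hi))  # [0.0, 0.4]
```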

Conclusion

Image from Adobe Stock

This guide has provided an overview of how to tune NeuralProphet hyperparameters for optimization. With the help of this article, data scientists can now perform hyperparameter tuning with a much better understanding. We have also included some insights from our own experience, to offer you advice that is the result of months of experimentation (and failures).

To conclude, with the ability to tune hyperparameters and the added flexibility of the AR-Net component, the possibilities for this model are endless, from easy ones with all default parameters to highly tuned ones. However, it is important to keep in mind that the model is constantly evolving and improving, so staying up to date with the latest developments is essential. Overall, NeuralProphet is a powerful tool that is definitely worth exploring for anyone interested in time-series forecasting. Indeed, our journey began with an easy default model just to explore the tool, and now it is a finely set up model in production.

This article has been written with the precious collaboration of Riccardo Tambone

References

[1] O. Triebe, H. Hewamalage, P. Pilyugina, N. Laptev, C. Bergmeir and R. Rajagopal, “NeuralProphet: Explainable Forecasting at Scale,” arXiv, 2021.

[2] “NeuralProphet Documentation,” [Online]. Available: https://neuralprophet.com/.

[3] O. Triebe, N. Laptev and R. Rajagopal, “AR-Net: A simple Auto-Regressive Neural Network for time-series,” arXiv, 2019.

[4] P. Montero-Manso and R. J. Hyndman, “Principles and algorithms for forecasting groups of time series: Locality and globality,” International Journal of Forecasting, vol. 37, no. 4, pp. 1632–1653, 2021.

[5] H. Hewamalage, C. Bergmeir and K. Bandara, “Global models for time series forecasting: A Simulation study,” Pattern Recognition, vol. 124, 2022.
