Finding time series parameters using Prophet and Optuna Bayesian optimization

A smart way to train time series models using the prophet and optuna libraries: build additive models and find their optimal parameters through a Bayesian approach.

Pipe Salas Bravo
spikelab
6 min read · Feb 20, 2021


Time series modeling

Time series is a big topic in the machine learning space and exists in practically every industry; hands down, it is a problem that every data scientist will face in their career.

Beyond the basics of the task, there are many ways of approaching a time series and finding models that describe its underlying dynamics. For example, there are plenty of classical models, such as state space models (ARIMA, ARMA, etc.), and tree-based algorithms can be used as well. In this article we are going to discuss one of the easiest yet most powerful tools for working with time series: prophet, a Facebook library which, in its own words:

Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.

To get the best model that prophet can offer, we need to tune the model's parameters. The classic way of finding the best combination of parameters is a grid search (random or cartesian), but this can be too time-consuming, especially when we validate the models using a large number of folds during cross-validation.

An excellent alternative for saving time and searching the parameter space more intelligently is a Bayesian search, which focuses on the areas of the parameter space that score better on our objective function. Several libraries can do Bayesian search, but we will focus on optuna, a really good and easy-to-use library. These are its main features:

  • Eager search space: Automated search for optimal hyperparameters using Python conditionals, loops, and syntax.
  • State-of-art algorithms: Efficiently search large spaces and prune unpromising trials for faster results.
  • Easy parallelization: Parallelize hyperparameter searches over multiple threads or processes without modifying code.

One of the things I liked the most is that only a few lines of code are needed to get optuna working:

Example of using optuna for finding the minima of the (x-2)**2 function

The code above shows how easy it is to implement optuna for a simple optimization problem. All that is needed is:

  • An objective function to be optimized (by default it is minimized).
  • A distribution for the variable we are searching over (from which we sample). It can be continuous or discrete.
  • A study, on which we invoke the optimize method. As the number of trials increases, the solution improves.

It sounds very easy, so let's apply it to a prophet parameter search.

Let's model it

First we need a time series to work with; as an example, we will use the daily maximum temperature in Valdivia, the city where I live.

To get the data we use a NASA API, which returns temperature data between two given dates. We are going to use the temperatures from 2015 to the present day.
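The article does not reproduce the download code; a sketch using NASA's POWER daily endpoint might look like the following (the endpoint, the `T2M_MAX` parameter name, the `community` value, and the Valdivia coordinates are all assumptions, so check them against the POWER documentation):

```python
import pandas as pd
import requests

def build_power_params(start="20150101", end="20210220"):
    # Query parameters for NASA POWER's daily point endpoint (assumed names).
    # T2M_MAX is the daily maximum temperature at 2 meters; the coordinates
    # below are an approximation of Valdivia's location.
    return {
        "parameters": "T2M_MAX",
        "community": "RE",
        "latitude": -39.81,
        "longitude": -73.24,
        "start": start,
        "end": end,
        "format": "JSON",
    }

def fetch_tmax(start="20150101", end="20210220"):
    url = "https://power.larc.nasa.gov/api/temporal/daily/point"
    resp = requests.get(url, params=build_power_params(start, end), timeout=60)
    resp.raise_for_status()
    # The daily values come back as a {"YYYYMMDD": value} mapping.
    values = resp.json()["properties"]["parameter"]["T2M_MAX"]
    # Prophet expects a dataframe with columns "ds" (date) and "y" (value).
    return pd.DataFrame({
        "ds": pd.to_datetime(list(values), format="%Y%m%d"),
        "y": list(values.values()),
    })

# df = fetch_tmax()  # requires network access
```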

And it looks like:

Daily maximum temperature in Valdivia from 2015 to 2021, downloaded from the NASA API

How we validate

The validation period is especially important when dealing with time series. Model performance and the resulting optimal parameters depend strongly on the choice of validation and training periods. So we want a robust way of finding the optimal parameters that reduces the risk of a badly chosen validation period. To achieve this, the error measure should be calculated using time series cross validation. This, and not a random split, is the correct way of validating a time series model, since it prevents data leakage from the future.

Idea behind cross validation for time series, using a 3-fold split.

Fortunately, prophet already has an implementation of cross validation which can be parallelized (see the prophet documentation for the different parallelization options).

The choice of horizon for each validation period depends on the problem we are solving: it needs to correspond to the forecast we actually intend to produce. Let's suppose we need a 1-week forecast and we are going to use 4-fold cross validation (we leave the final 4 weeks for validation).

Code looks like this:

Example of prophet’s cross validation, using a 7 days horizon and 4-fold cross validation

In the code above, we first fit a prophet model to our data, then specify the forecast horizon (horizon) of the cross validation and, optionally, the size of the initial training period (initial) and the spacing between cutoff dates (period).

Once we have trained the models, we can call performance_metrics, which computes different metrics from the cross-validation output: mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), mean absolute percent error (MAPE), median absolute percent error (MDAPE) and coverage of the yhat_lower and yhat_upper estimates. These are computed on a rolling window of the predictions in df_cv after sorting by horizon (ds minus cutoff). So we now have a robust way of measuring our model's error.

Now we need to see which parameters can be tuned. According to the prophet documentation, the first parameters to tune should be:

  • changepoint_prior_scale: This is probably the most impactful parameter. It determines the flexibility of the trend and, in particular, how much the trend changes at the trend changepoints.
  • seasonality_prior_scale: This parameter controls the flexibility of the seasonality.
  • holidays_prior_scale: This controls flexibility to fit holiday effects.
  • seasonality_mode: Options are ['additive', 'multiplicative'].

For this example, as temperature is not affected by holidays, our model won't include them.

Then we need to specify the support and distribution for each parameter, so we are going to consider the following:

As seasonality_mode can take only two values, we sample from an integer distribution whose support is {0, 1}, representing the choice between the additive and multiplicative modes.

Wrapping prophet and optuna

So we already have everything needed to find the best parameters for our prophet model. The objective function optuna will minimize is the RMSE over the 4 folds of 7 days each, and, to be resource-efficient, the cross validation will be parallelized.

Once we have defined the function to be minimized, we create the study and set the number of trials to run. When the study finishes, we have the parameters that minimize the RMSE over the folds, so we can train a final model on all our data and start making future predictions.

Final thoughts

Prophet is an easy and quick library for time series prediction which, combined with optuna, unleashes its full potential. Although we don't report the error in this article, reducing it further may require some additional tweaks, like adding extra regressors or tuning additional prophet parameters.

Optuna is a very useful framework that allows tuning parameters not only for prophet but for any other model. This is especially important when the parameter search involves a custom metric (different from the usual ones) for evaluating model performance.

Please feel free to make any comments or suggestions.
