# Time Series Forecasting Using Past and Future External Data with Darts

Building models that are able to capture external data is often a key aspect of time series forecasting projects. For instance:

- Recently-observed activity on an e-commerce website can help predict future sales.
- Observed rainfalls and known weather forecasts can help to predict hydro and solar electricity production.
- Making the model aware of upcoming holidays can help sales forecasting.
- Knowing that some intervention is ongoing on a system can be helpful for correcting forecasting / outage detection.
- etc…

In fact, more often than not, strictly relying on the history of a time series to predict its future is missing a lot of valuable information.

Darts is an open source Python library whose primary goal is to smoothen the time series forecasting experience in Python. Out of the box it provides a variety of models, from ARIMA to deep learning models, which can all be used in a similar straightforward way using `fit()` and `predict()`. In this post, we’ll show how Darts can be used to easily take “covariates” (other time series providing useful information) into account. First, let us quickly explain a subtle-yet-important distinction between “past” and “future” covariates.

# Past and Future Covariates

We define two kinds of time series which can be used for forecasting:

- *Past covariates* are time series whose past values are known at prediction time. Those series often contain values that have to be observed to be known.
- *Future covariates* are time series whose future values are known at prediction time. More precisely, for a prediction made at time *t* for a forecast horizon *n*, the values at times *t+1, …, t+n* are known. Often, the past values (for times *t-k, t-k+1, …, t* for some lookback window *k*) of future covariates are known as well. Future covariate series contain, for instance, calendar information or weather forecasts.

Note that in general future covariates can also be used as past covariates, whereas the reverse is not true.
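To make the two definitions concrete, here is a minimal, library-independent sketch of which time indices each kind of covariate contributes for a forecast made at time *t*. The helper names `covariate_indices` and `is_usable` and the use of integer time indices are assumptions made for illustration; they are not part of Darts.

```python
def covariate_indices(t, lookback, horizon, kind):
    """Return the time indices of a covariate that may be read for a forecast
    made at time `t`, with lookback window `lookback` (k) and forecast
    horizon `horizon` (n).

    Hypothetical helper for illustration only, not a Darts function.
    """
    past = list(range(t - lookback, t + 1))        # t-k, ..., t
    future = list(range(t + 1, t + horizon + 1))   # t+1, ..., t+n
    if kind == "past":
        return past
    if kind == "future":
        # past values of future covariates are often known as well
        return past + future
    raise ValueError(f"unknown covariate kind: {kind!r}")


def is_usable(last_known_index, t, horizon, kind):
    """Check whether a series known up to `last_known_index` can serve as the
    given kind of covariate for a forecast made at time `t`."""
    needed_until = t if kind == "past" else t + horizon
    return last_known_index >= needed_until


print(covariate_indices(t=100, lookback=3, horizon=2, kind="past"))
print(covariate_indices(t=100, lookback=3, horizon=2, kind="future"))
print(is_usable(last_known_index=100, t=100, horizon=10, kind="future"))
```

A series that is only measured up to time *t* passes the "past" check but fails the "future" one, which is why a future covariate can double as a past covariate but not the other way around.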

## Past and Future Covariates in Darts

Darts differentiates models that make use of past and future covariates.

*Past covariates models*: The `fit()` and `predict()` methods of these models accept only a `past_covariates` argument (specifying one or a sequence of `TimeSeries`). These models will look only at past values of the covariate series when making a prediction. Past covariates models: `BlockRNNModel`, `NBEATSModel`, `TCNModel`, `TransformerModel`, `RegressionModel` (incl. `LinearRegressionModel` and `RandomForest`).

*Future covariates models*: The `fit()` and `predict()` methods of these models accept only a `future_covariates` argument. The training procedure will look at future values of the covariates (and possibly at historic values too), and future values will have to be provided at prediction time. Global future covariates models: `RNNModel`, `RegressionModel` (incl. `LinearRegressionModel` and `RandomForest`). Local future covariates models: `ARIMA`, `VARIMA`, `AutoARIMA`.

You shouldn’t be too worried about making a mistake when employing past and future covariates, because Darts will complain if you try providing the wrong kind of covariates to the wrong model or if your covariates are not known sufficiently into the future (or into the past). In addition, it takes care of slicing the covariates and targets for you automatically, even if they are not aligned (as long as the time axes of the series are correct).

Note that `RegressionModel` (incl. `LinearRegressionModel` and `RandomForest`) supports both `past_covariates` and `future_covariates`. In the rest of the article, we’ll see how to fit some RNN-based models using either past covariates or future covariates, and then we’ll fit a `RegressionModel` using both past and future covariates.

# A Toy Example: Forecasting a River Flow

As a toy example, let’s assume we want to forecast the flow of a river. We’ll be using synthetic time series data (created with Darts as well) to demonstrate how past and future covariates can be used. What we’ll do here is only meant to demonstrate how covariates can be used, and by no means represents a good (or realistic) way to forecast an actual river flow ;)

You can reproduce this example by installing Darts as follows:

`pip install darts`

The entire code is also available in a notebook here.

## A Simplistic River Model

We assume that the flow of our river on day *t* depends on two factors:

- The melting rate of an upstream glacier 5 days earlier (on day *t - 5*).
- The rainfalls during the last 5 days (from *t - 4* to *t*).

We want to forecast the flow 10 days in advance. Furthermore, we assume that:

- The glacier’s melting rate is not known in advance because we have to measure it directly in order to know it; it is thus a *past covariate*.
- The rainfall is known 10 days in advance from weather forecasts. It is thus a *future covariate*. It is also known in the past.

We start by generating some synthetic daily time series to create a problem instance. Darts’ global models (such as neural networks and regression models) can easily be trained on multiple time series (for instance just calling `model.fit([series1, series2, ...], past_covariates=[covariate1, covariate2, ...])`), so we could simulate several rivers and train one model on all these data. But here we will focus on showing how to use past and future covariates using only one target series.

In the code below, `melting` is our past glacier melting covariate series, `rainfalls` is the future rainfall covariate series, and `flow` is the target river flow (which we want to forecast):
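The article's data was created with Darts; as a stand-in, here is a minimal sketch of the same kind of data-generating process in plain Python, using lists rather than Darts `TimeSeries`. The sine shapes, noise levels, series length, and coefficients (0.5 for melting, 0.1 per day of rainfall) are assumptions chosen for illustration and are not taken from the article.

```python
import math
import random

random.seed(42)   # make the sketch reproducible
LENGTH = 3 * 365  # assumed: three years of daily values


def noisy_sine(length, period, noise_std):
    """A sine wave with the given period (in days) plus Gaussian noise."""
    return [math.sin(2 * math.pi * i / period) + random.gauss(0, noise_std)
            for i in range(length)]


# Covariates (shapes and noise levels are assumptions for this sketch)
melting = noisy_sine(LENGTH, period=365, noise_std=0.15)   # yearly cycle
rainfalls = noisy_sine(LENGTH, period=14, noise_std=0.3)   # bi-weekly cycle

# Assumed coefficients of the linear data-generating process
MELTING_COEF, RAIN_COEF, FLOW_NOISE_STD = 0.5, 0.1, 0.1

# flow[t] depends on melting[t-5] and on rainfalls[t-4..t]; the first 5 days
# have no complete history, so they are left undefined (None)
flow = [None] * 5 + [
    MELTING_COEF * melting[t - 5]
    + RAIN_COEF * sum(rainfalls[t - 4:t + 1])
    + random.gauss(0, FLOW_NOISE_STD)
    for t in range(5, LENGTH)
]

print(len(melting), len(rainfalls), len(flow))
print(flow[5:8])
```

Because the flow is a noisy linear combination of lagged covariate values, a model that sees the right lags of both covariates should in principle be able to recover most of the signal.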

## Evaluating Models

Now that we have our data, we can already think about how we want to evaluate and compare the different models we’ll build. Below we write a small function which performs backtesting and evaluates the accuracy of 10-day-ahead predictions over the last 20% of the flow series, using RMSE:
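A library-independent sketch of this kind of evaluation is given here, assuming a hypothetical `forecast_fn(history, horizon)` callable in place of a fitted model. It is not the Darts backtesting routine; it only illustrates the rolling-origin idea and the RMSE computation, and the names `backtest_rmse` and `naive_forecast` are made up for this example.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def backtest_rmse(series, forecast_fn, horizon=10, start=0.8):
    """Rolling-origin backtest over the last part of `series`.

    For each forecast origin, forecast `horizon` steps ahead using only the
    values before the origin, keep the last predicted value, and compare it
    with the true value at that time step.
    """
    first_origin = int(len(series) * start)   # first forecasted time step
    actual, predicted = [], []
    for origin in range(first_origin, len(series) - horizon + 1):
        history = series[:origin]                     # values before the origin
        forecast = forecast_fn(history, horizon)      # `horizon` predictions
        predicted.append(forecast[-1])                # the horizon-steps-ahead value
        actual.append(series[origin + horizon - 1])   # the matching true value
    return rmse(actual, predicted)


def naive_forecast(history, horizon):
    """Baseline that repeats the last observed value."""
    return [history[-1]] * horizon


series = [math.sin(2 * math.pi * i / 50) for i in range(500)]
print(f"Backtest RMSE = {backtest_rmse(series, naive_forecast):.3f}")
```

Keeping only the last value of each forecast means the score reflects accuracy exactly 10 steps ahead, rather than an average over horizons 1 to 10.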

## First Model: No Covariate

Let’s first create a `BlockRNNModel`. These models support `past_covariates`, but here, in order to get a first benchmark, we’ll fit it on the target only and see what we get. We somewhat arbitrarily select an `input_chunk_length` of `30` (this corresponds to the lookback window of the model), and we set the `output_chunk_length` to `10`, as this is the horizon we’re interested in forecasting:
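To give an intuition for what these two lengths mean, here is a small conceptual sketch that slices a series into (input, output) pairs of 30 and 10 values. It is an assumption-laden illustration in plain Python, with a hypothetical helper `make_training_samples`; Darts builds its training samples internally and the details may differ.

```python
def make_training_samples(series, input_chunk_length, output_chunk_length):
    """Slice `series` into (input, output) pairs: each input holds
    `input_chunk_length` consecutive values and each output holds the
    `output_chunk_length` values that immediately follow."""
    window = input_chunk_length + output_chunk_length
    samples = []
    for start in range(len(series) - window + 1):
        split = start + input_chunk_length
        samples.append((series[start:split], series[split:start + window]))
    return samples


series = list(range(100))  # stand-in for the target series
samples = make_training_samples(series, input_chunk_length=30, output_chunk_length=10)

first_input, first_output = samples[0]
print(len(samples))                       # number of (input, output) pairs
print(first_input[0], first_input[-1])    # first and last value of the lookback window
print(first_output[0], first_output[-1])  # first and last value of the forecast window
```

Each sample asks the model to map 30 observed values to the 10 values that follow, which matches the 10-day forecasting problem we care about.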

## Second Model: Using Past Melting Data

Let’s now try to provide the `melting` series as `past_covariates` to the model’s `fit()` function. Doing this means that the model will look at the past 30 time steps of melting (in addition to the past 30 time steps of the target) when producing a forecast.

This already improved the RMSE from 0.194 to 0.172, which is not bad; looking at the past melting helps because it determines part of the current flow.

## Third Model: Using Past Melting and Past Rainfall Data

We can seamlessly extend this to use both the past melting and past rainfall data. The rainfall is known in advance, but here we specify it as `past_covariates`, which means that the model will only look at past rainfalls.

In the following snippet, `melting.stack(rainfalls)` produces one multivariate `TimeSeries` containing two dimensions: the melting and the rainfall. This is the series we use as a past covariate.
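Conceptually, stacking pairs up the values of both series at each time step. The sketch below mimics this with plain lists and a hypothetical `stack` function written for this example; it is not the Darts method, and the values are made up.

```python
def stack(*components):
    """Combine equally long univariate series into one multivariate series,
    represented here as a list of tuples (one tuple per time step)."""
    lengths = {len(component) for component in components}
    if len(lengths) != 1:
        raise ValueError("all components must share the same time axis")
    return list(zip(*components))


melting = [0.1, 0.2, 0.3]     # made-up values
rainfalls = [1.0, 0.0, 2.0]   # made-up values

past_covariates = stack(melting, rainfalls)
print(past_covariates)  # [(0.1, 1.0), (0.2, 0.0), (0.3, 2.0)]
```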

Adding past rainfalls helps too, reducing the error further from 0.172 to 0.169. The rainfalls impact the next 5 days’ flow, and so past rainfalls provide some amount of signal to predict the next 10 days’ flow. The impact is still somewhat limited, though, because this model is only looking at *past* rainfalls and not at the actual future rainfalls happening during the 10 days for which we want to predict the flow.

## Fourth Model: Using Future Rainfalls

Let’s now try to use *future* rainfalls as a covariate. This might help us because a model using `future_covariates` will be able to look at the next 10 days’ rainfalls (in addition to past rainfalls) in order to predict the next 10 days’ flow. To do this, we’ll use an `RNNModel`, which is a “pure RNN” implementation that is able to use `future_covariates` (our `RNNModel` is similar to DeepAR).

It seems to be working: letting the model see the rainfalls for the next n=10 days brings the RMSE down to 0.158. Again, this makes sense as the recent rainfalls make up a large component of the flow.

Note that we cannot use the melting as a future covariate, because it is not known in advance, and so we wouldn’t be able to provide it at prediction time (Darts would complain if you tried to call `predict()` with a `future_covariates` series that doesn’t extend at least 10 time points further into the future than the target).

## Fifth Model: Using Past Melting and Future Rainfalls

Finally, we will now use a `RegressionModel` in order to be able to specify both `past_covariates` and `future_covariates`. `RegressionModel` in Darts is a wrapper around any “scikit-learn like” regression model, and by default it will use a linear regression. It can predict future values of the target series as a function of any combination of lagged values of the target, past and future covariates.

The lags of the target and past covariates have to be strictly negative (in the past), whereas the lags of the future covariates can also be zero or positive (the present or the future). For instance, a lag value of *-5* means that the value at time *t-5* is used to predict the target at time *t*; and a lag of *0* means that the future covariate value at time *t* is used to predict the target at time *t*. In the code below, we specify past covariate lags as `[-5, -4, -3, -2, -1]`, which means that the model will look at the last 5 `past_covariates` values (we could also have specified `lags_past_covariates=5` instead). Similarly, we specify the future covariate lags as `[-4, -3, -2, -1, 0]`, which means that the model will look at the last 4 historic values (lags `-4` to `-1`) and the current value (lag `0`) of the `future_covariates` (we could also have specified `lags_future_covariates=(4,1)` instead). Note that we do not specify any `lags` here, which means that this model won’t look at past values of the target at all; it will look at covariates only.
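To illustrate how these lag lists translate into regression inputs, here is a small sketch in plain Python. The helpers `expand_lags` and `feature_row` are hypothetical names introduced for this example; they mirror the equivalences stated in the text (`5` and `(4,1)`) and are not the Darts implementation.

```python
def expand_lags(lags):
    """Normalise a lag specification into an explicit list of lags.

    Follows the equivalences stated in the text:
      5      -> [-5, -4, -3, -2, -1]  (the last 5 past values)
      (4, 1) -> [-4, -3, -2, -1, 0]   (4 past values, then 1 value from time t on)
    """
    if isinstance(lags, int):
        return list(range(-lags, 0))
    if isinstance(lags, tuple):
        n_past, n_future = lags
        return list(range(-n_past, n_future))
    return list(lags)


def feature_row(t, past_cov, future_cov, lags_past, lags_future):
    """Build the regression inputs used to predict the target at time `t`."""
    lags_past, lags_future = expand_lags(lags_past), expand_lags(lags_future)
    if any(lag >= 0 for lag in lags_past):
        raise ValueError("past covariate lags must be strictly negative")
    if t + min(lags_past + lags_future) < 0:
        raise IndexError("not enough history before time t")
    return ([past_cov[t + lag] for lag in lags_past]
            + [future_cov[t + lag] for lag in lags_future])


# Made-up series whose values encode their own time index, for readability
melting = [100 + i for i in range(20)]
rainfalls = [200 + i for i in range(20)]

row = feature_row(10, melting, rainfalls,
                  lags_past=[-5, -4, -3, -2, -1],
                  lags_future=[-4, -3, -2, -1, 0])
print(row)
```

With these lags, the row for time *t = 10* contains the melting values at times 5 to 9 followed by the rainfall values at times 6 to 10, and no value of the target itself.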

This model drastically improves the RMSE, down to 0.102. So once again, linear regression wins! In fact, if we kept some additive noise on the covariates but removed the additive noise on the flow, we would find that this model produces perfect forecasts. To be fair, this was expected because the target is built as a linear combination of the covariates to begin with, and we built our `RegressionModel` specifying the exact right lags capturing the data generation process. Still, we expect these regression models to be very useful in practice, due to their speed, versatility in capturing both past and future covariates with precise lags, and the fact that, similar to neural networks, they can be trained on multiple series while requiring less tuning.
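The claim about perfect forecasts on a noise-free target can be checked with a dependency-free sketch. It assumes the same illustrative data-generating process as before (coefficients 0.5 and 0.1, noisy sine covariates, all made up for this example) and fits an ordinary least squares model without intercept via the normal equations. This is a simplification, not what Darts or scikit-learn do internally.

```python
import math
import random

random.seed(0)
N = 400
# Noisy covariates (assumed shapes), but a noise-free target
melting = [math.sin(2 * math.pi * i / 365) + random.gauss(0, 0.15) for i in range(N)]
rainfalls = [math.sin(2 * math.pi * i / 14) + random.gauss(0, 0.3) for i in range(N)]
flow = {t: 0.5 * melting[t - 5] + 0.1 * sum(rainfalls[t - 4:t + 1]) for t in range(5, N)}

PAST_LAGS = [-5, -4, -3, -2, -1]
FUTURE_LAGS = [-4, -3, -2, -1, 0]

# One feature row per time step: lagged melting values, then lagged rainfalls
X = [[melting[t + lag] for lag in PAST_LAGS] + [rainfalls[t + lag] for lag in FUTURE_LAGS]
     for t in flow]
y = list(flow.values())


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def least_squares(X, y):
    """Ordinary least squares (no intercept) via the normal equations."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * target for row, target in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


coefs = least_squares(X, y)
print([round(c, 3) for c in coefs])
```

Up to floating-point error, the fitted coefficients match the generating ones: 0.5 on the melting lag `-5`, zero on the other melting lags, and 0.1 on each rainfall lag, so the in-sample predictions reproduce the noise-free flow.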

# Conclusions

Past and future covariates often play an important role in forecasting problems, but they can be hard to handle and reason about. One goal of Darts is to make this experience easier and less error prone: using covariates with Darts boils down to providing your external time series data as `past_covariates` or `future_covariates` arguments to the `fit()` and `predict()` methods of the models. In our river flow example, we observed that knowing past glacier melting and future rainfalls can each improve forecasting to different extents, and that building a simple linear-regression based model capturing both obtains the best results in this case.

If you have any feedback on Darts, or if you have forecasting challenges you’d like to tell us about, feel free to reach out to us. You can also check out our website.