Time Series Cross-validation — a walk forward approach in python

When we build a machine learning model, cross-validation lets us check whether the model is heading in the direction we expect. A good cross-validation scheme is one that emulates the test distribution well. When you are engineering features or tuning hyperparameters, you do not want a situation where the changes you make say nothing about performance on future unseen data, which we typically call the test set.

It is very easy to fall into the common pitfall of a bad validation scheme, where you are happily making 'improvements' based on your validation metric while, in fact, your model is not going to do well on unseen data. Remember that ultimately we want a model that does well on future unseen test data.

A good rule of thumb is to make your validation scheme mirror the test data as closely as possible. We will touch on that in more detail below.

Today, I will discuss a form of validation used for time series. Time series data are usually trickier to handle than typical cross-sectional data, where each row can be treated as a separate observation. In a time series, we have to respect the order of the observations.

This also affects feature engineering (but that is a topic for another day). So why do we have to treat time series differently?

No Mixture

Most general-purpose cross-validation techniques are not appropriate for time-series problems. Bootstrap resampling (apart from time series bootstrapping), k-fold and stratified k-fold mix observations from different periods across folds, so your validation folds do not have the same temporal structure as your actual test set. Recall that your test data will be some period in the future.

A good way to understand this: once periods are mixed, the validation and test distributions will not be the same. Not only are you learning from future data to predict past validation data, but the validation scores you get may not reflect how the model does on the real test set.
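To make the leakage concrete, here is a minimal sketch using sklearn's KFold with shuffling on a toy index of 12 time-ordered points. The data and the variable names are made up for illustration. It counts the folds in which at least one training observation is more recent than the earliest validation observation, which is exactly the situation we want to avoid.

```python
import numpy as np
from sklearn.model_selection import KFold

# 12 observations ordered in time: index 0 is the oldest, 11 the most recent.
time_index = np.arange(12)

kfold = KFold(n_splits=3, shuffle=True, random_state=0)

leaky_folds = 0
for train_idx, valid_idx in kfold.split(time_index):
    # A fold "leaks" if any training point comes after the earliest validation point,
    # i.e. the model would see the future while being scored on the past.
    if train_idx.max() > valid_idx.min():
        leaky_folds += 1
    print("train:", train_idx, "valid:", valid_idx)

print("folds that train on the future:", leaky_folds)
```

Even without shuffling, plain k-fold places earlier blocks in validation while later blocks sit in training, so the same problem shows up.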

On Kaggle, we often encounter what is known as a leaderboard shakeup. This happens for reasons like:

1. overfitting to the small percentage of the test set behind the leaderboard scores

2. a test set distribution that differs from the validation distribution, which results in bad model selection.

(That is a deeper topic of its own.) My point is to stress the importance of sound validation, especially for time-series data. Otherwise, you will not be able to pick out the rotten apples!

In practice, the data behind most time-series problems also grows over time: our training data gets longer every period. We might be predicting the short term (next month) or some longer horizon h, whether we are using classical time series models, machine learning or neural networks. In the subsequent period we gain more data and the process repeats. Most of the time, the model also needs to be refitted on the new data before it can forecast horizon h again.

This is why it makes sense to use what is known as walk-forward cross-validation, also called expanding window cross-validation.

Walking forward

With that in mind, the usual train-test split, where you order your data from past to present and hold out the later part as validation data, will definitely work better than the k-fold approaches. However, a single split might not be a good gauge, since we want to avoid overfitting to that particular split.

In other words, where you choose to split plays a part in your validation result. To avoid luck as a factor and have a more robust result that we can trust, we can expand on this concept by doing what is known as walk-forward validation.

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia.

Professor Rob Hyndman illustrates this concept very well with the diagram above. In effect, we make multiple splits across different time periods, with the training data expanding in each fold. In many cases, especially in the work I personally do on product demand forecasting, this provides a more robust gauge of your model's performance.

We then average the performance across the folds using the metric of your choice (e.g. MAPE, WMAPE).

Of course, walk-forward validation is not restricted to expanding by one period at a time. This is where you want to match your test setting as closely as possible. Depending on your data's frequency, model type (ARIMA, ETS, GBM, etc.) or even your objective, you might want to define a different initial training length, a different amount of data added at each step, and a different horizon to validate on.
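As a rough sketch of that averaging step, the snippet below walks forward over a toy series, scores each fold with MAPE and averages the fold scores. Everything here is an assumption for illustration: the numbers are made up, the names `initial` and `horizon` are mine, and the naive "repeat the last value" forecast is only a stand-in for whatever model you would actually refit in each fold.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Toy monthly demand series (made-up numbers, for illustration only).
y = np.array([100, 110, 120, 130, 125, 135,
              150, 160, 155, 170, 180, 190], dtype=float)

initial = 6   # observations in the first training window (assumed)
horizon = 2   # observations forecast in each fold (assumed)

fold_scores = []
# Move the end of the training window forward by `horizon` each fold.
for train_end in range(initial, len(y) - horizon + 1, horizon):
    train = y[:train_end]
    valid = y[train_end:train_end + horizon]
    # Stand-in model: naive forecast that repeats the last training value.
    forecast = np.repeat(train[-1], horizon)
    fold_scores.append(mape(valid, forecast))

cv_score = float(np.mean(fold_scores))
print("MAPE per fold:", [round(s, 2) for s in fold_scores])
print("mean MAPE across folds:", round(cv_score, 2))
```

Looking at the per-fold scores next to the average is also useful, since a model that is good on average but erratic across folds is one to be wary of.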

Hyndman, R.J., & Athanasopoulos, G. (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia.

My Python solution — walk forward cv

If you are using Professor Hyndman's forecast package in R, you can simply call the tsCV() function, which wraps this procedure. You need to define a function that takes your data x and a horizon h and returns a forecast, and then pass that function to tsCV().

# Fit an AR(2) model to each rolling origin subset
far2 <- function(x, h){forecast(Arima(x, order=c(2,0,0)), h=h)}
e <- tsCV(lynx, far2, h=1)

# Fit the same model with a rolling window of length 30
e <- tsCV(lynx, far2, h=1, window=30)

Python users are less fortunate. sklearn does have TimeSeriesSplit, but at the time of writing it does not let you set an initial training period (which is important if, for instance, you want to train on a minimum of 2–3 years of data to capture seasonality), the amount of data added at each step forward, or the horizon h. The sklearn version is driven by n_splits, and the splits between train and validation are defined as follows:

Based on sklearn —
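As a rough illustration of that behaviour, the sketch below runs TimeSeriesSplit on 12 toy observations (the array is made up). The thing to notice is that the size of the first training window and of each validation block is derived from n_splits and the length of the data rather than chosen directly.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# 12 toy observations ordered in time (made-up data).
X = np.arange(12).reshape(-1, 1)

tscv = TimeSeriesSplit(n_splits=3)
splits = list(tscv.split(X))

for fold, (train_idx, valid_idx) in enumerate(splits, start=1):
    # Training indices always precede validation indices,
    # and the training window grows from fold to fold.
    print(f"fold {fold}: train={train_idx.tolist()} valid={valid_idx.tolist()}")
```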

As such, I have written my own version of the expanding window that follows the grammar of sklearn below:

It works just like the other sklearn CV splitters, which return the indices of your data.
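The original code is not reproduced here, so the following is only a minimal sketch of what such a splitter could look like, not my exact implementation. The class name `ExpandingWindowSplit` and the parameters `initial`, `horizon` and `period` are hypothetical names chosen for this illustration. It exposes `split` and `get_n_splits` in the style of sklearn splitters and yields integer index arrays.

```python
import numpy as np

class ExpandingWindowSplit:
    """Hypothetical walk-forward splitter (illustrative sketch only).

    initial: number of observations in the first training window
    horizon: number of observations to validate on in each fold
    period:  number of observations added to training at each step
    """

    def __init__(self, initial=1, horizon=1, period=1):
        self.initial = initial
        self.horizon = horizon
        self.period = period

    def _train_ends(self, n_samples):
        # Every position where training can stop and still leave
        # a full horizon of validation data after it.
        return range(self.initial, n_samples - self.horizon + 1, self.period)

    def split(self, X, y=None, groups=None):
        n_samples = len(X)
        indices = np.arange(n_samples)
        for train_end in self._train_ends(n_samples):
            # Training always starts at 0, so the window expands each fold.
            yield indices[:train_end], indices[train_end:train_end + self.horizon]

    def get_n_splits(self, X, y=None, groups=None):
        return len(self._train_ends(len(X)))


data = np.arange(10)  # toy data, ordered in time
cv = ExpandingWindowSplit(initial=4, horizon=2, period=2)
folds = list(cv.split(data))

for train_idx, valid_idx in folds:
    print("train:", train_idx.tolist(), "valid:", valid_idx.tolist())
```

Because it follows the `split` / `get_n_splits` convention, an object like this should be usable wherever sklearn accepts a `cv` argument, though I would check that against the sklearn version you are on.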


Time series cross-validation is not limited to walk-forward cross-validation. A rolling window approach can also be used, and Professor Hyndman also discusses time-series bootstrapping in his textbook. Perhaps I will touch on it in another post. For now, I hope the expanding window approach gives you an easier way to back-test on time series data!


