Forecasting with ETNA: Fast and Furious

Forecasting hundreds of time series in a trice

Vlad Denisov
IT’s Tinkoff
4 min read · Dec 7, 2021

Hi! My name is Vlad, and I work at the Tinkoff AI Center on the ETNA team. I want to tell you about ETNA, our handy framework for forecasting multiple time series.

Motivation

Forecasting problems come up all the time in everyday life: for example, we want to know how many calls a service line will receive, or how much cash will be withdrawn from an ATM over the next week or month. Data scientists and analysts who face these problems often have to juggle numerous Python and R packages for EDA, preprocessing, feature selection, and model fitting.

It is very inconvenient and time-consuming! To make things easier, we started developing our own framework. It is like a Lego set without any age restrictions: using simple “bricks” (EDA methods, transforms, models), you can build either a simple “bike” to predict one time series or a powerful and efficient “racing car” for forecasting hundreds of time series! Let’s “order” this set and build the first pipeline!

Getting Started

Installation

The ETNA package is easy to install from the command line.

Then check the installation and the installed version.
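One way to run this check from Python is to ask the standard library which version is installed. This is only a minimal sketch: the distribution name etna is an assumption (the original command is not shown here), and installed_version is a hypothetical helper, not part of ETNA.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Assumption: the library is distributed under the name "etna".
version = installed_version("etna")
print(f"etna version: {version}" if version else "etna is not installed")
```

If the helper reports that the package is missing, the installation step did not succeed in the current environment.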

Load Data

As an example, we download a COVID-19 dataset.

This dataset holds a huge amount of information covering almost the last two years. We are going to predict new cases per million for several countries.

ETNA is strict about the data format: as usual, your dataset must contain timestamp and target columns, and it must also have a segment column that tells the time series apart. The data itself is stored in a TSDataset object.
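To make the three required columns concrete, here is a plain-Python sketch of the same long format: one row per timestamp and segment, grouped into one ordered series per segment. It is not how TSDataset works internally; split_by_segment and the toy rows are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from datetime import date

REQUIRED_COLUMNS = {"timestamp", "segment", "target"}


def split_by_segment(rows):
    """Group long-format rows into one time-ordered series per segment."""
    series = defaultdict(dict)
    for row in rows:
        missing = REQUIRED_COLUMNS - set(row)
        if missing:
            raise ValueError(f"row is missing columns: {sorted(missing)}")
        series[row["segment"]][row["timestamp"]] = row["target"]
    # Sort each segment by timestamp so the values form a proper time series.
    return {seg: dict(sorted(points.items())) for seg, points in series.items()}


# Toy values for illustration only, not real COVID-19 figures.
rows = [
    {"timestamp": date(2021, 1, 2), "segment": "country_a", "target": 12.5},
    {"timestamp": date(2021, 1, 1), "segment": "country_a", "target": 10.0},
    {"timestamp": date(2021, 1, 1), "segment": "country_b", "target": 3.0},
]
series_by_segment = split_by_segment(rows)
print(series_by_segment["country_a"])
```

A row without a segment value is rejected, which mirrors the point above: without that column there is no way to tell which series an observation belongs to.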

Exploratory Data Analysis

OK, we have initialized the TSDataset object, so let’s look at our data and do some analysis. Plotting all the data takes just one command.

We want to get to know our data better, so let’s build some analytical plots! The ETNA package offers many of them:

  • Cross-correlation plots
  • PACF
  • Correlation heatmap
  • Z-statistics boxplot
  • Outliers search

For example, we are going to plot the PACF and run the outlier search. For more methods, see the ETNA Documentation.
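For readers who want to see what a PACF plot is built from, the sketch below computes partial autocorrelations with the Durbin-Levinson recursion in plain Python. It is not ETNA's plotting code; sample_acf, pacf_from_acf, and the toy weekly series are hypothetical names and data chosen for illustration.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelations for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def pacf_from_acf(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 via Durbin-Levinson."""
    pacf = []
    prev = []  # AR coefficients of the order k-1 model
    for k in range(1, len(acf)):
        numerator = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        denominator = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = numerator / denominator
        # Update the lower-order coefficients, then append the new one.
        current = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        current.append(phi_kk)
        pacf.append(phi_kk)
        prev = current
    return pacf


# Toy daily series: a repeating weekly pattern with a slow upward drift.
weekly_pattern = [5, 6, 7, 8, 9, 3, 2]
series = [v + 0.1 * week for week in range(8) for v in weekly_pattern]
pacf = pacf_from_acf(sample_acf(series, max_lag=10))
for lag, value in enumerate(pacf, start=1):
    print(f"lag {lag:2d}: {value:+.3f}")
```

A PACF plot draws these values as bars per lag; lags that stand out suggest which past observations carry information about the current one.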

Forecasting

The ETNA library has a variety of time series forecasting models: from modern ones such as CatBoost, Facebook Prophet, and Amazon DeepAR to classical ones such as SARIMAX and SeasonalMovingAverage. You can find a complete list of models in the Documentation.

As the PACF shows, all segments have weekly seasonality, so let’s run a backtest and build a forecast using a simple seasonal model with a weekly period. The examples directory shows how to build a more complex model.
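The idea behind a seasonal moving average with weekly seasonality can be sketched in a few lines: each forecast is the mean of the values observed on the same weekday over the last few weeks. This is a plain-Python illustration under that assumption, not ETNA's implementation; the function name and the seasonality and window parameters are hypothetical.

```python
def seasonal_moving_average_forecast(history, horizon, seasonality=7, window=3):
    """Forecast `horizon` steps; each step averages the `window` most recent
    values that sit a whole number of seasons back."""
    if len(history) < seasonality * window:
        raise ValueError("history must cover at least `window` full seasons")
    values = list(history)
    for _ in range(horizon):
        same_phase = [values[-seasonality * k] for k in range(1, window + 1)]
        # Append the forecast so later steps can build on it.
        values.append(sum(same_phase) / window)
    return values[len(history):]


# Toy series: three identical weeks of daily values.
history = [1, 2, 3, 4, 5, 6, 7] * 3
print(seasonal_moving_average_forecast(history, horizon=14))
```

On a perfectly repeating weekly pattern the forecast reproduces the pattern; on real data the averaging smooths out week-to-week noise.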

First, we need to initialize a Pipeline object.

Next, we run a backtest with our model.
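Conceptually, a backtest repeatedly hides the tail of the series, forecasts it from the earlier part, and scores the result. The sketch below assumes an expanding-window scheme, where each fold trains on more history than the previous one; whether this matches the settings used in this article is an assumption. All names here are hypothetical, and the seasonal naive forecaster is just a stand-in for a real model.

```python
def expanding_folds(series, horizon, n_folds):
    """Yield (train, test) pairs; each later fold trains on more history."""
    n = len(series)
    if n <= horizon * n_folds:
        raise ValueError("series is too short for the requested folds")
    for fold in range(n_folds, 0, -1):
        test_end = n - horizon * (fold - 1)
        test_start = test_end - horizon
        yield series[:test_start], series[test_start:test_end]


def seasonal_naive(train, horizon, seasonality=7):
    """Repeat the last observed season (a stand-in for a real model)."""
    last_season = train[-seasonality:]
    return [last_season[h % seasonality] for h in range(horizon)]


def mean_absolute_error(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def backtest(series, horizon, n_folds, forecaster, metric):
    """Return one metric value per fold."""
    return [
        metric(test, forecaster(train, horizon))
        for train, test in expanding_folds(series, horizon, n_folds)
    ]


# Toy series: four identical weeks, so a seasonal model should be exact.
series = [3, 4, 5, 6, 7, 2, 1] * 4
scores = backtest(series, horizon=7, n_folds=3,
                  forecaster=seasonal_naive, metric=mean_absolute_error)
print(scores)
```

Because every fold is scored only on data the model has not seen, the per-fold values give a fairer picture than an error computed on the training data.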

Look at the metrics.
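The article does not say which metric is reported here, so as an assumption the sketch below uses SMAPE, a common scale-free choice for comparing series of very different magnitudes. The smape function is a hypothetical plain-Python helper, and the numbers are toy values.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, on a 0..200 scale."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, f in zip(actual, forecast):
        denominator = abs(a) + abs(f)
        # Treat a 0/0 term as a perfect match instead of dividing by zero.
        if denominator:
            total += abs(a - f) / denominator
    return 200.0 * total / len(actual)


# Toy values: the forecast is about 10% off on both points.
print(round(smape([100.0, 200.0], [110.0, 180.0]), 2))
```

Lower is better: 0 means a perfect forecast, and because the error is relative, segments with large and small values can be compared side by side.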

And visualize the backtest results.

As we can see, our model shows pretty good results, so let’s predict the future!
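Predicting the future for many segments at once boils down to fitting on the full history of each segment and attaching future timestamps to the forecasts. Here is a plain-Python sketch of that loop, assuming daily data; forecast_all_segments, the seasonal naive forecaster, and the toy histories are hypothetical and not part of ETNA.

```python
from datetime import date, timedelta


def seasonal_naive(history, horizon, seasonality=7):
    """Repeat the last observed season (a stand-in for a real model)."""
    if len(history) < seasonality:
        raise ValueError("need at least one full season of history")
    last_season = history[-seasonality:]
    return [last_season[h % seasonality] for h in range(horizon)]


def forecast_all_segments(history_by_segment, last_date, horizon, forecaster):
    """Forecast every segment and key the values by future daily dates."""
    future_dates = [last_date + timedelta(days=step) for step in range(1, horizon + 1)]
    return {
        segment: dict(zip(future_dates, forecaster(history, horizon)))
        for segment, history in history_by_segment.items()
    }


# Toy daily histories (two weeks each), not real data.
history_by_segment = {
    "country_a": [10, 12, 11, 13, 14, 6, 5] * 2,
    "country_b": [1, 2, 3, 4, 5, 6, 7] * 2,
}
forecasts = forecast_all_segments(
    history_by_segment, last_date=date(2021, 11, 30), horizon=3, forecaster=seasonal_naive
)
for segment, points in forecasts.items():
    print(segment, points)
```

The same loop scales from two segments to hundreds, which is the "racing car" use case from the motivation.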

Conclusion

Hurray! You have just learned how to build models, forecast time series, and estimate performance with the ETNA library. Upcoming articles will discuss the preprocessing, feature generation, and outlier detection methods implemented in the ETNA package.

Contacts

If you have any problems installing or using the ETNA library, you can join our Telegram channel and ask questions, or open an issue on GitHub :)

Useful links: ETNA GitHub, Documentation, Examples, Kaggle Example Notebook

Vlad Denisov
Business Analyst/Working at Tinkoff/Graduated from MIPT/Master’s Degree in Math & Physics/Amateur Data Scientist