Forecasting with ETNA: Fast and Furious
Forecasting hundreds of time series in a trice
Hi! My name is Vlad, and I work on the ETNA team at Tinkoff AI Center. I want to tell you about ETNA, our handy framework for forecasting multiple time series.
Motivation
Forecasting problems come up all the time in everyday life: for example, we want to know how many calls a service line will receive, or how much cash will be withdrawn from an ATM over the next week or month. Data scientists and analysts who face these problems often have to juggle numerous Python and R packages for EDA, preprocessing, feature selection, and model fitting.
It is very inconvenient and time-consuming! To make things easier, we have started to develop our framework. It is like a Lego constructor without any age restrictions: using simple “bricks” (EDA methods, transforms, models) you can build either a simple “bike” to predict one time series or construct a powerful and effective “racing car” for forecasting hundreds of time series! Let’s “order” this constructor and build the first pipeline!
Getting Started
Installation
Installing the ETNA package from the command line is effortless:
pip install --upgrade pip
pip install etna
Check the installation and version:
pip show etna
Name: etna
Version: 1.4.0
Summary: ETNA is the first python open source framework of Tinkoff.ru AI Center. It is designed to make working with time series simple, productive, and fun.
Home-page: https://github.com/tinkoff-ai/etna
Author: Andrey Alekseev
Author-email: an.alekseev@tinkoff.ru
License: Apache 2.0
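The same check can be done from inside Python. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper name `installed_version` is hypothetical and chosen just for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints the version reported by the installed distribution, or None.
print("etna version:", installed_version("etna"))
```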
Load Data
As an example, we download a COVID-19 dataset.
It contains a large amount of information covering almost the last two years. We are going to predict new cases per million for several countries.
ETNA is very strict about the data format: your dataset must contain the usual timestamp and target columns, plus a segment column that tells the individual time series apart. The data itself is stored in a TSDataset object.
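To make this layout concrete, here is a minimal sketch in plain Python (no ETNA calls) of a table with the three columns named above. The segment names and values are made up for illustration, and the helper `split_by_segment` is hypothetical, not part of the library.

```python
from collections import defaultdict
from datetime import date, timedelta

# Illustrative long-format rows: one row per (timestamp, segment) pair.
# Segment names and target values are invented for this sketch.
start = date(2021, 1, 1)
rows = [
    {"timestamp": start + timedelta(days=i), "segment": segment, "target": float(base + i)}
    for segment, base in [("country_a", 10), ("country_b", 100)]
    for i in range(3)
]


def split_by_segment(rows):
    """Group long-format rows into one time-ordered series per segment."""
    series = defaultdict(list)
    for row in sorted(rows, key=lambda r: r["timestamp"]):
        series[row["segment"]].append((row["timestamp"], row["target"]))
    return dict(series)


series = split_by_segment(rows)
for segment, points in series.items():
    print(segment, [value for _, value in points])
```

Each segment becomes its own series, which is the role the segment column plays when several time series share one table.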
Exploratory Data Analysis
OK, we have initialized the TSDataset object, so let's look at our data and do some analysis. Plotting all the data takes a single command.
We want to know our data better and build some analytical plots! The ETNA package offers many kinds of visualizations:
- Cross-correlations plots
- PACF
- Correlation heatmap
- Z-statistics boxplot
- Outliers search
For example, we are going to plot the PACF and the results of the outlier search. For more methods, see the ETNA documentation.
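As a rough illustration of what such a plot reveals, the sketch below computes the plain sample autocorrelation at a given lag in pure Python. Note that this is the simple autocorrelation, not the partial autocorrelation (PACF) and not ETNA's plotting code; the function name and the toy series are assumptions made for the example.

```python
def autocorrelation(values, lag):
    """Sample autocorrelation of a series at the given lag (plain ACF, not PACF)."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values)
    if variance == 0:
        raise ValueError("series is constant")
    covariance = sum((values[i] - mean) * (values[i - lag] - mean) for i in range(lag, n))
    return covariance / variance


# Toy series: the same 7-day pattern repeated for 8 weeks.
weekly_pattern = [1, 2, 3, 4, 5, 9, 9]
toy_series = weekly_pattern * 8

for lag in (3, 7):
    print(f"lag {lag}: {autocorrelation(toy_series, lag):.3f}")
```

A pronounced peak at lag 7 compared with neighbouring lags is the kind of signal that points to weekly seasonality in daily data.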
Forecasting
The ETNA library offers a variety of time series forecasting models, from modern ones such as CatBoost, Facebook Prophet and Amazon DeepAR to classical ones such as SARIMAX and SeasonalMovingAverage. You can find a complete list of models in the Documentation.
As the PACF shows, all segments have weekly seasonality, so let's run a backtest and build a forecast using a simple seasonal model with a weekly period. The examples directory shows how to build more complex models.
First, we initialize a Pipeline object.
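To show the idea behind a seasonal moving average, here is a minimal pure-Python sketch: each future point is predicted as the mean of the values observed one, two, ... seasons earlier. This is an illustration of the general technique, not ETNA's implementation; the function name, the `window` parameter and the toy data are assumptions.

```python
def seasonal_moving_average_forecast(history, horizon, seasonality=7, window=2):
    """Forecast each step as the mean of the values 1..window seasons back.

    Forecasts are appended to the working series, so horizons longer than
    one season reuse earlier forecasts.
    """
    if len(history) < seasonality * window:
        raise ValueError("history must cover at least seasonality * window points")
    values = list(history)
    for _ in range(horizon):
        # values[-seasonality * k] is the point k seasons before the next step.
        lagged = [values[-seasonality * k] for k in range(1, window + 1)]
        values.append(sum(lagged) / window)
    return values[len(history):]


# Two weeks of made-up daily data with the same weekly shape.
week_1 = [10, 20, 30, 40, 50, 60, 70]
week_2 = [12, 22, 32, 42, 52, 62, 72]
forecast = seasonal_moving_average_forecast(week_1 + week_2, horizon=7)
print(forecast)
```

With `seasonality=7`, Monday's forecast depends only on previous Mondays, which is why such a model suits data with a weekly cycle.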
Next, we run a backtest with our model.
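A backtest repeatedly trains on the past and evaluates on the next few points. One common scheme is the expanding window, sketched below in plain Python. The splitting options the library actually uses may differ, and the function name and fold sizes here are assumptions for illustration.

```python
def expanding_window_folds(n_points, horizon, n_folds):
    """Yield (train_indices, test_indices); test windows sit at the end of the series."""
    first_test_start = n_points - horizon * n_folds
    if first_test_start <= 0:
        raise ValueError("not enough points for the requested folds")
    for fold in range(n_folds):
        test_start = first_test_start + fold * horizon
        # Train on everything before the test window, test on the next `horizon` points.
        yield range(0, test_start), range(test_start, test_start + horizon)


folds = [(len(train), list(test)) for train, test in expanding_window_folds(10, 2, 3)]
for train_size, test_indices in folds:
    print(f"train on first {train_size} points, test on indices {test_indices}")
```

Every fold trains only on data that precedes its test window, so the evaluation never peeks into the future.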
Then we look at the metrics.
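The article does not say which metrics were computed, so as an assumption the sketch below uses SMAPE, a scale-independent metric often used for forecasts. It is a plain-Python illustration, not the library's metric class; pairs where both values are zero are treated as contributing zero error, which is one convention among several.

```python
def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent (0 is a perfect forecast)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        denominator = abs(actual) + abs(predicted)
        if denominator > 0:  # a pair of zeros contributes no error
            total += 2 * abs(actual - predicted) / denominator
    return 100 * total / len(y_true)


# Made-up actuals and forecasts for two hypothetical segments.
per_segment = {
    "country_a": smape([100, 100], [50, 100]),
    "country_b": smape([10, 20, 30], [10, 20, 30]),
}
print(per_segment)
```

Computing the metric per segment, as in the dictionary above, makes it easy to spot which series the model handles poorly.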
And visualize the backtest results.
As we can see, our model shows pretty good results, so let's predict the future!
Conclusion
Hurray! You have just learned how to build models, forecast time series, and estimate forecast quality with the ETNA library. The following articles will discuss the preprocessing, feature generation and outlier detection methods implemented in the ETNA package.
Contacts
If you have any problems installing or using the ETNA library, you can join our Telegram channel and ask questions, or open an issue on GitHub :)
Useful links: ETNA GitHub, Documentation, Examples, Kaggle Example Notebook