Darts: Time Series Made Easy in Python

Julien Herzen
Jun 29, 2020 · 6 min read

Time series are simply data points recorded over time. They are everywhere in nature and in business: temperatures, heartbeats, births, population dynamics, internet traffic, stocks, inventories, sales, orders, factory production, you name it. In countless cases, efficient processing and forecasting of time series can provide a decisive advantage. It can help businesses adapt their strategies ahead of time (e.g. if production can be planned in advance), or improve their operations (e.g. by detecting anomalies in complex systems).

Although many models and tools exist for time series, they are still often nontrivial to work with, because each has its own intricacies and they cannot always be used in the same way. At Unit8, we often work with time series, so we started developing our own tool to make our lives simpler. We also decided to contribute to the community by open-sourcing it. In this article, we introduce Darts, our attempt at simplifying time series processing and forecasting in Python.

Motivation

We are big fans of the scikit-learn approach: a single open-source library with a consistent API that contains a great set of tools for end-to-end machine learning. Darts attempts to be a scikit-learn for time series, and its primary goal is to simplify the whole time series machine learning experience.

Darts is an attempt to smooth the end-to-end time series machine learning experience in Python

Show Me!

pip install 'u8darts[all]'

The basic data type in Darts is TimeSeries, which represents a multivariate time series. It is mostly a wrapper around a pandas DataFrame, with some additional guarantees to ensure that it represents a well-formed time series with a proper time index. It can easily be built, for example from a DataFrame:

import pandas as pd
from darts import TimeSeries

df = pd.read_csv('AirPassengers.csv')
series = TimeSeries.from_dataframe(df, 'Month', '#Passengers')

In the above snippet, we first read a DataFrame containing the air passengers dataset. We then build a (univariate) TimeSeries, specifying the time and value columns (Month and #Passengers, respectively).

Let’s now split our series into training and validation TimeSeries, and train an exponential smoothing model on the training series:

from darts.models import ExponentialSmoothing

train, val = series.split_before(pd.Timestamp('19580101'))
model = ExponentialSmoothing()
model.fit(train)
prediction = model.predict(len(val))

That’s it, we now have a prediction over our validation series. We can plot it, along with the actual series:

import matplotlib.pyplot as plt

series.plot(label='actual')
prediction.plot(label='forecast', lw=3)
plt.legend()
Forecasting the number of air passengers over 3 years (36 monthly values), using a simple exponential smoothing model.
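
For intuition about what happens behind fit() and predict(), here is a minimal sketch of the simplest exponential smoothing variant, written in plain Python. It is not the Darts implementation (which is more capable and may also model trend and seasonality); the function names, the alpha default, and the toy values are assumptions made for illustration only.

```python
def smoothed_level(values, alpha=0.5):
    """Run the simple exponential smoothing recursion and return the final level.

    Each new observation pulls the level towards itself by a factor alpha,
    so recent observations weigh more than old ones.
    """
    if not values:
        raise ValueError("need at least one observation")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = values[0]  # initialise the level with the first observation
    for y in values[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ses_forecast(values, horizon, alpha=0.5):
    """Forecast `horizon` steps ahead; simple smoothing gives a flat forecast."""
    return [smoothed_level(values, alpha)] * horizon


toy_train = [10.0, 12.0, 14.0]  # toy numbers, not the air passengers data
toy_prediction = ses_forecast(toy_train, horizon=3, alpha=0.5)
print(toy_prediction)  # [12.5, 12.5, 12.5]
```

The flat forecast is the main limitation of this variant, and it is why seasonal data such as monthly passenger counts call for a richer model.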

A Few More Details

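Other models follow the same fit() and predict() pattern. For example, here is an auto-ARIMA model:

from darts.models import AutoARIMA

model_aarima = AutoARIMA()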
model_aarima.fit(train)
prediction_aarima = model_aarima.predict(len(val))

Basically, Darts is based on the following simple principles:

  • There are two kinds of models. Forecasting models predict the future values of a time series given its past values, and regression models predict values of a target time series given a set of feature time series. The exponential smoothing and auto-ARIMA models we built above are examples of forecasting models.
  • Unified fit() and predict() interface across all forecasting models, from ARIMA to neural networks.
  • Models consume and produce TimeSeries, which means for instance that it is easy to have a regression model consume the output of a forecasting model.
  • TimeSeries can be either univariate (1-dimensional) or multivariate (multi-dimensional). Certain models such as those based on neural nets operate on multivariate series, while others are restricted to univariate series.
  • Immutability: the TimeSeries class is designed to be immutable.
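
These principles can be illustrated with a small, self-contained sketch. None of the names below come from Darts: ToySeries, NaiveLast, NaiveMean and forecast_with are hypothetical stand-ins that only mimic the ideas of a shared fit()/predict() interface, models that consume and produce series objects, and an immutable series type.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple


@dataclass(frozen=True)
class ToySeries:
    """Hypothetical stand-in for a time series: an immutable tuple of values."""
    values: Tuple[float, ...]

    def append(self, value: float) -> "ToySeries":
        # Immutability: return a new object instead of modifying this one.
        return ToySeries(self.values + (value,))


class ForecastingModel(Protocol):
    """The shared interface: every model is trained and queried the same way."""
    def fit(self, series: ToySeries) -> None: ...
    def predict(self, n: int) -> ToySeries: ...


class NaiveLast:
    """Forecasts by repeating the last observed value."""
    def fit(self, series: ToySeries) -> None:
        self._last = series.values[-1]

    def predict(self, n: int) -> ToySeries:
        return ToySeries((self._last,) * n)


class NaiveMean:
    """Forecasts by repeating the mean of the training values."""
    def fit(self, series: ToySeries) -> None:
        self._mean = sum(series.values) / len(series.values)

    def predict(self, n: int) -> ToySeries:
        return ToySeries((self._mean,) * n)


def forecast_with(model: ForecastingModel, series: ToySeries, n: int) -> ToySeries:
    """Works with any model, because all models share fit() and predict()."""
    model.fit(series)
    return model.predict(n)


history = ToySeries((1.0, 2.0, 3.0))
forecasts = [forecast_with(m, history, 2) for m in (NaiveLast(), NaiveMean())]
print([f.values for f in forecasts])  # [(3.0, 3.0), (2.0, 2.0)]
```

Because every model returns the same series type, the output of one model can be passed directly to another step, which is the property that makes stacking and backtesting easy to express.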

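Darts already contains working implementations of several forecasting models, including the ones mentioned in this article: exponential smoothing, auto-ARIMA, Prophet, and RNN and TCN neural networks.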

The library also contains functionalities to backtest forecasting and regression models, perform grid search on hyper-parameters, evaluate residuals, and even perform automatic model selection.

Another Example — Backtesting

from darts.models import Prophet

models = [ExponentialSmoothing(), Prophet()]
backtests = [model.historical_forecasts(series,
                                        start=.5,
                                        forecast_horizon=3)
             for model in models]

The function historical_forecasts() is available on all models. It takes a time series, a starting point (here, halfway through the series) and a forecast horizon. It returns a TimeSeries containing the historical forecasts that would have been obtained by using the model to forecast the series with the specified forecast horizon (here 3 months), starting at the specified point and using an expanding window strategy.
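
To make the expanding window strategy concrete, here is a rough sketch in plain Python. It is not how Darts implements historical_forecasts(); the function names and the naive "repeat the last value" forecaster are assumptions for illustration, and the exact handling of the starting point (inclusive or exclusive) is a choice made here, not something taken from the library.

```python
def naive_forecast(history, horizon):
    """Hypothetical forecaster: repeat the last observed value."""
    return [history[-1]] * horizon


def expanding_window_forecasts(values, start, horizon, forecast_fn=naive_forecast):
    """Simulate forecasts that would have been made in the past.

    `start` is a fraction of the series length. At each step the model sees
    all values before the cutoff (the window expands by one point each time)
    and only the last value of its `horizon`-step forecast is kept.
    Returns a list of (target_index, forecast_value) pairs.
    """
    first_cutoff = max(1, int(len(values) * start))
    results = []
    for cutoff in range(first_cutoff, len(values) - horizon + 1):
        history = values[:cutoff]              # expanding training window
        prediction = forecast_fn(history, horizon)
        target_index = cutoff + horizon - 1    # point the last forecast refers to
        results.append((target_index, prediction[-1]))
    return results


toy_values = [1, 2, 3, 4, 5, 6, 7, 8]
print(expanding_window_forecasts(toy_values, start=0.5, horizon=3))
# [(6, 4), (7, 5)]
```

Each pair can be compared with the actual value at the same index, which is what makes the result usable for computing error metrics.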

The return type is a TimeSeries, and so we can quickly compute error metrics — for instance here the mean absolute percentage error:

from darts.metrics import mape

series.plot(label='data')
for i, m in enumerate(models):
    err = mape(backtests[i], series)
    backtests[i].plot(lw=3, label='{}, MAPE={:.2f}%'.format(m, err))
plt.title('Backtests with 3-months forecast horizon')
plt.legend()
Backtesting forecasting models: here we simulate making forecasts with a 3-month horizon, every month starting in January 1955 (so the first forecast value is for April 1955).
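
As a reminder of what the metric measures, the mean absolute percentage error can be written in a few lines of plain Python. This is a sketch of the standard formula, not the Darts function; the name mape_percent and the toy numbers are made up for illustration.

```python
def mape_percent(actual, predicted):
    """Mean absolute percentage error, in percent.

    Each error is divided by the actual value, so the metric is undefined
    when an actual value is zero, and swapping the two arguments generally
    changes the result.
    """
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if not actual:
        raise ValueError("series must not be empty")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


toy_actual = [100.0, 200.0]
toy_forecast = [110.0, 180.0]
print(round(mape_percent(toy_actual, toy_forecast), 2))  # 10.0
```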

In addition, because the return type of historical_forecasts() is a TimeSeries, we can also feed these outputs as feature series to regression models. This can serve to ensemble (stack) the forecasts made by several models, potentially together with external time series data. All models also have a backtest() function that works similarly, but directly returns the distribution of errors (for a desired error function) instead.
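
The idea of stacking can be sketched in its simplest possible form: learn a single weight that blends two forecasts so that the blend is as close as possible to the actual values (in the least-squares sense). This is only an illustration in plain Python with hypothetical function names and toy numbers; it does not use the Darts regression models, which are more general.

```python
def fit_blend_weight(actual, forecast_a, forecast_b):
    """Least-squares weight w for the blend w * a + (1 - w) * b.

    Minimising sum((y - (w*a + (1-w)*b))^2) over w has the closed form
    w = sum((y - b) * (a - b)) / sum((a - b)^2).
    """
    numerator = sum((y - b) * (a - b)
                    for y, a, b in zip(actual, forecast_a, forecast_b))
    denominator = sum((a - b) ** 2 for a, b in zip(forecast_a, forecast_b))
    if denominator == 0:
        return 0.5  # the two forecasts are identical, any weight works
    return numerator / denominator


def blend(forecast_a, forecast_b, weight):
    """Combine two forecasts point by point using the learned weight."""
    return [weight * a + (1 - weight) * b
            for a, b in zip(forecast_a, forecast_b)]


toy_actual = [10.0, 20.0, 30.0]
toy_a = [12.0, 22.0, 32.0]   # a forecast that is consistently too high
toy_b = [8.0, 18.0, 28.0]    # a forecast that is consistently too low
w = fit_blend_weight(toy_actual, toy_a, toy_b)
print(w, blend(toy_a, toy_b, w))  # 0.5 [10.0, 20.0, 30.0]
```

In practice the weight should be learned on historical forecasts and then applied to new forecasts, otherwise the blend is evaluated on the same data it was fitted on.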

There is a lot more that we did not cover here. We provide a series of example notebooks covering more material. For instance, you can look at the intro notebook, or see how to easily train RNN or TCN neural networks using the fit() and predict() pattern. We also recommend consulting the Darts documentation.

What Next?

  • A treatment of probabilistic time series
  • Some anomaly-detection functionalities
  • Some time series embedding and clustering capabilities

We welcome contributions and issues on GitHub.

Finally, Darts is one of the tools we use internally in our day-to-day AI/ML work for several companies. If you think your company could benefit from time series solutions or has other data-centric issues, don’t hesitate to contact us.
