Stories by azul garza ramirez on Medium

Distributed Forecast of 1M Time Series in Under 15 Minutes with Spark, Nixtla, and Fugue

azul garza ramirez — Fri, 16 Sep 2022 04:54:10 GMT

Scalable Time Series Modeling with open-source projects StatsForecast, Fugue, and Spark

By Kevin Kho, Han Wang, Max Mergenthaler and Federico Garza Ramírez.

TL;DR We will show how you can leverage the distributed power of Spark and the highly efficient code from StatsForecast to fit millions of models in a matter of minutes.

Time-series modeling, analysis, and prediction of trends and seasonalities for data collected over time is a rapidly growing category of software applications.

Businesses in sectors ranging from electricity and economics to healthcare analytics collect time-series data daily to predict patterns and build better data-driven product experiences. For example, temperature and humidity prediction is used in manufacturing to prevent defects, streaming-metrics predictions help identify popular music artists, and sales forecasting for thousands of SKUs across different locations in the supply chain is used to optimize inventory costs. As data generation increases, forecasting needs have evolved from modeling a few time series to predicting millions.

Motivation

Nixtla is an open-source project focused on state-of-the-art time series forecasting. It maintains several libraries, such as StatsForecast for statistical models, NeuralForecast for deep learning, and HierarchicalForecast for reconciling forecasts across different levels of a hierarchy. These are production-ready time series libraries focused on different modeling techniques.

This article looks at StatsForecast, a lightning-fast forecasting library with statistical and econometric models. Nixtla's AutoARIMA is 20x faster than pmdarima, and its ETS (error, trend, seasonal) models are 4x faster than statsmodels and more robust. The benchmarks and code to reproduce them can be found here. A large part of the speedup comes from numba, a JIT compiler.
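To illustrate the kind of code numba accelerates, here is a minimal sketch (not StatsForecast's actual implementation) of a tight numeric loop: a grid search over the smoothing parameter alpha of simple exponential smoothing, the simplest ETS model. The functions ses_sse and best_alpha are hypothetical. If numba is not installed, the decorator falls back to plain Python, so the sketch still runs, only more slowly.

```python
import numpy as np

try:
    from numba import njit  # compile the loops to machine code when numba is available
except ImportError:
    def njit(func):  # fallback: leave the function as plain Python
        return func


@njit
def ses_sse(y, alpha):
    """Sum of squared one-step-ahead errors of simple exponential smoothing."""
    level = y[0]
    sse = 0.0
    for t in range(1, y.shape[0]):
        err = y[t] - level      # one-step-ahead forecast error
        sse += err * err
        level += alpha * err    # same as alpha * y[t] + (1 - alpha) * level
    return sse


@njit
def best_alpha(y, n_grid):
    """Try alpha = 1/n_grid, 2/n_grid, ..., 1 and keep the one with the lowest SSE."""
    best_a = 0.0
    best_sse = np.inf
    for i in range(1, n_grid + 1):
        a = i / n_grid
        sse = ses_sse(y, a)
        if sse < best_sse:
            best_a = a
            best_sse = sse
    return best_a, best_sse
```

For a steadily increasing series, the search picks alpha = 1 (the level simply tracks the last value): best_alpha(np.array([1.0, 2.0, 3.0, 4.0]), 10) returns (1.0, 3.0). Loops like this are slow in interpreted Python but become fast once compiled, and fitting millions of series multiplies that difference.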

The faster iteration time means that data scientists can run more experiments and converge to more accurate models faster. It also means that running benchmarks at scale becomes easier.

In this article, we are interested in how well StatsForecast scales when fitting models over Spark or Dask through the Fugue library. This combination lets us quickly train a huge number of models in parallel on a temporary cluster.

Experiment Setup

When working with large time series datasets, users typically have to handle thousands of logically independent time series (think of telemetry from different users or sales of different products). In this case, we can train one big model over all of the series or create one model for each series. Both approaches are valid: the bigger model picks up trends across the population, while training thousands of models may fit each individual series better.

Note: to pick up both the micro and macro trends of the time series population in one model, check the Nixtla HierarchicalForecast library, but this is also more computationally expensive and trickier to scale.

This article deals with the scenario where we train a model (AutoARIMA or ETS) for each univariate time series. For this setup, we group the full data by time series and then train a model on each group. The image below illustrates this. The distributed DataFrame can be either a Spark or a Dask DataFrame.

AutoARIMA per partition — Image by Author
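The group-then-fit pattern can be sketched in plain Python, independent of Spark or Dask. This is a minimal illustration under simple assumptions: rows are (unique_id, ds, y) tuples, and a naive last-value forecast stands in for AutoARIMA or ETS. The helper names are hypothetical.

```python
from collections import defaultdict


def group_by_series(rows):
    """Collect (unique_id, ds, y) rows into one time-ordered list of values per series."""
    groups = defaultdict(list)
    for unique_id, ds, y in rows:
        groups[unique_id].append((ds, y))
    return {uid: [y for _, y in sorted(points)] for uid, points in groups.items()}


def naive_forecast(y, h):
    """Stand-in model: repeat the last observed value h times."""
    return [y[-1]] * h


def forecast_per_series(rows, h, model=naive_forecast):
    """Fit one independent model per series and return its h-step forecast."""
    return {uid: model(y, h) for uid, y in group_by_series(rows).items()}


rows = [("a", 1, 10.0), ("b", 1, 5.0), ("a", 2, 12.0), ("b", 2, 4.0)]
print(forecast_per_series(rows, h=2))  # {'a': [12.0, 12.0], 'b': [4.0, 4.0]}
```

Because each group is independent, the per-group calls can run on different workers, which is exactly the parallelism a distributed DataFrame provides.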

Nixtla previously released benchmarks with Anyscale on distributing this model training on Ray. The setup and results can be found in this blog, and the results are also shown below. It took 2000 CPUs to run one million AutoARIMA models in 35 minutes. We'll compare this against running on Spark.

StatsForecast on Ray results — Image by author

StatsForecast Code

First, we'll look at the StatsForecast code used to run AutoARIMA distributedly on Ray. This is a simplified version that runs the one-million-time-series scenario. It is also updated for the recent StatsForecast v1.0.0 release, so it may look a bit different from the code in the previous benchmarks.

https://medium.com/media/cb97e485c9e89855d7b213d584e2872b/href

The interface of StatsForecast is very minimal. It is already designed to run AutoARIMA on each group of data. Just supplying the ray_address makes this code snippet run distributedly. Without it, n_jobs indicates the number of parallel processes used for forecasting. model.forecast() does the fit and predict in one step, and its input is the time horizon to forecast.
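Conceptually, n_jobs spreads the independent per-series fits over a pool of workers. The sketch below is not StatsForecast's implementation: it assumes series is a dict of value lists and uses a hypothetical mean forecast as the model. It uses a thread pool only so the example runs anywhere; CPU-bound model fitting would normally use separate processes, as n_jobs does.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_forecast(item):
    """Hypothetical per-series model: forecast the series mean h times."""
    unique_id, y, h = item
    return unique_id, [sum(y) / len(y)] * h


def forecast_all(series, h, n_jobs=1):
    """Fit every series sequentially (n_jobs=1) or across n_jobs workers."""
    items = [(uid, y, h) for uid, y in series.items()]
    if n_jobs == 1:
        results = list(map(mean_forecast, items))
    else:
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(mean_forecast, items))  # map keeps input order
    return dict(results)


print(forecast_all({"a": [1.0, 3.0], "b": [2.0]}, h=2, n_jobs=2))
# {'a': [2.0, 2.0], 'b': [2.0, 2.0]}
```

Either way the forecasts are identical; only the wall-clock time changes.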

Using Fugue to run on Spark and Dask

Fugue is an abstraction layer that ports Python, Pandas, and SQL code to Spark and Dask. Its most minimal interface is the transform() function, which takes in a DataFrame and a function and runs that function on Spark or Dask. We can use transform() to bring StatsForecast execution to Spark.

There are two parts to the code below. First, the forecast logic is defined in the forecast_series function. Some parameters are hardcoded for simplicity; the most important one is n_jobs=1. Spark or Dask already serves as the parallelization layer, and adding a second stage of parallelism can cause resource deadlocks.

https://medium.com/media/e7ce8fdf2d33500490ab3657df1af17d/href

Second, the transform() function is used to apply forecast_series() on Spark. The first two arguments are the DataFrame and the function to apply. Spark requires an output schema, so we need to pass it in, and the partition argument takes care of splitting the time series modeling by unique_id.
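Spark needs the output schema up front because it plans the job before forecast_series runs on any partition. Fugue expresses schemas as comma-separated name:type pairs. The sketch below is a hypothetical local check, not part of Fugue: it parses such a string and verifies that the rows a function returns match it. The column names, including AutoARIMA, and the type mapping are assumptions for illustration.

```python
from datetime import date

# Python types assumed for a few Fugue-style type names (illustration only).
TYPE_MAP = {"str": str, "int": int, "float": float, "date": date}


def parse_schema(schema):
    """Turn 'name:type,name:type' into a list of (name, python_type) pairs."""
    fields = []
    for part in schema.split(","):
        name, type_name = part.split(":")
        fields.append((name.strip(), TYPE_MAP[type_name.strip()]))
    return fields


def conforms(rows, schema):
    """True if every row has exactly the schema's columns, in order, with matching types."""
    fields = parse_schema(schema)
    names = [name for name, _ in fields]
    for row in rows:
        if list(row) != names:
            return False
        if not all(isinstance(row[name], typ) for name, typ in fields):
            return False
    return True


schema = "unique_id:str,ds:date,AutoARIMA:float"
good = [{"unique_id": "a", "ds": date(2022, 1, 1), "AutoARIMA": 1.5}]
bad = [{"unique_id": "a", "ds": "2022-01-01", "AutoARIMA": 1.5}]
print(conforms(good, schema), conforms(bad, schema))  # True False
```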

This code already works and returns a Spark DataFrame output.

Nixtla’s FugueBackend

The transform() above is a general look at what Fugue can do. In practice, the Fugue and Nixtla teams collaborated to add a more native FugueBackend to the StatsForecast library, along with a utility forecast() function that simplifies the forecasting interface. Below is an end-to-end example of running StatsForecast on one million time series.

https://medium.com/media/1ff768f9bb0d38bec1283dd881a87ed9/href

We just need to create the FugueBackend, which takes in a SparkSession, and pass it to forecast(). This function can take either a DataFrame or a file path to the data. If a file path is provided, the data is loaded with the parallel backend. In the example above, we replaced the file each time we ran the experiment to generate the benchmarks.

It's also important to note that we can test locally before running forecast() on the full data. All we have to do is leave the parallel argument unset; everything will then run sequentially on Pandas.

Benchmark Results

The benchmark results can be seen below. At the time of writing, Dask and Ray had just made new releases, so only the Spark metrics are up to date. We will publish a follow-up article after rerunning these experiments with the updates.

Spark and Dask benchmarks for StatsForecast at scale

Note: We attempted to use 2000 CPUs but were limited by the compute instances available on AWS.

The important part here is that AutoARIMA trained one million time series models in less than 15 minutes. The cluster configuration is attached in the appendix. With very few lines of code, we were able to orchestrate the training of these time series models distributedly.

Conclusion

Training thousands of time series models distributedly normally takes a lot of coding with Spark and Dask, but we were able to run these experiments with very few lines of code. Nixtla’s StatsForecast offers the ability to quickly utilize all of the compute resources available to find the best model for each time series. All users need to do is supply a relevant parallel backend (Ray or Fugue) to run on a cluster.

On the scale of one million time series, our total training time for AutoARIMA was 12 minutes. This is equivalent to close to 400 CPU-hours of work done at once, allowing data scientists to iterate quickly at scale without writing explicit parallelization code. Because we used an ephemeral cluster, the cost is effectively the same as running this workload on a single EC2 instance (parallelized over all of its cores).
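The close to 400 CPU-hours figure can be checked against the cluster configuration in the appendix, assuming all worker vCPUs were busy for the whole run and ignoring the driver node:

```python
workers = 20            # "num_workers" in the appendix configuration
vcpus_per_worker = 96   # an m5.24xlarge instance has 96 vCPUs
minutes = 12            # reported AutoARIMA training time

total_vcpus = workers * vcpus_per_worker   # just under the 2000 CPUs targeted
cpu_hours = total_vcpus * minutes / 60
print(total_vcpus, cpu_hours)  # 1920 384.0
```

This also explains the note on the benchmark table: 20 workers of 96 vCPUs give 1920 CPUs rather than 2000.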


Appendix

For anyone interested, the cluster configuration is shown below. It spins up a Databricks cluster. The most important field is node_type_id, which specifies the worker machines used.

{
    "num_workers": 20,
    "cluster_name": "fugue-nixtla-2",
    "spark_version": "10.4.x-scala2.12",
    "spark_conf": {
        "spark.speculation": "true",
        "spark.sql.shuffle.partitions": "8000",
        "spark.sql.adaptive.enabled": "false",
        "spark.task.cpus": "1"
    },
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "zone_id": "us-west-2c",
        "spot_bid_price_percent": 100,
        "ebs_volume_type": "GENERAL_PURPOSE_SSD",
        "ebs_volume_count": 1,
        "ebs_volume_size": 32
    },
    "node_type_id": "m5.24xlarge",
    "driver_node_type_id": "m5.2xlarge",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {
        "MKL_NUM_THREADS": "1",
        "OPENBLAS_NUM_THREADS": "1",
        "VECLIB_MAXIMUM_THREADS": "1",
        "OMP_NUM_THREADS": "1",
        "NUMEXPR_NUM_THREADS": "1"
    },
    "autotermination_minutes": 20,
    "enable_elastic_disk": false,
    "cluster_source": "UI",
    "init_scripts": [],
    "runtime_engine": "STANDARD",
    "cluster_id": "0728-004950-oefym0ss"
}

Distributed Forecast of 1M Time Series in Under 15 Minutes with Spark, Nixtla, and Fugue was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Forecasting with Synthetic Data at Scale (Nixtla & YData)

azul garza ramirez — Wed, 05 Jan 2022 18:09:22 GMT

Make synthetic time series data and then forecast it with Deep Learning models

By Nixtla and YData. Federico Garza Ramírez and Max Mergenthaler.

Introduction

In this post, we explain how to use nixtlats and ydata-synthetic, free open-source Python libraries that allow you to generate synthetic data and train state-of-the-art deep learning models on it without any significant loss of data quality. We develop a deep learning forecasting pipeline without direct access to the original data and show that synthetic data has a minimal impact on the performance of the models.

Motivation

In the last decade, neural network-based forecasting methods have become ubiquitous in large-scale forecasting applications in both industry and academia, as they have redefined the state of the art in many practical tasks such as demand planning, electricity load forecasting, reverse logistics, and weather forecasting, as well as in forecasting competitions like the M4 and M5.

However, a common problem for those building forecasts is developing models or testing software without using the original data. The actual data may take time to collect, there may be restrictions on its use, or it may simply not exist yet. In many applications, the user does not want the model to have access to the actual data, in particular if model training is done in the cloud or outside one's own infrastructure. These constraints severely limit practice, preventing models from scaling to large datasets on available cloud resources.

This post shows how to solve this problem using nixtlats and ydata-synthetic. First, the user creates synthetic data with ydata-synthetic; synthetic data is artificially generated but keeps the properties of the original data, preserving its business value while remaining compliant. Next, the user trains state-of-the-art neural forecasting algorithms with nixtlats without accessing the original data. Once trained, the model can be sent to the owner of the original data, who runs inference within the security of their own infrastructure. The following diagram describes the process.

Image by the authors

We evaluate both approaches and show that the predictions of the model trained on synthetic data perform on par with those of the model trained on the original data.

Libraries

The libraries nixtlats and ydata-synthetic are available on PyPI, so you can install them with pip install nixtlats and pip install ydata-synthetic.

https://medium.com/media/ef81f7b5426aa19c010d63f313e412b4/href

Data

To evaluate the pipeline, we consider the yearly M4 competition dataset. The dataset was released publicly under a completely open-access license. The M4 competition was won, by a large margin over baselines and complex time series ensembles, by a novel hybrid model called the Exponential Smoothing Recurrent Neural Network (ESRNN).

We will use the nixtlats library to easily access the data.

https://medium.com/media/352dd4d7d4f09acf36b3c6e2844196f9/href

In this example, we use 1,000 yearly time series.

https://medium.com/media/a28b6559e60e5dff68a6f6facff2e870/href

The M4.load method returns the train and test sets together, so we need to split them. The library also provides a wide variety of other datasets; see the documentation.

https://medium.com/media/63707be9244ef7fb7ab96d49d5b43361/href

nixtlats requires a dummy test set to make forecasts, so we combine the training data with the test data, setting the test target values to zero.

https://medium.com/media/8f3a5fb1f9f3b30270200d145e402796/href https://medium.com/media/106b236a33feb19173a2c6b0e550e8f3/href
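The two preparation steps above, holding out the last h observations of each series and then appending the test dates with zero targets, can be sketched in plain Python. This assumes rows are (unique_id, ds, y) tuples and h = 6, the forecast horizon of the yearly M4 series; the helper names are hypothetical and not part of nixtlats.

```python
from collections import defaultdict


def split_last_h(rows, h):
    """Hold out the last h observations of each series as the test set."""
    by_id = defaultdict(list)
    for row in rows:
        by_id[row[0]].append(row)
    train, test = [], []
    for group in by_id.values():
        group.sort(key=lambda r: r[1])  # order each series by its ds value
        train.extend(group[:-h])
        test.extend(group[-h:])
    return train, test


def add_dummy_test(train, test):
    """Append the test dates with y set to 0 so the model knows where to forecast."""
    return train + [(uid, ds, 0.0) for uid, ds, _ in test]


rows = [("Y1", ds, float(ds)) for ds in range(1, 11)]  # one toy yearly series
train, test = split_last_h(rows, h=6)
full = add_dummy_test(train, test)
print(len(train), len(test), full[-1])  # 4 6 ('Y1', 10, 0.0)
```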

Pipeline

Creating synthetic data using ydata-synthetic

In this section, we generate a synthetic version of the training data Y_df_train using the TimeGAN model from ydata-synthetic. You can learn more about the TimeGAN model in the post Synthetic Time-Series Data: A GAN approach.

https://medium.com/media/7cf941eb099cb423f00e75199ba6e75a/href https://medium.com/media/3ffb810010f6b600ce45c12cdf8403b1/href
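TimeGAN learns from fixed-length sequences, typically scaled to [0, 1]. Assuming the embedded code prepares Y_df_train along those lines (the exact steps are not visible here), the preprocessing can be sketched as min-max scaling followed by sliding windows. The helpers are hypothetical and not ydata-synthetic's API.

```python
def minmax_scale(y):
    """Scale values to [0, 1]; return the parameters needed to undo it."""
    lo, hi = min(y), max(y)
    span = (hi - lo) or 1.0  # avoid dividing by zero for constant series
    return [(v - lo) / span for v in y], (lo, span)


def inverse_scale(y_scaled, params):
    """Map scaled (for example, synthetic) values back to the original units."""
    lo, span = params
    return [v * span + lo for v in y_scaled]


def make_windows(y, seq_len):
    """All overlapping subsequences of length seq_len (stride 1)."""
    return [y[i:i + seq_len] for i in range(len(y) - seq_len + 1)]


scaled, params = minmax_scale([2.0, 4.0, 6.0, 8.0])
print(make_windows(scaled, 3))
```

Keeping the scaling parameters matters: generated sequences come out in the scaled space and must be mapped back before forecasting.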

The following lines train the TimeGAN model:

https://medium.com/media/b8efd99e7ed19b4bd4a0ee3d436849ec/href https://medium.com/media/8ad3200bd93cf27265ff09339fac7fcf/href https://medium.com/media/90f35684fa0b643eaf7e447a38c15fac/href

The object synth_data now contains the synthetic training data. To use nixtlats, we need to transform synth_data into a pandas DataFrame, which can easily be done with the following lines.

https://medium.com/media/3b573908face8ff282f53f25e64af87f/href https://medium.com/media/ae21993ec47000ac64341b95f284c8ca/href
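Here synth_data is assumed to be a nested sequence (or array) of shape (n_samples, seq_len, n_features), as GAN-based generators typically produce. One way to reshape it into the long unique_id / ds / y layout nixtlats expects is to treat each synthetic sample as its own series. This is a hypothetical helper, and the post's own code may differ, for example in how it assigns dates.

```python
def to_long_format(synth_data, feature=0):
    """Flatten (n_samples, seq_len, n_features) data into long-format records."""
    records = []
    for sample_id, sequence in enumerate(synth_data):
        for step, values in enumerate(sequence):
            records.append({"unique_id": sample_id, "ds": step, "y": float(values[feature])})
    return records


synth_data = [[[0.1], [0.2]], [[0.5], [0.4]]]  # 2 samples, 2 steps, 1 feature
records = to_long_format(synth_data)
print(len(records), records[2])  # 4 {'unique_id': 1, 'ds': 0, 'y': 0.5}
```

Passing the resulting list of dictionaries to pandas.DataFrame yields a DataFrame with the unique_id, ds, and y columns.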

Training Deep Learning model using nixtlats

In this section, we use the synthetic data to train the ESRNN model, the winner of the M4 competition. It is a hybrid model: it fits each time series locally with an exponential smoothing model and then trains a recurrent neural network on the series normalized by the exponential smoothing levels. You can learn more about this model in the post Forecasting in Python with the ESRNN model.
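The exponential smoothing half of ESRNN can be illustrated with a minimal sketch: compute a smoothed level for a series and divide recent values by it, so that series of very different scales look alike to the neural network shared across series. This is a simplified illustration under stated assumptions (fixed alpha, no seasonality, no log transform), not the nixtlats implementation; the helper names are hypothetical.

```python
def ses_levels(y, alpha):
    """Exponential smoothing levels: l_t = alpha * y_t + (1 - alpha) * l_{t-1}."""
    levels = [y[0]]
    for value in y[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels


def normalized_input(y, alpha, window):
    """Divide the last `window` values by the final level (the network's input)."""
    last_level = ses_levels(y, alpha)[-1]
    return [value / last_level for value in y[-window:]]


print(normalized_input([100.0, 110.0, 120.0], alpha=0.5, window=2))
```

A series 100 times smaller with the same shape produces the same normalized input, which is what lets one network learn from many series at once.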

The model training pipeline follows common PyTorch practice. First, a Dataset must be instantiated. The TimeSeriesDataset class returns the complete series at each iteration, which is useful for recurrent models such as ESRNN. To be instantiated, the class receives the target series Y_df as a pandas DataFrame with columns unique_id, ds, and y. Additionally, temporal exogenous variables X_df and static variables S_df can be included. In this case, we only use static variables, as in the original model.

https://medium.com/media/3d8f13fe58d32a00db1612ac8c178ca4/href https://medium.com/media/972528504b58c147c8d81e20ad85d745/href
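The idea of a dataset that yields one complete series per item can be sketched without PyTorch: it only needs __len__ and __getitem__, the two methods a PyTorch map-style Dataset implements. The class below is a hypothetical stand-in for TimeSeriesDataset, assuming series values and static features are supplied as dictionaries keyed by unique_id.

```python
class FullSeriesDataset:
    """Each item is one complete series plus its static features."""

    def __init__(self, series, static):
        self.ids = sorted(series)  # fixed order so indices are reproducible
        self.series = series
        self.static = static

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        uid = self.ids[idx]
        return {"unique_id": uid, "y": self.series[uid], "static": self.static[uid]}


dataset = FullSeriesDataset(
    {"Y2": [3.0, 4.0], "Y1": [1.0, 2.0, 5.0]},  # series may differ in length
    {"Y1": [1, 0], "Y2": [0, 1]},               # e.g. one-hot category features
)
print(len(dataset), dataset[0]["unique_id"], dataset[0]["y"])  # 2 Y1 [1.0, 2.0, 5.0]
```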

Next, we define the ESRNN model included in nixtlats as follows:

https://medium.com/media/a9f9fd8fd7dae5a6636ca7c5d4d944d7/href

Then we can train it as follows:

https://medium.com/media/2fa33fac0d9db03ecc9af1efc037404f/href https://medium.com/media/72c02b62e76da11221b95897eb75f84e/href https://medium.com/media/6c8c79248b959ba7f2f3bc0f278be074/href https://medium.com/media/4331aa79aa71d10a52aba3957730b844/href

Model trained with real data

To check that both approaches offer similar results, in this section we train the model on the original data.

https://medium.com/media/c0daa0616da54b870fcebc86c397d644/href https://medium.com/media/b9c1d8c0dc1409fb957fc70660c2adb6/href https://medium.com/media/2ad4b7d1c1d6396dce9392fb782681f6/href

Then we can train it as follows:

https://medium.com/media/9eeaa3f444254cbe35ad196f071b512a/href https://medium.com/media/8ce0e8bb9c94e831ae6fddc3d3385a33/href https://medium.com/media/3a31e8b9219d6c0bb2d953e0affe219d/href

Comparing forecasts

Finally, we use the original data to make forecasts with both models: model_synth, trained with synthetic data, and model, trained with the original data. First, we define the test dataset and loader.

https://medium.com/media/d06e7ea8a6b3feae696a77de53cb8ded/href https://medium.com/media/cfc5de2e335e44334edef2da0cf1e8c8/href

The following lines obtain forecasts with the synthetic model:

https://medium.com/media/428e78d690ef42af0b33ba13f4ef201c/href https://medium.com/media/fc7290f765a16947b159655e1e454cdb/href https://medium.com/media/07b99520175c6f8b499202f759629a67/href

Likewise, the following lines obtain forecasts with the model trained on real data:

https://medium.com/media/7a56702321d1d297ba9378baca8c478e/href

Now we compare the performance of both models against the real values using the Mean Absolute Percentage Error (MAPE) and its symmetric version (SMAPE). nixtlats provides functions to do this easily.

https://medium.com/media/28bee88ce30ca4277fc9cbd749ec8608/href https://medium.com/media/40d21ea8f90a779d58986d68b5ba75f0/href https://medium.com/media/aac40bc74a59c8f30a1ce152e765fc6a/href

As we can see, the model trained with synthetic data generated by ydata-synthetic even produces better forecasts in terms of MAPE.
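For reference, both metrics can be written in a few lines. This sketch uses the common percentage definitions; SMAPE has several variants, and the one below (bounded between 0 and 200, as used in the M4 competition) may differ in detail from the nixtlats implementation.

```python
def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (undefined when a true value is 0)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE in percent, bounded between 0 and 200."""
    return 200 * sum(abs(t - p) / (abs(t) + abs(p)) for t, p in zip(y_true, y_pred)) / len(y_true)


y_true, y_pred = [100.0, 200.0], [110.0, 180.0]
print(round(mape(y_true, y_pred), 6))  # 10.0
print(smape(y_true, y_pred))
```

MAPE divides by the actual value only, so it penalizes over- and under-forecasts of the same size differently; SMAPE divides by the sum of both magnitudes, which is why the two metrics can rank models differently.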

Conclusion

Synthetic data has a wide range of applications. In this post, we showed a full pipeline to create synthetic data and use it to train state-of-the-art deep learning models. As we saw, performance is not harmed, and for some metrics it is even better.

Forecasting with Synthetic Data at Scale (Nixtla & YData) was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Prophet vs Linear Regression on Real Estate: The Zillow Case

azul garza ramirez — Wed, 15 Dec 2021 20:57:39 GMT

By Nixtla Team. fede garza ramírez, Max Mergenthaler

TL;DR Recently there has been controversy in the data science community about the Zillow case. There has been speculation that the Zillow team may have used Prophet to generate forecasts of their time series. Although we do not know whether this is true, we contribute to the discussion by showing that creating good benchmarks is fundamental in forecasting tasks. Furthermore, we show that Prophet does not turn out to be a good solution for Zillow Home Value Index data. Better alternatives are simpler and faster models like auto.arima or the models in statsforecast. To improve on them, mlforecast is an excellent option: it makes forecasting with machine learning fast and easy, and it allows practitioners to focus on the model and features instead of implementation details.

Introduction

Recently, Zillow announced that it would close its home-buying business because its models were not able to correctly anticipate price changes. Zillow CEO Rich Barton said, “We’ve determined the unpredictability in forecasting home prices far exceeds what we anticipated”. Since then, several opinions have been published about the technology Zillow allegedly used for forecasting. In particular, some criticize the fact that Zillow listed Prophet as a requirement in its job postings.

Forecasting time series is a complicated task, and no single model fits all business needs and data characteristics. Best practice is to always start with a simple model as a benchmark. Such a model makes it possible, on the one hand, to build models with better performance and, on the other hand, to measure the value added by those models (data scientists should obtain a lower loss from their more complex models than from the benchmark).

In this blog post, we set out to determine empirically whether Prophet is a good choice (or at least a good benchmark) for modeling the data used in the context of Zillow. As we will see, auto.arima and even the naive model turn out to be better baseline strategies than Prophet for the particular dataset we use. Prophet does not perform well compared to other models, which is consistent with the evidence found by other practitioners (for example here and here). We also show how mlforecast (with LinearRegression from sklearn as the training model) can beat auto.arima and Prophet in no more than 3 seconds.

Dataset

The dataset we use to evaluate Prophet is the Zillow Home Value Index (ZHVI), which can be downloaded directly from the Zillow research website. According to the page, the ZHVI is "a smoothed, seasonally adjusted measure of typical home value and market changes for a given region and housing type. It reflects the typical value of homes in the 35th to 65th percentile range" and "represents the 'typical' home value for a region".

The dataset reflects price changes, and we chose to experiment with it because a stakeholder could potentially use it to make decisions. The dataset consists of 909 monthly series for different aggregations of regions and states. We downloaded it on November 4, 2021, and anybody interested can find a copy of it here.

Experiments

To test the effectiveness of Prophet in forecasting the ZHVI, we use the last 4 observations of each series as the test set and the remaining observations as the training set. For Prophet, we performed hyperparameter optimization on each time series, using the last 4 observations of the training set as the validation set. In addition to Prophet, we ran R's auto.arima, several statsforecast models (random walk with drift, naive, simple exponential smoothing, window average, seasonal naive, and historic average), and mlforecast.
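The validation scheme above can be sketched for a simple model. Here a window average stands in for Prophet (whose actual hyperparameters are not tuned here), and the helper names are hypothetical: the last 4 training observations score each candidate window size, and the winner is then refit on the full training set before forecasting the 4 test observations.

```python
def window_average(y, h, window):
    """Forecast the mean of the last `window` observations for h steps."""
    return [sum(y[-window:]) / window] * h


def mae(actual, forecast):
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def select_window(y_train, candidates, n_val=4):
    """Pick the window size with the lowest MAE on the last n_val training points."""
    fit_part, val_part = y_train[:-n_val], y_train[-n_val:]
    scores = {
        w: mae(val_part, window_average(fit_part, n_val, w))
        for w in candidates
        if w <= len(fit_part)
    }
    return min(scores, key=scores.get)


y_train = [0.0, 0.0, 0.0, 0.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0]
best = select_window(y_train, candidates=[2, 6])
print(best, window_average(y_train, 4, best))  # 2 [5.0, 5.0, 5.0, 5.0]
```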

mlforecast is a framework that helps practitioners forecast time series using machine learning models. You just give it a model (in this case, LinearRegression from sklearn), define which features to use, and let mlforecast do the rest.
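To make the idea concrete, here is a minimal single-series sketch of what such a pipeline does: build a lag-1 feature, fit an ordinary least squares line, and forecast recursively by feeding each prediction back in as the next lag. It is hand-rolled for illustration and is not mlforecast's API; mlforecast itself fits one model across all series and supports many more features.

```python
def fit_lag1(y):
    """OLS fit of y_t = intercept + slope * y_{t-1}."""
    x, target = y[:-1], y[1:]
    n = len(x)
    mean_x, mean_t = sum(x) / n, sum(target) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxt = sum((a - mean_x) * (b - mean_t) for a, b in zip(x, target))
    slope = sxt / sxx if sxx else 0.0
    return mean_t - slope * mean_x, slope


def forecast_lag1(y, h):
    """Recursive multi-step forecast: each prediction becomes the next lag."""
    intercept, slope = fit_lag1(y)
    last, preds = y[-1], []
    for _ in range(h):
        last = intercept + slope * last
        preds.append(last)
    return preds


print(forecast_lag1([1.0, 2.0, 3.0, 4.0, 5.0], h=3))  # [6.0, 7.0, 8.0]
```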

Reproducing results

You can reproduce the results using this repo by following the steps below. The whole process is automated using Docker, conda, and Make.

make init. This command creates a Docker container based on environment.yml, which contains the required R and Python libraries.
make run_module module="python -m src.prepare_data". The module splits data into train and test sets. You can find the generated data in data/prepared-data-train.csv and data/prepared-data-test.csv respectively.
make run_module module="python -m src.forecast_prophet". Fits Prophet model (forecasts in data/prophet-forecasts.csv).
make run_module module="python -m src.forecast_statsforecast". Fits statsforecast models (forecasts in data/statsforecast-forecasts.csv).
make run_module module="Rscript src/forecast_arima.R". Fits auto.arima model (forecasts in data/arima-forecasts.csv).
make run_module module="python -m src.forecast_mlforecast". Fits mlforecast model using LinearRegression (forecasts in data/mlforecast-forecasts.csv).

Results

Performance

The following table summarizes the results in terms of performance.

Image by Author

As we can see, the best model is mlforecast.linear_regression for the MAPE, RMSE, SMAPE, and MAE metrics. Surprisingly, a very simple model such as naive (which uses the last value as the forecast) turns out to be better than Prophet in this experiment.

Computational cost

The following table summarizes the results in terms of computational cost.

Image by Author

To run our experiments, we used a c5d.24xlarge AWS instance (96 vCPUs, 192 GiB of RAM), which costs 4.608 USD per hour. As we can see, mlforecast takes no more than 3 seconds and beats Prophet and auto.arima in performance.
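At that hourly rate, the compute cost of a run is easy to estimate. Prorating the hourly price by runtime (an approximation that ignores instance startup and any minimum billing period), the 3-second mlforecast run costs a fraction of a cent:

```python
HOURLY_USD = 4.608  # c5d.24xlarge price quoted above


def run_cost(seconds, hourly=HOURLY_USD):
    """Prorated cost in USD of using the instance for `seconds`."""
    return hourly * seconds / 3600


print(f"{run_cost(3):.5f}")  # 0.00384
```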

Conclusion

This post showed in the context of the Zillow controversy that doing benchmarks is fundamental to addressing any time series forecasting problem. Those benchmarks must be computationally efficient to iterate fast and build more complex models on top of them. The libraries statsforecast and mlforecast are excellent tools for the task. We also showed better options than Prophet to run benchmarks, which is consistent with previous findings by the data science community.

Build benchmarks. Always.

Prophet vs Linear Regression on Real Estate: The Zillow Case was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.

Time Series Forecasting with Statistical Models

azul garza ramirez — Mon, 06 Dec 2021 19:49:09 GMT

statsforecast makes forecasting with statistical models fast and easy

By Nixtla Team. fede garza ramírez, Max Mergenthaler

TL;DR

In this post, we introduce statsforecast, an open-source framework that makes implementing statistical models for forecasting tasks fast and easy. statsforecast can handle thousands of time series and is efficient in both time and memory. With this library, you can easily create benchmarks on which to build more complex models, and you can also run your own models in parallel. We also offer a guide on how to use Forecast Value Added to benchmark and assess competing models.

Introduction

In this post, we talk about using statistical models in forecasting tasks. In particular, we introduce statsforecast, a Python library for fitting statistical models simply and efficiently to hundreds of thousands of time series, so that you can benchmark your own models quickly. Throughout this post, we show how to use the library to calculate the Forecast Value Added of several models with respect to a benchmark model. This methodology allows us to select the best model among several candidates.

Motivation

Deep learning and Machine Learning models have demonstrated state-of-the-art performance in time series forecasting tasks. However, it is helpful to have a battery of simpler models to benchmark and validate the value that those models add.

In business problems, metrics such as Forecast Value Added (FVA) are commonly used to compare the value added by more complex models against simpler techniques that are easier to implement and explain to decision-makers. FVA is calculated by subtracting the loss of the more complex model from the loss of the benchmark model, so a positive FVA means the complex model improves on the benchmark. In the following example, three models were fitted: Naive, Statistical, and Override. The first column shows the Mean Absolute Percentage Error (MAPE) of these three models. In the second row, the FVA vs. Naive column displays the difference between the Naive MAPE and the Statistical MAPE, which is positive; this means the Statistical model adds value to the process. Likewise, the third row shows the difference between the Naive MAPE and the Override MAPE; the result is negative, so the Override model doesn't add any value.

Image from SaaS
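Consistent with the example, where a positive FVA means a model beats the benchmark, FVA is the benchmark's loss minus each model's loss. A minimal sketch; the MAPE values below are made up for illustration and are not the ones in the table:

```python
def forecast_value_added(losses, benchmark):
    """FVA of each model vs. the benchmark: positive means the model adds value."""
    base = losses[benchmark]
    return {name: base - loss for name, loss in losses.items() if name != benchmark}


mape_by_model = {"Naive": 50.0, "Statistical": 45.0, "Override": 55.0}  # hypothetical values
print(forecast_value_added(mape_by_model, benchmark="Naive"))
# {'Statistical': 5.0, 'Override': -5.0}
```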

statsforecast includes a wide range of basic statistical models that can be used for decision-making or as benchmarks when implementing more complex models. It also includes models for specific tasks, such as forecasting sparse (or intermittent) time series, i.e., time series with a high percentage of zero values, such as sales. These models have implementations in the R programming language but not in Python.

statsforecast

To make benchmarking easier, we created statsforecast, which is a framework to help you forecast time series using statistical models. You just need to give it a model you want to use and let statsforecast do the rest.

Included models

ADIDA: Temporal aggregation is used for reducing the presence of zero observations, thus mitigating the undesirable effect of the variance observed in the intervals. ADIDA uses equally sized time buckets to perform non-overlapping temporal aggregation and predict the demand over a pre-specified lead-time. The time bucket is set equal to the mean inter-demand interval. SES is used to obtain the forecasts.
Croston Classic: The method proposed by Croston to forecast series that display intermittent demand. The method decomposes the original series into the non-zero demand size and the inter-demand intervals and models them using Simple Exponential Smoothing with a predefined parameter.
Croston SBA: SBA stands for Syntetos-Boylan Approximation. A variant of Croston’s method that utilizes a debiasing factor.
Croston Optimized: Like Croston, but this model optimizes the Simple Exponential Smoothing for both the non-zero demand size and the inter-demand intervals.
Historic average: Simple average of the time series.
iMAPA: iMAPA stands for Intermittent Multiple Aggregation Prediction Algorithm. It is another way of implementing temporal aggregation in demand forecasting. However, in contrast to ADIDA, which considers a single aggregation level, iMAPA considers multiple ones, aiming to capture different dynamics of the data. iMAPA then averages the point forecasts derived at each level, which are generated using SES.
Naive: Uses the last value of the time series as forecast. The simplest model for time series forecasting.
Random Walk with Drift: Projects the historic trend from the last observed value.
Seasonal Exponential Smoothing: Adjusts a Simple Exponential Smoothing model for each seasonal period.
Seasonal Naive: Like Naive, but the forecasts equal the last known observation of the same season, so the model can capture seasonal variations (for example, weekly patterns in daily data).
Seasonal Window Average: Uses the last window (defined by the user) to calculate an average for each seasonal period.
SES: SES stands for Simple Exponential Smoothing. This model forecasts with a weighted average of past observations whose weights decay exponentially, so the most recent observations carry the most weight. Useful for time series with no trend or seasonality.
TSB: TSB stands for Teunter-Syntetos-Babai. A modification to Croston’s method that replaces the inter-demand intervals component with the demand probability.
Window Average: Uses the last window (defined by the user) to calculate an average.
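As an example of the intermittent-demand methods above, Croston Classic and its SBA variant can be sketched in plain Python. Assumptions: a smoothing parameter of alpha = 0.1, SES initialized at the first value, and the first inter-demand interval counted from the start of the series. This is an illustration, not statsforecast's code, whose details may differ.

```python
def ses_last_level(values, alpha):
    """Final SES level, initialized at the first value."""
    level = values[0]
    for v in values[1:]:
        level = alpha * v + (1 - alpha) * level
    return level


def croston(y, h, alpha=0.1, sba=False):
    """Croston's method; with sba=True apply the Syntetos-Boylan debiasing factor."""
    nonzero = [i for i, v in enumerate(y) if v != 0]
    if not nonzero:
        return [0.0] * h  # no demand observed yet
    sizes = [y[i] for i in nonzero]
    # Intervals between demands; the first is counted from the start of the series.
    intervals = [nonzero[0] + 1] + [b - a for a, b in zip(nonzero, nonzero[1:])]
    forecast = ses_last_level(sizes, alpha) / ses_last_level(intervals, alpha)
    if sba:
        forecast *= 1 - alpha / 2
    return [forecast] * h


demand = [0, 0, 3, 0, 0, 0, 5, 0, 2]
print(croston(demand, 2), croston(demand, 2, sba=True))
```

The forecast is a flat demand rate (smoothed demand size divided by smoothed interval), which is why Croston-type methods suit sparse series.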

Usage

To create an ample set of benchmarks, you can install statsforecast, which is available on PyPI (pip install statsforecast).

Libraries

https://medium.com/media/00ae2e43f779e45b275cb9c5e239947f/href

Data

In this example, we use the M4 time series competition data. The objective of the competition was to validate models on data with different frequencies and seasonalities. The dataset was released publicly under a completely open-access license. To download the data, we used nixtlats. In this example, we use the daily time series.

https://medium.com/media/0baad9edb0f25cf2bdeb08a5e145c8a8/href

Initially, the data doesn't contain the actual dates of the observations, so the following line creates a datestamp for each observation of each time series.

https://medium.com/media/5f25c359e79aa04af8467042590b109a/href
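Adding datestamps amounts to assigning consecutive dates to each series' observations. A minimal sketch for daily data; the start date is an arbitrary assumption, since the data carries no real dates, and the helper name is hypothetical.

```python
from datetime import date, timedelta


def add_datestamps(series, start=date(1970, 1, 1)):
    """Give every series consecutive daily dates beginning at `start` (an arbitrary choice)."""
    rows = []
    for unique_id, values in series.items():
        for i, y in enumerate(values):
            rows.append((unique_id, start + timedelta(days=i), y))
    return rows


print(add_datestamps({"D1": [10.0, 11.0]}))
# [('D1', datetime.date(1970, 1, 1), 10.0), ('D1', datetime.date(1970, 1, 2), 11.0)]
```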

The function M4.load returns train + test data, so we need to separate them.

https://medium.com/media/37a8bb51a6f8a2f111352f1f4c483bbe/href https://medium.com/media/9c38d68119f33f481e97c16e2f0e3434/href https://medium.com/media/ac06f660d3b8f40387b089bc00c90ff1/href

This is the required input format.

an index named unique_id that identifies each time series. In this example, we have 4,227 time series.
a ds column with the dates.
a y column with the values.

Training

We now define the statistical models to use as a list of functions. If a model takes additional parameters besides the forecast horizon, it must be included as a tuple containing the model and its additional parameters.

https://medium.com/media/8ca6e2e6250ba27efd82cd706431ea18/href
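To make the function-or-tuple convention concrete, here is a small illustration with two hypothetical model functions and a dispatcher that calls either form. This is not statsforecast's internal code; it only shows how a `(function, *extra_args)` tuple can be expanded into a call with `y` and `h` first.

```python
import numpy as np


# Hypothetical model functions: `y` and `h` always come first,
# anything after them is an extra parameter.
def historic_average(y: np.ndarray, h: int) -> np.ndarray:
    return np.full(h, y.mean())


def window_average(y: np.ndarray, h: int, window_size: int) -> np.ndarray:
    return np.full(h, y[-window_size:].mean())


# Plain functions for models without extra parameters,
# (function, *extra_args) tuples for models that need them.
models = [historic_average, (window_average, 7)]


def run_model(model, y: np.ndarray, h: int) -> np.ndarray:
    """Call a model given either as a function or as a (function, *args) tuple."""
    if isinstance(model, tuple):
        func, *args = model
        return func(y, h, *args)
    return model(y, h)
```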

Now we define our trainer, StatsForecast, which receives the models we want to use, the frequency of the data, and the number of cores used to parallelize the training job.

With it, fitting these models and generating forecasts takes only a few lines. The main class is StatsForecast; it receives four parameters:

df: A pandas dataframe with time series in long format.
models: A list of models to fit each time series.
freq: Frequency of the time series.
n_jobs: Number of cores to use in the fitting process. The default is 1 job. To run the process in parallel, you can use the cpu_count() function from multiprocessing.

https://medium.com/media/a751c810fadcf954df5225fb5d61ce28/href https://medium.com/media/d192986684c139fe73f35483eedaab57/href https://medium.com/media/6ee50bee4c348d53f36a6f04dca3eb68/href https://medium.com/media/5835b62781d7001927199d089f710d7a/href
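Conceptually, the trainer fits every model on every series independently, which is what makes the work easy to spread across cores. The sketch below illustrates that idea with `concurrent.futures`; it is an assumption-laden stand-in, not StatsForecast's implementation, and `forecast_all`, `forecast_series`, and the toy models are hypothetical. With `n_jobs > 1`, model functions must be defined at module top level so they can be pickled, and some platforms need an `if __name__ == "__main__":` guard.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import pandas as pd


def historic_average(y, h):
    return np.full(h, y.mean())


def naive(y, h):
    return np.full(h, y[-1])


def forecast_series(task):
    """Fit every model on one series and return its h future rows."""
    uid, y, last_ds, h, freq, models = task
    # The h dates that follow the last observed date.
    ds = pd.date_range(last_ds, periods=h + 1, freq=freq)[1:]
    out = pd.DataFrame({"unique_id": uid, "ds": ds})
    for model in models:
        out[model.__name__] = model(y, h)
    return out


def forecast_all(df, models, h, freq="D", n_jobs=1):
    """Hypothetical trainer: one independent task per series."""
    tasks = []
    for uid, g in df.groupby("unique_id"):
        g = g.sort_values("ds")
        tasks.append((uid, g["y"].to_numpy(), g["ds"].iloc[-1], h, freq, models))
    if n_jobs > 1:
        with ProcessPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(forecast_series, tasks))
    else:
        results = [forecast_series(task) for task in tasks]
    return pd.concat(results, ignore_index=True)
```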

Forecast Value Added

In this example, we’ll use the historic_average model as a benchmark; it is one of the simplest models among the fitted ones (it only takes the mean value of the time series as the forecast).

https://medium.com/media/92ac4c3a1cfeb9845bfaded3f2dab09d/href https://medium.com/media/ffb52a6942d05a3bb6a30208c837780c/href https://medium.com/media/fbb7f44ce938c12e139981cce8b450d1/href

As the table shows, the Forecast Value Added against the historic_average model is positive for the majority of the models.
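The article does not spell out the FVA formula, so the sketch below assumes one common definition: the benchmark's error minus the model's error, here measured with MAE. A positive value means the model improves on the benchmark. The helper names are hypothetical, and a relative version (dividing by the benchmark error) is an equally valid choice.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def forecast_value_added(y_true, y_model, y_benchmark, metric=mae) -> float:
    """Assumed FVA definition: benchmark error minus model error.

    Positive values mean the model adds value over the benchmark.
    """
    return metric(y_true, y_benchmark) - metric(y_true, y_model)
```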

Visualization

In this section we present visual examples of the forecasts generated.

https://medium.com/media/8e864ac8ed0e7bce4d8d03e81909ef64/href https://medium.com/media/337db139affcb597d90e67a0453df674/href

Image by Author

Create your own model

Additionally, you can use the full power of StatsForecast to parallelize your own model. You just need to define a function with the mandatory parameters y, the target time series, and h, the horizon to forecast; you can also add optional parameters. The function's output must be a numpy array of size h. In the following example, we'll fit a linear regression against time; this is a very basic model, but it is useful to explain how to get the full potential of statsforecast.

https://medium.com/media/d4f08254238dc1a9f806603d75359955/href https://medium.com/media/42d4174e26a13266c0ccdadc18bcf03b/href https://medium.com/media/3807a71a3db027573a8602f01ad7df13/href https://medium.com/media/b4e9f626aec446c8d80d8e7618cc91cc/href https://medium.com/media/a925fdf8c2c981052a6f0220a24cef02/href
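A function that satisfies this contract (inputs `y` and `h`, output a numpy array of size `h`) can be as short as the sketch below, which fits a line against the time index with `np.polyfit` and extrapolates it. It is an illustration of the contract, not necessarily the article's exact code.

```python
import numpy as np


def linear_regression(y: np.ndarray, h: int) -> np.ndarray:
    """Fit y = intercept + slope * t by least squares and extrapolate h steps."""
    t = np.arange(y.size)
    # np.polyfit returns coefficients from the highest degree down.
    slope, intercept = np.polyfit(t, y, deg=1)
    t_future = np.arange(y.size, y.size + h)
    return intercept + slope * t_future
```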

A more complex example, with an extra parameter, is a Lasso regression:

https://medium.com/media/21cf61c0e2c9b221f5f6372dd98e2a67/href

Instead of passing the function alone, you just need to pass a tuple with the function and the parameter value you want to use:

https://medium.com/media/3cbf58613b252407a944abad01e15461/href https://medium.com/media/e6b0597430af8c8edde04fff347c6968/href https://medium.com/media/aea37c1df689accde2537ddc816b94a6/href https://medium.com/media/dd3a684ae360e46cd615d2be1cc7212e/href
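To see what the extra parameter does without depending on scikit-learn, here is a hand-rolled Lasso against time for a single feature. It minimizes the same objective scikit-learn's `Lasso` uses, (1/(2n))·||y − b − w·t||² + alpha·|w|, whose one-feature solution is a soft-thresholded least-squares slope. This is a swapped-in sketch rather than the article's code; it would be passed as a tuple such as `(lasso_regression, 0.1)`, where 0.1 is an arbitrary example value.

```python
import numpy as np


def soft_threshold(z: float, alpha: float) -> float:
    """Shrink z toward zero by alpha (zero if |z| <= alpha)."""
    return float(np.sign(z) * max(abs(z) - alpha, 0.0))


def lasso_regression(y: np.ndarray, h: int, alpha: float) -> np.ndarray:
    """One-feature Lasso of y on time, with an unpenalized intercept.

    Needs at least two observations so the time index has nonzero variance.
    """
    n = y.size
    t = np.arange(n, dtype=float)
    t_mean, y_mean = t.mean(), y.mean()
    tc, yc = t - t_mean, y - y_mean
    # Closed-form minimizer of (1/(2n))||yc - w*tc||^2 + alpha*|w|.
    slope = soft_threshold(np.dot(tc, yc) / n, alpha) / (np.dot(tc, tc) / n)
    intercept = y_mean - slope * t_mean
    t_future = np.arange(n, n + h, dtype=float)
    return intercept + slope * t_future
```

With `alpha=0` this reduces to ordinary least squares; a large enough `alpha` shrinks the slope to zero and the forecast collapses to the series mean.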

Finally, you can train both models and a historic_average model (for benchmarking purposes) at the same time by defining the models list as follows:

https://medium.com/media/f1810aaa8ac6358d5b9c3e4b21cb6c54/href https://medium.com/media/7caf42d7ac24c8eaf485b2b22a724da2/href https://medium.com/media/6261323c52589092330d34f7e6eb2f40/href https://medium.com/media/ec21ad53d8bb687597633021fd043940/href

Now we can calculate the FVA for the linear and lasso regression based on the historic average model.

https://medium.com/media/f98d23d3ea10dafae335f981d4b62699/href https://medium.com/media/dcd778142101a4a902f07062fc5ef652/href https://medium.com/media/2fc3a7d79142ed126143cbb0357b7d35/href

So, the table shows a positive FVA for both models; we can also see that the regularization provided by the Lasso regression improves the FVA.

Conclusion

In this post, we introduced statsforecast, a library written in Python to quickly fit statistical models. As we saw, in time series forecasting practice it is very useful to first fit a simple model as a benchmark. This benchmark makes it possible to build more complex models and to show, through the FVA, that their complexity brings value to the process.

Statsforecast allows you to create benchmark models in a simple way; moreover, it allows you to fit your own models efficiently by fitting in parallel.

WIP and Next Steps

statsforecast is a work in progress. In the next releases we plan to include:

Automated backtesting.
Ensembles (such as fforma).
More statistical models with exogenous variables.

If you’re interested you can learn more in the following resources:

GitHub repo: https://github.com/Nixtla/statsforecast
Documentation: https://nixtla.github.io/statsforecast/

Time Series Forecasting with Statistical Models was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Open Source Alternative to the AWS Deep Learning AMI

azul garza ramirez — Wed, 17 Nov 2021 20:55:48 GMT

An article describing how to set up GPU infrastructure automatically using conda, Docker, make and terraform.

By Nixtla Team. fede garza ramírez, Max Mergenthaler

TL;DR: Running Deep Learning models on GPUs is complicated, particularly when configuring the infrastructure. Prefabricated GPU cloud infrastructure tends to be expensive.

To help people focus on their models rather than on their hardware and its configuration, we at Nixtla developed a fast and simple way to use GPUs on the AWS cloud without paying for the AMI environment and made it open-source: https://github.com/Nixtla/nixtla/tree/main/utils/docker-gpu and https://github.com/Nixtla/nixtla/tree/main/utils/terraform-gpu.

INTRODUCTION

Deep Learning has become widespread in many areas: computer vision, natural language processing, time series forecasting, etc. Due to the state-of-the-art results it has obtained, it has become increasingly popular in the daily practice of data scientists and researchers.

GPUs have accelerated the training and inference of these models because they are optimized to perform the linear algebra computations on which deep learning heavily relies. The need for this specialized hardware, however, increases the cost of experimenting with these models and deploying them to production.

A common problem faced by Deep Learning practitioners is the proper configuration of the GPU infrastructure on the cloud. The installation of required drivers for hardware management tends to be bothersome. When this is not tackled correctly, it can be detrimental to reproducibility or unnecessarily increase the cost of these novel models. In this post, we provide the community with a simple solution to this problem using Docker.

SOLUTION

1) NVIDIA Deep Learning AMI + Conda environment + Terraform

a) NVIDIA Deep Learning AMI

To run your code with GPU-accelerated computation, you need two things covered: (i) NVIDIA GPUs and (ii) their necessary drivers.

If you opt for EC2 instances (P2, P3, P4D, or G4), NVIDIA provides a free AMI with pre-installed and optimized GPU software for which you only need to pay the EC2 computational costs.

You can easily launch GPU EC2 instances with their corresponding drivers from your terminal using the AWS CLI. To do it you need:

AWS CLI installed.
EC2 launch permissions.
EC2 connection permissions: (i) the .pem file from the instance launch (you can create one following the instructions here) and (ii) the instance’s security group.

If you don’t have a security group of your own, you can create one using:

aws ec2 create-security-group \
        --group-name nvidia-ami \
        --description "security group for nvidia ami"

And add to it ingress rules using:

aws ec2 authorize-security-group-ingress \
        --group-name nvidia-ami \
        --protocol tcp \
        --port 22 \
        --cidr 0.0.0.0/0

With the above, launching a GPU ready EC2 instance is as simple as running:

aws ec2 run-instances \
        --image-id ami-05e329519be512f1b \
        --count 1 \
        --instance-type g4dn.2xlarge \
        --key-name  \
        --security-groups nvidia-ami

The image ID (--image-id) identifies the required NVIDIA AMI. The number of instances (--count) and the instance type (--instance-type) are optional.

Once the instance is initialized, we can access it with ssh. The AMI comes pre-installed with git, so we can clone the repo of our project without much additional difficulty.

ssh -i path/to/.pem ubuntu@

b) Conda environments

We recommend using Conda to handle Deep Learning dependencies (PyTorch, TensorFlow, etc.). In particular, we recommend creating environments with environment.yml files.

The following image shows an example. The Deep Learning framework used in this example is PyTorch, and standard libraries such as NumPy and pandas were also included. This file is a skeleton, so any additional dependencies can be added without any difficulty. In addition, jupyterlab is included.

The original file can be found here.

As can be seen, the Python version used is 3.7. This version can easily be adjusted to the user’s needs, as can the versions of the other packages.

To use Conda environments you first need to install Conda, because the NVIDIA AMI doesn’t include it. You can use the following commands:

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda && \
rm -rf Miniconda3-latest-Linux-x86_64.sh && \
source $HOME/miniconda/bin/activate && \
conda init

You can then create your environment with:

conda env create -n  -f environment.yml

To verify that everything is correctly installed, you can clone our repo and run a test:

git clone https://github.com/Nixtla/nixtla.git
cd nixtla/utils/docker-gpu
conda env create -n gpu-env -f environment.yml
conda activate gpu-env
python -m test

A final piece of advice: the user must be careful with the version of the Deep Learning framework used, verifying that it is compatible with the NVIDIA AMI drivers.

c) Terraform

To automate the whole process described above, we developed a Terraform script. Terraform is an open-source infrastructure-as-code tool that lets you turn all the manual steps into an automated script. In this case, the infrastructure as code we wrote launches an instance from the NVIDIA AMI (including the creation of a compatible security group) and installs Conda. The following image shows the main.tf file.

The original file can be found here.

Additionally, a terraform.tfvars file is required for the credentials. An image of this file is shown below.

The original file can be found here.

To use Terraform, you only have to install it by following these instructions. Then run:

terraform init
terraform apply

This will create the required infrastructure and install conda on the deployed EC2. When Terraform finishes running, you will be able to see the public IP associated with the instance so you only need to use an ssh connection to access it.

ssh -i path/to/.pem ubuntu@

2) NVIDIA Deep Learning AMI + Conda environment + Terraform + Docker + Make

a) Docker

It is common practice to use Docker to ensure the replicability of projects and experiments. In addition, it allows the user to concentrate all the necessary dependencies in one place, avoiding installing dependencies locally that can later cause conflicts.

We use docker because it allows us to isolate the software from the hardware, making computation more flexible. If the load is very heavy, it is enough to change the EC2 instance and just run the code inside the container. On the other hand, if the load is lighter, we can choose a smaller instance.

The following image shows the Dockerfile we built so that images can access the instance’s GPU. First of all, a base image compatible with the drivers installed on the EC2 instance must be chosen. At the time of writing, the NVIDIA AMI uses CUDA version 11.2, so this is the selected image.

The original file can be found here.

Subsequently, additional operating system libraries are installed that may be needed for the project. For example, in the Dockerfile above, wget and curl are installed, which might be useful for downloading data that the project requires.

In the next instruction, miniconda is installed. Conda, as we discussed earlier, will allow us to handle python dependencies and also install them with the environment.yml file shown in the previous section.

We highly recommend using mamba for dependency resolution and installation, as it significantly reduces waiting time. If the user prefers, she can easily switch to Conda.

Finally, the environment.yml file created earlier is added to the Docker image and installed in the base environment, so it is not necessary to activate a specific environment every time a container is started.

b) Makefile

Finally, we provide a Makefile. Make is a powerful tool for controlling workflows and executable files. Our workflow will allow us to quickly build the Docker image from the Dockerfile and run Python and bash modules without continuously declaring the necessary arguments.

The original file can be found here.

In this example, the Docker image will be called gpucontainer, and you can just run make init to build it. Once this instruction is executed, the user can use the run_module instruction to run her python or bash modules using GPUs.

For example, to verify that everything works fine as expected, we create the test.py file that makes sure that CUDA is available for PyTorch and the GPUs are available. This module would be executed as follows:

make run_module module="python -m test"

The original file can be found here.

Other valuable instructions could be to run nvidia-smi inside the Docker container to verify that everything works fine:

make run_module module="nvidia-smi"

Or start the container interactively with make bash_docker. Finally, an instruction is provided to run jupyterlab inside the Docker container and do experiments interactively:

make jupyter

If port 8888 (default) is used by another process, it can easily be changed using

make jupyter -e PORT=8886

SUMMARY

In this post, we showed a simple solution to the problem of configuring GPUs for Deep Learning on the cloud. With this fully open-source workflow, we hope that practitioners in the field will spend more time implementing models and less time on infrastructure than we did.

Open Source Alternative to the AWS Deep Learning AMI was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Forecasting with Machine Learning Models

azul garza ramirez — Thu, 21 Oct 2021 23:10:15 GMT

Notes from Industry

mlforecast makes forecasting with machine learning fast & easy

By Nixtla Team. fede garza ramírez, Max Mergenthaler

TL;DR: We introduce mlforecast, an open source framework from Nixtla that makes the use of machine learning models in time series forecasting tasks fast and easy. It allows you to focus on the model and features instead of implementation details. With mlforecast you can run experiments more easily, and it has built-in backtesting functionality to help you find the best performing model.

You can use mlforecast in your own infrastructure or use our fully hosted solution. Just email federico@nixtla.io to test the private beta.

Although this example contains only a single time series, the framework is able to handle hundreds of thousands of them and is very efficient both time- and memory-wise.

Introduction

We at Nixtla are trying to make time series forecasting more accessible to everyone. In this post, we’ll talk about using machine learning models in forecasting tasks. We’ll use an example to show what the main challenges are, and then we’ll introduce mlforecast, a framework that facilitates using machine learning models in forecasting. mlforecast does the feature engineering and takes care of the updates for you; the user only has to provide a regressor that follows the scikit-learn API (implements fit and predict) and specify the features she wants to use. These features can be lags, lag-based transformations, and date features. (For further feature creation or an automated forecasting pipeline, check nixtla.)

Motivation

For many years classical methods like ARIMA and ETS dominated the forecasting field. One of the reasons was that most of the use cases involved forecasting low-frequency series with monthly, quarterly, or yearly granularity. Furthermore, there weren’t many time-series datasets, so fitting a single model to each one and getting forecasts from them was straightforward.

However, in recent years, the need to forecast bigger datasets at higher frequencies has risen. Bigger and higher-frequency time series pose a challenge for classical forecasting methods. Those methods aren’t meant to model many time series together, so their implementation is suboptimal and slow (you have to train many models); besides, there could be common or shared patterns between the series that could be learned by modeling them together.

To address this problem, there have been various efforts in proposing methods that can train a single model on many time series. Some fascinating deep learning architectures that can accurately forecast many time series have been designed, such as ESRNN, DeepAR, and NBEATS, among others. (Check nixtlats and Replicating ESRNN results for our WIP.)

Traditional machine learning models like gradient boosted trees have been used as well and have shown that they can achieve very good performance. However, using these models with lag-based features isn’t very straightforward, because you have to update your features at every timestep in order to compute the predictions. Additionally, depending on your forecasting horizon and the lags that you use, at some point you run out of real values of your series to update your features, so you have to do something to fill those gaps. One possible approach is to use your predictions as the values for the series and update your features using them. This is exactly what mlforecast does for you.

Example

In the following section, we’ll show a very simple example with a single series to highlight the difficulties in using machine learning models in forecasting tasks. This will later motivate the use of mlforecast, a library that makes the whole process easier and faster.

Libraries

https://medium.com/media/2949ad86b9775dbe3256f0bd36e55573/href

Data

https://medium.com/media/28606d4b94055177baea773d30be3ca6/href

Image by Author

Our data is daily with a weekly seasonality; as you can see from how it is created, it is basically just dayofweek + Uniform({-1, 0, 1}).
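A series with this structure can be generated as below. This is a hypothetical reconstruction: the start date, the length of 500 days, and the random seed are assumptions, not the article's values.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # hypothetical seed
dates = pd.date_range("2000-01-01", periods=500, freq="D")  # hypothetical range
# Day of week (Monday=0 ... Sunday=6) plus uniform noise in {-1, 0, 1}.
noise = rng.integers(-1, 2, size=dates.size)
series = pd.Series(dates.dayofweek.to_numpy() + noise, index=dates, name="y")
```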

Training

Let’s say we want forecasts for the next 14 days. The first step would be deciding which model and features to use, so we’ll create a validation set containing the last 14 days of our data.

https://medium.com/media/b62ea08930a0be363cf755375747cc39/href

As a starting point, we’ll try lag 7 and lag 14.

https://medium.com/media/6678ac1e6425136ac80a1eedb67f3225/href

Image by Author

We can see the expected relationship between the lags and the target. For example, when lag-7 is 2, y can be 0, 1, 2, 3 or 4. This is because every day of the week can take the values [day - 1, day, day + 1], so on day of the week number 2 we can get the values 1, 2 or 3. However, a lag-7 value of 2 can also come from day of the week 1, whose minimum is 0, or from day of the week 3, whose maximum is 4.

Computing lag values leaves some rows with nulls.

https://medium.com/media/d94e200a28cf47579bfa0c92b7c4e45c/href

Image by Author

We’ll drop these before training.

https://medium.com/media/51e998667ce1cb876c08ad239a36eb10/href
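The lag features and the null-dropping step can be sketched with `pandas.Series.shift`. The helper name is hypothetical; the point is that lag-k leaves the first k rows without a value, so with lags 7 and 14 the first 14 rows contain nulls.

```python
import pandas as pd


def make_lag_features(y: pd.Series, lags=(7, 14)) -> pd.DataFrame:
    """Hypothetical helper: target plus one shifted copy per lag."""
    df = pd.DataFrame({"y": y})
    for lag in lags:
        # shift(lag) moves values `lag` steps forward; the first `lag` rows become NaN.
        df[f"lag_{lag}"] = y.shift(lag)
    return df


# Usage: build the features, then drop the rows that contain nulls.
# train = make_lag_features(train_series).dropna()
```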

For simplicity’s sake, we’ll train a linear regression without intercept. Since the best model would be taking the average for each day of the week, we expect to get coefficients that are close to 0.5.

https://medium.com/media/9911e8167a25f6b9ab2d8416a2435646/href

Image by Author

The fitted model is 0.51 * lag_7 + 0.45 * lag_14.
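For reference, a linear regression without intercept is just least squares on the feature matrix, so it can be written with NumPy alone. scikit-learn's `LinearRegression(fit_intercept=False)` minimizes the same objective; the helper below is a hypothetical stand-in, not the article's code.

```python
import numpy as np


def fit_no_intercept(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y ≈ X @ coef, with no intercept term."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# Usage with the lag features (columns lag_7 and lag_14):
# coef = fit_no_intercept(train[["lag_7", "lag_14"]].to_numpy(), train["y"].to_numpy())
```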

Forecasting

Great. We have our trained model. How can we compute the forecast for the next 14 days? Machine learning models take a feature matrix X and output the predicted values y, so we need to create the feature matrix X for the next 14 days and give it to our model.

If we want to get the lag-7 for the next day, following the training set, we can just get the value in the 7th position starting from the end. The lag-7 two days after the end of the training set would be the value in the 6th position starting from the end and so on. Similarly for the lag-14.

https://medium.com/media/2782d40a6f69cf231e5b17d10110aeb3/href

Image by Author

https://medium.com/media/d4b774475b6b16d52afdeed0bf0f1a7d/href

Image by Author

As you may have noticed, we can only get 7 of the lag-7 values from our history, while we can get all 14 values for the lag-14. With this information we can only forecast the next 7 days, so we’ll only take the first 7 values of the lag-14.

https://medium.com/media/cc85cd4de72a32da8944006c5760a30c/href

Image by Author

With these features, we can compute the forecasts for the next 7 days.

https://medium.com/media/095ddc8417c424550204bf7d2ab6e1a6/href

Image by Author

These values can be interpreted as the values of our series for the next 7 days following the last training date. In order to compute the forecasts following that date, we can use these values as if they were the values of our series and use them as lag-7 for the following periods.

In other words, we can fill the rest of our features matrix with these values and the real values of the lag-14.

https://medium.com/media/5d538bb07163047b44ffea3bb27c061a/href

Image by Author

As you can see we’re still using the real values of the lag-14 and we’ve plugged in our predictions as the values for the lag-7. We can now use these features to predict the remaining 7 days.

https://medium.com/media/1d3c10223e502ec35a667151e2aa5a0d/href

Image by Author

And now we have our forecasts for the next 14 days! This wasn’t that painful, but it wasn’t pretty or easy either. And we only used lags, which are the simplest features we can have.

What if we had used lag-1? We would have needed to do this predict-update step 14 times!

And what if we had more elaborate features like the rolling mean over some lag? As you can imagine it can get quite messy and is very error prone.
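The predict-update idea generalizes to any set of lags if we go one step at a time: build the lag features from the history, predict, append the prediction to the history, and repeat. The sketch below assumes a linear model given by its coefficients; `recursive_forecast` is a hypothetical helper, and computing in 7-step chunks as above gives the same result for lags 7 and 14.

```python
import numpy as np


def recursive_forecast(history, coef, lags, h) -> np.ndarray:
    """Forecast h steps one at a time, feeding each prediction back as a lag."""
    values = list(history)
    preds = []
    for _ in range(h):
        # values[-lag] is the observation `lag` steps before the next timestamp.
        x = np.array([values[-lag] for lag in lags], dtype=float)
        y_hat = float(x @ np.asarray(coef, dtype=float))
        preds.append(y_hat)
        values.append(y_hat)  # the prediction becomes part of the history
    return np.array(preds)


# Usage with the fitted model above:
# recursive_forecast(train_series.to_numpy(), [0.51, 0.45], lags=(7, 14), h=14)
```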

mlforecast

With these problems in mind, we created mlforecast, which is a framework to help you forecast time series using machine learning models. It takes care of all these messy details for you. You just need to give it a model and define which features you want to use and let mlforecast do the rest.

mlforecast is available in PyPI (pip install mlforecast) as well as conda-forge (conda install -c conda-forge mlforecast).

The previously described problem can be solved using mlforecast with the following code.

First, we have to set up our data in the required format.

https://medium.com/media/659277045bb90865b85e8e2bd7e34d63/href

Image by Author

This is the required input format.

an index named unique_id that identifies each time series. In this case we only have one, but you can have as many as you want.
a ds column with the dates.
a y column with the values.

Now we’ll import the TimeSeries transformer, where we define the features that we want to use. We’ll also import the Forecast class, which will hold our transformer and model and will run the forecasting pipeline for us.

https://medium.com/media/f748b230463a621a3ae906209bed9538/href

We initialize our transformer specifying the lags that we want to use.

https://medium.com/media/8616b1f5d86cebbfeac6dd293a202188/href

Image by Author

As you can see this transformer will use lag-7 and lag-14 as features. Now we define our model.

https://medium.com/media/5d4393834ae67f11fe38e14ae43e3734/href

We create a Forecast object with the model and the time series transformer and fit it to our data.

https://medium.com/media/fe2095d7cbed28f97e95deb4afa11f99/href

And now we just call predict with the forecast horizon that we want.

https://medium.com/media/3c2e60e92ea5a1c6f8461203ccc925b3/href

Image by Author

This was a lot easier, and internally it did the same as we did before. Let's quickly verify.

Check that we got the same predictions:

https://medium.com/media/ed193d71de1fc229513900145810a034/href

Check that we got the same model:

https://medium.com/media/6667dd17e051f09c945445a1ac72bf7d/href

Experiments made easier

Having this high-level abstraction allows us to focus on defining the best features and model instead of worrying about implementation details. For example, we can try out different lags very easily by writing a simple function that leverages mlforecast:

https://medium.com/media/77f72fc5ea5b44aa075aa219eebc6834/href https://medium.com/media/3ec14fece6a3e13b338d42352ff81fc7/href

Image by Author

https://medium.com/media/00a176b2e6684de6fd88f6318ccf05b5/href

Image by Author

https://medium.com/media/6e37af47cc7033e59ca7318e9c35225e/href

Image by Author

Backtesting

In the previous examples, we manually split our data. The Forecast object also has a backtest method that can do that for us.

We’ll first get all of our data into the required format.

https://medium.com/media/305029b997747752cab6c71deb1c11b4/href

Image by Author

Now we instantiate a Forecast object as we did previously and call the backtest method instead.

https://medium.com/media/133f780f43f3301263aefe99317a2fbb/href

This returns a generator with the results for each window.

https://medium.com/media/4a48d3d6c3baac5c64a6e10ccd17b586/href

Image by Author

https://medium.com/media/2e91d3048cdf1f83d8d5f0892b3025cc/href

Image by Author

https://medium.com/media/461b66e765e95714e72aa052c231761f/href

Image by Author

result2 here is the same as the evaluation we did manually.

https://medium.com/media/426c111e6aca2727e124baaf63cbb436/href

We can define a validation scheme for different lags using several windows.

https://medium.com/media/8858c36bad07e8e0fa01caf4a03f6849/href https://medium.com/media/013b733f68c360f67e49891e68da72d7/href

Image by Author

https://medium.com/media/8faa284da73e88688a2a9024427c8995/href

Image by Author

https://medium.com/media/450bdb60474c2321d871ba68f14f26d7/href

Image by Author

Lag transformations

We can specify transformations on the lags as well as just lags. The window_ops library has some implementations of different window functions. You can also define your own transformations.

Let’s try a seasonal rolling mean, which takes the average over the last n seasons; in this case, it would be the average of the last n Mondays, Tuesdays, etc. Computing the updates for this feature by hand would probably be a bit annoying; with this framework, however, we can just pass it to lag_transforms. If the transformation takes additional arguments (besides the values of the series), we specify a tuple like (transform_function, arg1, arg2), which in this case are season_length and window_size.

https://medium.com/media/7c6ee3ba5e6795e5f6096e61adbf2a3a/href

help(seasonal_rolling_mean)

Help on CPUDispatcher in module window_ops.rolling:

seasonal_rolling_mean(input_array: numpy.ndarray, season_length: int, window_size: int, min_samples: Union[int, NoneType] = None) -> numpy.ndarray
    Compute the seasonal_rolling_mean over the last non-na window_size samples of the
    input array starting at min_samples.
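To make the docstring above concrete, here is an illustrative re-implementation: the value at position t is the mean of x[t], x[t - season_length], ..., going back window_size seasons, and it is left as NaN until min_samples values are available. This is a readable sketch of the idea, not window_ops' numba code, and it does not skip NaNs in the input. In mlforecast the transformation is applied to the lagged series (lag-7 in the example).

```python
import numpy as np


def seasonal_rolling_mean_sketch(x, season_length, window_size, min_samples=None):
    """Illustrative: mean of x[t], x[t - s], ..., x[t - (w - 1) * s]."""
    if min_samples is None:
        min_samples = window_size
    x = np.asarray(x, dtype=float)
    out = np.full(x.size, np.nan)
    for t in range(x.size):
        # Same-season values, newest first, limited to window_size of them.
        window = x[t::-season_length][:window_size]
        if window.size >= min_samples:
            out[t] = window.mean()
    return out
```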

lag_transforms takes a dictionary where the keys are the lags that we want to apply the transformations to and the values are the transformations themselves.

https://medium.com/media/3867c848f0df6a7e43f8b5bf95e3bac6/href

Image by Author

Date features

You can also specify date features to be computed, which are attributes of the ds column and are updated in each time step as well. In this example, the best model would be taking the average over each day of the week, which can be accomplished by doing one-hot encoding on the day of the week column and fitting a linear model.

https://medium.com/media/98ea605dd3ec1e7e27c3ff296475d457/href

Image by Author
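Why one-hot encoding plus a linear model recovers the per-day average: with one indicator column per day of the week and no intercept, least squares gives each day's coefficient as the mean of y on that day. The sketch below shows this with NumPy; the helper name is hypothetical and each day must appear at least once.

```python
import numpy as np


def fit_dayofweek_model(dayofweek: np.ndarray, y: np.ndarray) -> np.ndarray:
    """One-hot encode day of week (0-6) and fit a linear model without intercept."""
    X = np.eye(7)[dayofweek]  # one indicator column per day of the week
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # coef[d] is the average of y over day d


# Usage: predictions for new dates are coef[new_dates.dayofweek].
```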

Next steps

mlforecast has more features like distributed training and a CLI. If you’re interested you can learn more in the following resources:

GitHub repo: https://github.com/Nixtla/mlforecast
Documentation: https://nixtla.github.io/mlforecast/
Example using mlforecast in the M5 competition: https://www.kaggle.com/lemuz90/m5-mlforecast

Forecasting with Machine Learning Models was originally published in TDS Archive on Medium, where people are continuing the conversation by highlighting and responding to this story.

Automated Time Series Forecasting Pipeline Faster and More Accurate than Amazon Forecast

azul garza ramirez — Fri, 15 Oct 2021 18:44:07 GMT


TLDR: We built a fully open-source time-series pipeline capable of achieving 1% of the performance in the M5 competition, performing 25% better than Amazon Forecast in less than an hour and 20% better than fbprophet. To test the production version write to federico@nixtla.io.

By Nixtla Team. fede garza ramírez, Max Mergenthaler

Time Series forecasting is an exciting field for Machine Learning. Its applications can be found everywhere, ranging from inventory management and financial predictions to healthcare analytics.

In contrast with other Machine Learning tasks, which treat their slowly changing datasets as constant over time and only pay attention to changes once they are no longer negligible, a time series dataset explicitly accounts for time within its structure. This time dimension imposes structure and constraints on the datasets, making the ML model life cycle faster.

To deploy a time series forecasting model in production, several steps need to be addressed:

Data Ingestion: to communicate the data to powerful and fast computing services.
Data Preprocessing: to clean the data by removing outliers and filling missing observations, and to enhance the data with time-series features, like auto-regressors, statistical summaries, and other variables like calendar variables and holidays.
Model Training: to select from well-performing models and statistical benchmarks.
Hyperparameter Selection: to find models capable of generalization with good prediction performance.
Model Deployment: to evaluate and make the predictions available to the users.

Time Series Pipeline Automation

All of these steps are challenging and time-consuming. And automating them can help data scientists save time and apply their skills to discovering, creating, and building.

At Nixtla, we have developed, throughout our projects, an end-to-end forecasting pipeline that supports sklearn, LightGBM, and, in general, any model with “fit” and “predict” methods, as an out-of-the-box solution that developers can integrate with other pipelines.

With our solution, any data scientist or developer can set up their forecasting service on AWS by following the instructions in the repository. Or, if you prefer, you can ask us for free trial keys to test the solution on Nixtla’s infrastructure (just send an email to federico@nixtla.io or open a GitHub issue).

At Nixtla we strongly believe in open-source, so we have released all the necessary code so that anyone can set up their time-series processing service in the cloud (using AWS). That same repository uses continuous integration and deployment to deploy the APIs on our infrastructure.

If you want to deploy Nixtla on your AWS Cloud, you will need:

API Gateway (to handle API calls).
Lambda (or some computational unit).
SageMaker (or some bigger computational unit).
ECR (to store Docker images).
S3 (for inputs and outputs).

You will end up with an architecture that looks like the following diagram:

Each call to the API executes a particular Lambda function, depending on the endpoint. That Lambda function starts a SageMaker job on a predefined instance type. Finally, SageMaker reads the input data from S3, processes it with a predefined Docker image stored in ECR, and writes the results back to S3.

Forecasting Pipeline as a Service

Our forecasting pipeline is modular and built upon simple APIs:

1. tspreprocess

Time series often contain missing values. This is the case for sales data, where only the sales that actually happened are recorded. In these cases it is convenient to balance the panel, i.e., to add the missing observations so that future sales can be estimated correctly.

The tspreprocess API allows you to do this quickly and easily. In addition, it can automatically one-hot encode static variables (variables specific to each time series, such as the product family in the case of sales).
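To make "balancing the panel" and one-hot encoding concrete, here is a minimal pandas sketch. It is not the tspreprocess API, whose interface is not shown in this article; the function names balance_panel and one_hot_static, the unique_id/ds/y column names, the daily frequency, and the choice to fill missing sales with zero are all assumptions for illustration.

```python
import pandas as pd

def balance_panel(df: pd.DataFrame, freq: str = "D", fill_value: float = 0.0) -> pd.DataFrame:
    """Add a row for every missing date between each series' first and last observation."""
    pieces = []
    for uid, group in df.groupby("unique_id"):
        full_index = pd.date_range(group["ds"].min(), group["ds"].max(), freq=freq)
        filled = (
            group.set_index("ds")
            .reindex(full_index)  # missing dates appear as NaN rows
            .rename_axis("ds")
            .reset_index()
        )
        filled["unique_id"] = uid
        # Assumption: a date with no sales record means zero sales
        filled["y"] = filled["y"].fillna(fill_value)
        pieces.append(filled[["unique_id", "ds", "y"]])
    return pd.concat(pieces, ignore_index=True)

def one_hot_static(static: pd.DataFrame, columns: list) -> pd.DataFrame:
    """One-hot encode categorical static variables, keeping one row per series."""
    return pd.get_dummies(static, columns=columns, dtype=int)

# Toy data: series A has no record for 2022-01-02
sales = pd.DataFrame({
    "unique_id": ["A", "A", "B"],
    "ds": pd.to_datetime(["2022-01-01", "2022-01-03", "2022-01-01"]),
    "y": [5.0, 2.0, 1.0],
})
balanced = balance_panel(sales)
static = pd.DataFrame({"unique_id": ["A", "B"], "family": ["FOODS", "HOBBIES"]})
encoded = one_hot_static(static, ["family"])
```

Filling with zero is only appropriate when a missing record really means "nothing happened"; for sensor-like data, interpolation or leaving NaN may be the better choice.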

2. tsfeatures

It is usually good practice to create features of the target variable so that they can be consumed by machine learning models. This API allows users to create features at the time series level (or static features) and also at the temporal level.

The tsfeatures API is based on the tsfeatures library also developed by the Nixtla team (inspired by the R package tsfeatures) and the tsfresh library.

With this API the user can also generate holiday variables. Just specify the country of interest or provide a file with the specific dates, and the API returns dummy variables for those dates for each observation in the dataset.
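As a rough illustration of the feature types described above, the pandas sketch below builds a lag of the target (a temporal feature), a per-series mean (a static feature), and holiday dummies from a user-supplied list of dates. The function name add_features, the lag choice, and the holiday dictionary format are hypothetical and do not reflect the tsfeatures API.

```python
import pandas as pd

def add_features(df: pd.DataFrame, holidays: dict, lags=(1, 7)) -> pd.DataFrame:
    """Add lag features, a per-series mean and one 0/1 column per holiday.

    `holidays` maps a holiday name to a list of dates (a hypothetical format).
    """
    out = df.sort_values(["unique_id", "ds"]).copy()
    by_series = out.groupby("unique_id")["y"]
    for lag in lags:
        out[f"lag_{lag}"] = by_series.shift(lag)  # temporal feature, NaN where no history
    out["series_mean"] = by_series.transform("mean")  # static feature, constant per series
    for name, dates in holidays.items():
        out[name] = out["ds"].isin(pd.to_datetime(dates)).astype(int)
    return out

daily = pd.DataFrame({
    "unique_id": ["A"] * 3,
    "ds": pd.date_range("2022-12-24", periods=3, freq="D"),
    "y": [10.0, 3.0, 8.0],
})
feats = add_features(daily, {"christmas": ["2022-12-25"]}, lags=(1,))
```

In a real pipeline, static summaries such as series_mean should be computed on the training window only, so that no information from the forecast period leaks into the features.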

3. tsforecast

The tsforecast API generates the time series forecasts. It takes the target data as input and can also receive static and temporal variables. At the moment, the API uses the mlforecast library, developed by the Nixtla team, with LightGBM as the model.

In future iterations, the user will be able to choose different Deep Learning models based on the nixtlats library developed by the Nixtla team.

4. tsbenchmarks

The tsbenchmarks API is designed to easily compare the performance of models based on time series competition datasets. In particular, the API offers the possibility to evaluate forecasts of any frequency of the M4 competition and also of the M5 competition.

These APIs, written in Python, can be consumed through an SDK also written in Python. The following diagram summarizes the structure of our pipeline:

Data Format

Nixtla’s infrastructure is built to receive the same data structure throughout the entire pipeline.

1. Target Data

The target data must contain three columns: the identifier of each of the time series, the column that identifies the time of the observation, and the column of the target variable. In other words, it must be a time series panel (or long format).

2. Static Data

Static data, i.e., data that are constant over time for each time series, must contain the identifier of each time series and the static variables to be considered.

3. Temporal Data

Like the target data, the temporal exogenous data must contain an identifier for each time series, the time identifier, and the exogenous variables to be considered. Additionally, this dataset must include the values of the exogenous variables for the period to be forecasted.
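A small pandas sketch of the three layouts, with made-up values. The unique_id/ds/y column names follow the conventions used later in this article; the feature names family and price, and the helper covers_horizon that checks whether the temporal data extend over the forecast period, are hypothetical.

```python
import pandas as pd

# Target data: long format, one row per (series, timestamp)
target = pd.DataFrame({
    "unique_id": ["A", "A", "B", "B"],
    "ds": pd.to_datetime(["2022-01-01", "2022-01-02"] * 2),
    "y": [3.0, 5.0, 1.0, 0.0],
})

# Static data: one row per series, no time column
static = pd.DataFrame({"unique_id": ["A", "B"], "family": ["FOODS", "HOBBIES"]})

# Temporal data: one row per (series, timestamp), extended into the forecast period
temporal = pd.DataFrame({
    "unique_id": ["A"] * 4 + ["B"] * 4,
    "ds": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-04"] * 2),
    "price": [1.0, 1.0, 0.9, 0.9, 2.0, 2.0, 2.0, 2.1],
})

def covers_horizon(target: pd.DataFrame, temporal: pd.DataFrame, horizon: int, freq: str = "D") -> bool:
    """Check that every series has exogenous values for each of the next `horizon` periods."""
    last_dates = target.groupby("unique_id")["ds"].max()
    for uid, last_date in last_dates.items():
        future = pd.date_range(last_date, periods=horizon + 1, freq=freq)[1:]
        available = set(temporal.loc[temporal["unique_id"] == uid, "ds"])
        if not set(future) <= available:
            return False
    return True
```

Here the temporal table reaches two days past the last target date, so covers_horizon(target, temporal, horizon=2) holds while a three-day horizon does not.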

Proof of Concept: Large Online Retail Dataset Example

This section demonstrates how the APIs can be integrated into an end-to-end forecasting pipeline on one large retail dataset. We compare Nixtla’s performance against the top solutions of the competition and against Amazon Forecast, the AutoML solution for time series forecasting developed by AWS. Nixtla’s solution reaches the top 1% of the competition with little effort, and you can reproduce this result directly in Colab.

M5 Competition

The M5 competition is composed of Walmart’s daily sales for stores in three states in the United States. The dataset includes department, product categories, and store details. A full description of the competition can be found here.

As a benchmark, we use fbprophet, running its parallelized solution on an AWS EC2 instance of type c5d.24xlarge (96 cores, 185GB RAM). Reproduction of these results can be found here.

We also ran the AWS AutoML solution called Amazon Forecast. We used the same data as in our solution. Our solution offers the following advantages:

Usability. Using Amazon Forecast through the AWS console requires uploading the data to S3 in CSV format, which is complex and time-consuming given the size of the datasets.
Speed. Amazon Forecast took approximately 4 hours to run the entire forecast, compared to at most 1 hour for our solution.
Performance. Our solution reaches the top 1% of the competition, while Amazon Forecast is far behind.
Open-source. The user knows exactly what code runs behind the API, in contrast to Amazon Forecast, where the best-performing model is known but not the code behind it.

Usability

To use our solution you just need to install the library autotimeseries from PyPI as follows:

pip install autotimeseries

Import the library and add the keys:

https://medium.com/media/ca9ec3c9d3d1dbd30dca48641ac18790/href

The AutoTS class wraps all the APIs for building a simple pipeline. To instantiate it, define the credentials and the name of the S3 bucket where the data will be uploaded.

https://medium.com/media/95e7b219b01396769ab2f61eb94bbfa9/href

First, upload the data in CSV or Parquet format to S3:

target: time-series variable of interest. Must have three columns: unique_id, datestamp and value.
static: exogenous static features for each unique_id. Must have unique_id and features in columns.
temporal: exogenous temporal features. Must have unique_id, datestamp, and values for each feature.
calendar-holidays: dictionary mapping each holiday name to the dates on which it occurs.

The data for this example was generated with the src.upload_data script available here. We recommend the Parquet format to reduce file sizes and upload times.

See below:

https://medium.com/media/8a3cc420f38a7babe5fb1a20ddaef6a9/href

Specify the names for unique_id_column, ds_column, and y_column on the target file.

https://medium.com/media/641f7fa4fd9f3cb56b1a880dc577e6aa/href

Additional features can boost model performance significantly. They allow the model to incorporate exogenous events, such as holidays, that drastically affect the target time series. The AutoTS features module automatically generates temporal and calendar features.

In this example, we use the calendartsfeatures() method to create calendar features specific to the US:

https://medium.com/media/bc276c73866ff99113eacc8663c0d748/href

To run the forecasts, simply call the tsforecast() method. This method initiates a SageMaker job to train the AutoTS model and produce the forecasts, starting after the last date of the training data in the target dataset. The forecast horizon of the M5 competition is 28 days, which we can specify with the horizon parameter.

https://medium.com/media/b71e4e910706f434db2f119749986e2f/href

Forecasting Performance

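We measure the performance of our pipeline’s point predictions with the competition’s evaluation metric, the Weighted Root Mean Squared Scaled Error (WRMSSE), defined as:

$$\text{RMSSE}_i = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(Y_{i,t} - \hat{Y}_{i,t}\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(Y_{i,t} - Y_{i,t-1}\right)^2}}, \qquad \text{WRMSSE} = \sum_{i} w_i \, \text{RMSSE}_i$$

where h = 28 is the forecast horizon, n is the length of the training sample of series i, and the weights w_i are based on each series’ cumulative dollar sales over the last 28 days of the training sample.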
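A minimal NumPy sketch of the metric, assuming the M5 definition: each series’ RMSSE divides the mean squared forecast error over the horizon by the mean squared one-step naive error on that series’ training data, and WRMSSE is the weighted sum of the per-series scores. The function names are hypothetical; the weights are passed in directly instead of being derived from dollar sales, and the official evaluation also ignores the periods before each series’ first nonzero sale, which this sketch does not.

```python
import numpy as np

def rmsse(y_train, y_true, y_pred) -> float:
    """Root Mean Squared Scaled Error for a single series."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.diff(y_train) ** 2)  # in-sample MSE of the one-step naive forecast
    errors = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errors ** 2) / scale))

def wrmsse(train_list, true_list, pred_list, weights) -> float:
    """Weighted sum of per-series RMSSE values; weights are assumed to sum to one."""
    scores = [rmsse(tr, yt, yp) for tr, yt, yp in zip(train_list, true_list, pred_list)]
    return float(np.dot(weights, scores))
```

Because every series is scaled by its own naive error, a high-volume and a low-volume series contribute on comparable terms, and the weights then decide how much each one matters.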

The results were also computed by uploading a late submission to the official evaluator. As the following table shows, Nixtla’s forecasts perform better than the 50th-place submission, which puts Nixtla in the top 1% of the competition with a processing time of less than one hour:

The following are examples of the forecasts generated by the Nixtla pipeline:

Computational Performance

We also compared the computational performance of our solution with that of the AutoML solution provided by AWS: Amazon Forecast took 4 times longer than our solution.

Summary

We introduced the problem of automation of time series forecasting and showed how Nixtla’s open-source APIs can build robust forecasting pipelines with little effort. We showed how the current version of the forecasting pipeline achieves accuracy within the top 1% of the M5 submissions in less than an hour.

Contact us

We are looking for people to help us build and validate Nixtla, so please reach out to us if:

You have feedback or want to talk about forecasting.
You want to be part of the private beta of our fully hosted solutions.
You are interested in using Nixtla at your company.

Mail: federico@nixtla.io

Whatsapp: Scan the QR code. :)

Contribute

Report errors and request features by adding Issues on GitHub
Contribute to the codebase directly on GitHub!
Nixtla ecosystem: https://github.com/Nixtla.

Nixtla Team


Automated Time Series Forecasting Pipeline Faster and More Accurate than Amazon Forecast was originally published in AWS in Plain English on Medium, where people are continuing the conversation by highlighting and responding to this story.

Forecasting in Python with ESRNN model

azul garza ramirez — Tue, 16 Jun 2020 22:53:21 GMT

M4 Competition and Background

Deep learning algorithms have enjoyed success in a variety of tasks ranging from image classification to natural language processing, and their use in time series forecasting has also begun to spread. In the recent M4 forecasting competition, a novel multivariate hybrid of deep learning and classical time series modeling called Exponential Smoothing Recurrent Neural Network (ESRNN) won by a large margin over baselines and complex time series ensembles.

In this post, we introduce the model and demonstrate a PyTorch implementation that achieves state-of-the-art performance on the M4 competition data:

The GPU implementation achieves a 300x speedup over Smyl’s original C++ implementation, which uses the DyNet library.
The model can easily be used on new (non-M4) data, since our class follows the scikit-learn interface, with fit and predict methods.

For anyone interested in exploring the model further, the package is available at https://pypi.org/project/ESRNN/ and on GitHub at https://github.com/kdgutier/esrnn_torch.

Model

The premise of this model is simple, intuitive, and appealing. The model combines the classic Exponential Smoothing (ES) model with a Recurrent Neural Network (RNN). The ES decomposes the time series into level, trend, and seasonality components. The RNN is trained on all the series with shared parameters and learns local trends common across series, while the ES parameters are specific to each time series. The two are combined by using the output of the RNN as the local trend component of the ES model.

One main challenge of this idea is that local trends are not directly observed. Also, for the output of the RNN to be meaningful, the trends must be comparable across series. The model addresses this by normalizing and deseasonalizing each series with the level and seasonality given by the ES decomposition. This preprocessing is therefore an integral part of the algorithm rather than a step performed before training. Another advantage of the RNN is that it allows for exogenous variables, which in the M4 example correspond to dummy variables for the series category.
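The NumPy sketch below illustrates the per-series smoothing and the normalization step, assuming the multiplicative level and seasonality recursions of Smyl’s formulation: l_t = alpha * y_t / s_t + (1 - alpha) * l_{t-1} and s_{t+m} = gamma * y_t / l_t + (1 - gamma) * s_t. The RNN itself is left out: rnn_out stands in for its output on the log scale. All function names are hypothetical rather than part of the ESRNN package, and for simplicity the horizon is limited to one seasonal period.

```python
import numpy as np

def es_components(y, alpha, gamma, m, init_seas):
    """Per-series smoothing: returns levels (length n) and seasonality (length n + m)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    seas = np.empty(n + m)
    seas[:m] = init_seas
    levels = np.empty(n)
    for t in range(n):
        if t == 0:
            levels[t] = y[0] / seas[0]
        else:
            levels[t] = alpha * y[t] / seas[t] + (1 - alpha) * levels[t - 1]
        seas[t + m] = gamma * y[t] / levels[t] + (1 - gamma) * seas[t]
    return levels, seas

def rnn_input(y, levels, seas, t, w):
    """Deseasonalize the last w observations, divide by the current level, take logs."""
    y = np.asarray(y, dtype=float)
    window = y[t - w + 1 : t + 1]
    return np.log(window / (levels[t] * seas[t - w + 1 : t + 1]))

def combine(levels, seas, rnn_out):
    """Forecast = last level x future seasonality x exp(RNN output); assumes h <= m."""
    n = len(levels)
    rnn_out = np.asarray(rnn_out, dtype=float)
    return levels[-1] * seas[n : n + len(rnn_out)] * np.exp(rnn_out)

# A perfectly seasonal toy series with period 2
y = np.tile([2.0, 4.0], 4)
levels, seas = es_components(y, alpha=0.3, gamma=0.2, m=2, init_seas=[2 / 3, 4 / 3])
x = rnn_input(y, levels, seas, t=len(y) - 1, w=4)   # approximately zero: nothing left to explain
y_hat = combine(levels, seas, rnn_out=np.zeros(2))  # continues the pattern [2, 4]
```

Because the RNN sees only log ratios to the current level and seasonality, series with very different scales produce inputs on a comparable scale, which is what lets a single set of RNN weights be shared across all series.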

Regarding the architecture of the RNN, Smyl proposed different architectures depending on the frequency of the data. The basic architecture is a dilated RNN with LSTM cells, which allows the network to stack more layers while reducing the number of parameters. For series without obvious seasonality, such as the yearly data, an attention layer is added. More information on these architectures can be found in the references.

Loss function

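The ESRNN model minimizes a combination of two terms: first, the quantile (pinball) loss, whose minimizer is the chosen quantile of the target variable, and second, a penalty on the variance, or wiggliness, of the smoothed levels that acts as a regularizer. For a target quantile τ, the quantile loss is given by:

$$L_\tau(y, \hat{y}) = \begin{cases} \tau \, (y - \hat{y}) & \text{if } y \ge \hat{y} \\ (1 - \tau) \, (\hat{y} - y) & \text{if } y < \hat{y} \end{cases}$$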

The quantile loss makes the model predict the conditional quantiles of the target distribution; it is robust and makes no distributional assumptions. Usually the model is trained to fit the median, but if the model consistently underestimates or overestimates the target values, the quantile can be changed accordingly.
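A minimal NumPy sketch of both terms. It assumes the standard pinball form of the quantile loss and, for the regularizer, a penalty on the squared changes between consecutive log-level differences, applied to the smoothed levels as in Smyl’s model. The function names and the penalty_weight default are illustrative; the ESRNN package implements these terms in PyTorch with its own parameter names.

```python
import numpy as np

def pinball_loss(y, y_hat, tau=0.5) -> float:
    """Quantile (pinball) loss; tau=0.5 gives half the mean absolute error."""
    diff = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    # Under-forecasts (diff > 0) cost tau per unit, over-forecasts cost (1 - tau) per unit
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

def level_wiggliness(levels) -> float:
    """Mean squared change of consecutive log-level differences."""
    d = np.diff(np.log(np.asarray(levels, dtype=float)))
    return float(np.mean(np.diff(d) ** 2))

def esrnn_objective(y, y_hat, levels, tau=0.5, penalty_weight=100.0) -> float:
    """Quantile loss plus a weighted wiggliness penalty (the weight is illustrative)."""
    return pinball_loss(y, y_hat, tau) + penalty_weight * level_wiggliness(levels)
```

With tau=0.9, missing low by 2 costs 1.8 while missing high by 2 costs only 0.2, which is why raising tau pushes forecasts upward. A level that grows at a constant rate (for example 1, 2, 4, 8) has zero wiggliness, so the penalty discourages erratic levels rather than trends.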

Example on M4 data

Usage Example

The library can be installed from the Python Package Index (PyPI) with:

pip install ESRNN

The library also includes some utilities that allow us to experiment with the model easily. The prepare_m4_data function downloads data from the M4 competition and prepares it for use with the model. In particular, it returns predictions from the Naive2 model; these predictions can be used to evaluate each iteration of the ESRNN with the Overall Weighted Average (OWA). Here we obtain the 414 hourly time series of the M4 data, which are stored in the './data' folder:

https://medium.com/media/0368d9eee4a9217a670ac8925d228aea/href https://medium.com/media/e1d7995519f856ff97da86f751829ab5/href

Successfully downloaded M4-info.csv 4335598 bytes.
Successfully downloaded Train/Daily-train.csv 95765153 bytes.
Successfully downloaded Train/Hourly-train.csv 2347115 bytes.
Successfully downloaded Train/Monthly-train.csv 91655432 bytes.
Successfully downloaded Train/Quarterly-train.csv 38788547 bytes.
Successfully downloaded Train/Weekly-train.csv 4015067 bytes.
Successfully downloaded Train/Yearly-train.csv 25355736 bytes.
Successfully downloaded Test/Daily-test.csv 576459 bytes.
Successfully downloaded Test/Hourly-test.csv 132820 bytes.
Successfully downloaded Test/Monthly-test.csv 7942698 bytes.
Successfully downloaded Test/Quarterly-test.csv 1971754 bytes.
Successfully downloaded Test/Weekly-test.csv 44247 bytes.
Successfully downloaded Test/Yearly-test.csv 1486434 bytes.


Preparing Hourly dataset
Preparing Naive2 Hourly dataset predictions

The model is built to function similarly to scikit-learn models. It is instantiated as follows (for a detailed description of the parameters, see the documentation):

https://medium.com/media/9173a0789893138b891f1539dfd12d4e/href

The model is trained with the fit method. If a test set is passed to it, the method also computes out-of-sample losses on that set. The method receives the training pandas DataFrames X_df and y_df in long format and, optionally, X_test_df and y_test_df to compute out-of-sample performance.

The 'X' and 'y' DataFrames must contain the same values in the 'unique_id' and 'ds' columns and be balanced, i.e., have no gaps between dates at the given frequency.

The frequency of computing and reporting this loss can be changed with the freq_of_test hyperparameter.
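Because fit expects matching, gap-free long-format frames, a quick pre-flight check can save a failed training run. The helper check_esrnn_frames below is a hypothetical pandas sketch, not part of the ESRNN package; it assumes both frames carry unique_id and ds columns, that freq is a pandas frequency alias such as "D", and the x column in the usage example is a toy exogenous variable.

```python
import pandas as pd

def check_esrnn_frames(X_df: pd.DataFrame, y_df: pd.DataFrame, freq: str) -> None:
    """Raise ValueError if the frames disagree on (unique_id, ds) pairs or have date gaps."""
    cols = ["unique_id", "ds"]
    keys_x = X_df[cols].sort_values(cols).reset_index(drop=True)
    keys_y = y_df[cols].sort_values(cols).reset_index(drop=True)
    if not keys_x.equals(keys_y):
        raise ValueError("X_df and y_df must contain the same unique_id/ds pairs")
    for uid, dates in y_df.groupby("unique_id")["ds"]:
        expected = pd.date_range(dates.min(), dates.max(), freq=freq)
        if dates.nunique() != len(expected):
            raise ValueError(f"series {uid} is not balanced at frequency {freq!r}")

y_ok = pd.DataFrame({
    "unique_id": ["A"] * 3,
    "ds": pd.date_range("2020-01-01", periods=3, freq="D"),
    "y": [1.0, 2.0, 3.0],
})
X_ok = y_ok[["unique_id", "ds"]].assign(x="Daily")  # toy exogenous column
check_esrnn_frames(X_ok, y_ok, freq="D")  # passes silently
```

Dropping a middle date from both frames keeps the keys aligned but trips the balance check, while dropping it from only one frame trips the key check.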

https://medium.com/media/f533240bcc1d778e1753952602f6be83/href

model.fit(X_train_df, y_train_df)

Infered frequency: H
=============== Training ESRNN  ===============

========= Epoch 0 finished =========
Training time: 50.14884
Training loss (50 prc): 0.70241
========= Epoch 1 finished =========
Training time: 51.24384
Training loss (50 prc): 0.59290
========= Epoch 2 finished =========
Training time: 51.81561
Training loss (50 prc): 0.53481
========= Epoch 3 finished =========
Training time: 52.64761
Training loss (50 prc): 0.49683
========= Epoch 4 finished =========
Training time: 50.96984
Training loss (50 prc): 0.46950
Train finished!

Finally, the predictions are obtained with the predict method. The package also provides a function, evaluate_prediction_owa, to calculate the OWA of the predictions.

https://medium.com/media/98b07fc82a2d59a70ffcbe04c8085abc/href

===============  Model evaluation  ==============
OWA: 0.987 
SMAPE: 15.623 
MASE: 2.69

A function has also been implemented to plot predictions:

https://medium.com/media/90542343372a81bbacbc1629a66f0dfe/href

Comparison with M4 winning submission

Naive2 Forecast

The Naive2 model is a popular benchmark for time series forecasting that automatically adapts to the potential seasonality of a series based on an autocorrelation test. If the series is seasonal, the model removes the seasonality with a classical multiplicative decomposition, applies the Naive method, and reseasonalizes the forecasts; otherwise, it uses the simple Naive method. Following M4 competition practice, we report the performance of the ESRNN relative to Naive2.
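For readers who want to reproduce the benchmark, here is a NumPy sketch of Naive2 as defined in the M4 competition. A 90% autocorrelation test at lag m decides whether the series is seasonal. If it is, seasonal indices from a classical multiplicative decomposition are removed, the last adjusted value is carried forward, and the indices are reapplied. The function names are hypothetical, and the sketch assumes the first observation sits at season position 0.

```python
import numpy as np

def acf(y, k):
    """Sample autocorrelation at lag k."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    return np.sum(d[:-k] * d[k:]) / np.sum(d ** 2)

def is_seasonal(y, m):
    """90% autocorrelation test at lag m (needs at least 3 full seasons)."""
    if m <= 1 or len(y) < 3 * m:
        return False
    coefs = [acf(y, k) for k in range(1, m + 1)]
    limit = 1.645 * np.sqrt((1 + 2 * np.sum(np.square(coefs[:-1]))) / len(y))
    return abs(coefs[-1]) > limit

def seasonal_indices(y, m):
    """Classical multiplicative decomposition: one normalized index per season position."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving average: 2 x m for even m, plain m for odd m
    weights = np.r_[0.5, np.ones(m - 1), 0.5] / m if m % 2 == 0 else np.ones(m) / m
    half = len(weights) // 2
    trend = np.full(n, np.nan)
    trend[half : n - half] = np.convolve(y, weights, mode="valid")
    ratios = y / trend
    idx = np.array([np.nanmean(ratios[p::m]) for p in range(m)])
    return idx / idx.mean()

def naive2(y, h, m):
    """Naive forecast on the seasonally adjusted series, then reseasonalized."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    if is_seasonal(y, m):
        idx = seasonal_indices(y, m)
        in_sample, future = idx[np.arange(n) % m], idx[np.arange(n, n + h) % m]
    else:
        in_sample, future = np.ones(n), np.ones(h)
    return (y[-1] / in_sample[-1]) * future

y_seasonal = np.tile([1.0, 2.0, 3.0, 4.0], 10)  # toy series with period 4
forecast = naive2(y_seasonal, h=4, m=4)  # reproduces the pattern [1, 2, 3, 4]
```

For a non-seasonal series (or m = 1) the sketch reduces to repeating the last observation, which is plain Naive.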

Overall Weighted Average

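To quantify the aggregated errors, we use the Overall Weighted Average (OWA) proposed for the M4 competition. OWA averages two errors, the symmetric mean absolute percentage error (sMAPE) and the mean absolute scaled error (MASE), after dividing each by the corresponding error of the Naive2 predictions, with both errors averaged over all the time series. Both sMAPE and MASE are scale independent. These measurements are calculated as follows:

$$\text{sMAPE} = \frac{2}{h}\sum_{t=n+1}^{n+h}\frac{|Y_t - \hat{Y}_t|}{|Y_t| + |\hat{Y}_t|} \times 100\%$$

$$\text{MASE} = \frac{1}{h}\sum_{t=n+1}^{n+h}\frac{|Y_t - \hat{Y}_t|}{\frac{1}{n-m}\sum_{j=m+1}^{n}|Y_j - Y_{j-m}|}$$

$$\text{OWA} = \frac{1}{2}\left(\frac{\text{sMAPE}}{\text{sMAPE}_{\text{Naive2}}} + \frac{\text{MASE}}{\text{MASE}_{\text{Naive2}}}\right)$$

where n is the number of in-sample observations, h is the forecast horizon, and m is the seasonal period of the data.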
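A minimal NumPy sketch of the three metrics, assuming the M4 definitions: sMAPE in percent, MASE scaled by the in-sample seasonal naive error with period m, and OWA as the mean of the two errors relative to Naive2. The function names are hypothetical and do not correspond to evaluate_prediction_owa; to obtain the competition-level OWA, pass errors already averaged over all series.

```python
import numpy as np

def smape(y, y_hat) -> float:
    """Symmetric mean absolute percentage error, in percent."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(2 * np.abs(y - y_hat) / (np.abs(y) + np.abs(y_hat))) * 100)

def mase(y_train, y, y_hat, m) -> float:
    """MAE over the horizon, scaled by the in-sample seasonal naive MAE (period m >= 1)."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    mae = np.mean(np.abs(np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)))
    return float(mae / scale)

def owa(smape_model, mase_model, smape_naive2, mase_naive2) -> float:
    """Average of the model's sMAPE and MASE, each relative to Naive2."""
    return 0.5 * (smape_model / smape_naive2 + mase_model / mase_naive2)
```

An OWA below 1 means the model beats Naive2 on average; the ESRNN result reported earlier in this post corresponds to a value just under 1 for this hourly subset.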

The following table shows the OWA obtained by our implementation and by the original model. Our results deviate slightly from the original implementation but remain very competitive on the M4 leaderboard, placing among the top 5 models. They were also achieved with a 300x speedup over Smyl’s implementation, since we batch the time series during training and the model can be trained on a GPU.

How to contribute

The full code is publicly available on GitHub. To contribute, fork the repository and open a pull request with your improvements. You can also open issues if you have problems running the model.

Authors

This repository was developed with joint efforts from AutonLab researchers at Carnegie Mellon University and Orax data scientists.

References

Forecasting in Python with ESRNN model was originally published in Analytics Vidhya on Medium, where people are continuing the conversation by highlighting and responding to this story.