# Forecasting multiple time-series using Prophet in parallel

## A short story about multiprocessing

For a few weeks I have been using Facebook's Prophet library. It's a great tool for forecasting time series because it is simple to use and its forecasts are pretty good, but a single Python process fits one series at a time. If you want to forecast many time series, the whole process can take a long time. This post shows how we significantly reduced our forecasting time using Python's `multiprocessing` package.

# Generating time-series

We are going to generate 500 random time series. The purpose of this post is not to evaluate the accuracy of Prophet's predictions but the time required to produce them.

So I wrote a function that generates a random daily time series over a given date range:

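```python
import numpy as np
import pandas as pd


def rnd_timeserie(min_date, max_date):
    # One row per day between min_date and max_date (both ends included)
    time_index = pd.date_range(min_date, max_date)
    dates = pd.DataFrame({'ds': pd.to_datetime(time_index.values)},
                         index=range(len(time_index)))
    # Uniform random values in [0, 10); Prophet expects the columns `ds` and `y`
    y = np.random.random_sample(len(dates)) * 10
    dates['y'] = y
    return dates
```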

So one of our random time-series looks like this:

Let's generate 500 series:

```python
series = [rnd_timeserie('2018-01-01', '2018-12-30') for _ in range(500)]
```
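Because `rnd_timeserie` uses NumPy's global random state, every run produces different series. If you want to rerun the benchmark on identical data, one option is to pass in a seeded `numpy.random.Generator`. This is a minimal sketch under that assumption; `rnd_timeserie_seeded` is a hypothetical variant, not part of the original code, and the seed `42` is arbitrary.

```python
import numpy as np
import pandas as pd


def rnd_timeserie_seeded(min_date, max_date, rng):
    """Hypothetical variant of rnd_timeserie that draws from a caller-supplied Generator."""
    ds = pd.date_range(min_date, max_date)   # daily dates, both ends inclusive
    y = rng.random(len(ds)) * 10             # uniform values in [0, 10)
    return pd.DataFrame({"ds": ds, "y": y})  # default RangeIndex 0..n-1


rng = np.random.default_rng(42)  # arbitrary fixed seed
series = [rnd_timeserie_seeded("2018-01-01", "2018-12-30", rng) for _ in range(500)]
```

Creating a new generator with the same seed reproduces exactly the same 500 series, so sequential and parallel timings can be compared on identical inputs.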

We have generated our time series; now it's time to run Prophet.

# Forecasting using Prophet

Let's create a simple Prophet model. For this we define a function called `run_prophet` that takes a time series, fits a model to the data, and then uses that model to predict the next 90 days.

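```python
try:
    from prophet import Prophet    # Prophet >= 1.0 ships as `prophet`
except ImportError:
    from fbprophet import Prophet  # older releases used `fbprophet`


def run_prophet(timeserie):
    # Turn off yearly and daily seasonality; weekly seasonality keeps its default
    model = Prophet(yearly_seasonality=False, daily_seasonality=False)
    model.fit(timeserie)
    # Build a frame with only the next 90 days (no history) and predict them
    forecast = model.make_future_dataframe(periods=90, include_history=False)
    forecast = model.predict(forecast)
    return forecast
```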
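To make clear what `make_future_dataframe(periods=90, include_history=False)` hands to `predict`, here is a rough pandas approximation for daily data. It is a sketch based on my understanding of Prophet's behaviour, not Prophet's own code; `future_dates` is a hypothetical helper name.

```python
import pandas as pd


def future_dates(history, periods, freq="D"):
    """Approximate make_future_dataframe(periods, include_history=False) for daily data."""
    last_date = history["ds"].max()
    # Start at the last observed date, then drop it so only new dates remain
    dates = pd.date_range(start=last_date, periods=periods + 1, freq=freq)
    dates = dates[dates > last_date][:periods]
    return pd.DataFrame({"ds": dates})


history = pd.DataFrame({"ds": pd.date_range("2018-01-01", "2018-12-30")})
future = future_dates(history, periods=90)
print(future["ds"].iloc[0], future["ds"].iloc[-1])
```

For our series, which end on 2018-12-30, the frame should contain the 90 days from 2018-12-31 to 2019-03-30, and `predict` fills in the forecast for each of those dates.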

For example, we can run this function on the first generated time series:

```python
f = run_prophet(series[0])
f.head()
```

We can see the forecasted results for that series:
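The frame returned by `predict` contains many component columns besides the forecast itself. For comparing many series, the columns usually needed are `ds`, `yhat`, `yhat_lower` and `yhat_upper`. A minimal sketch of a helper, hypothetically named `combine_forecasts`, that keeps those columns and stacks a list of forecasts (such as the `result` or `predictions` lists produced later in this post) into one frame, tagging each row with the position of its series; the `series_id` column name is my own choice.

```python
import pandas as pd

# Columns kept from each forecast; the other columns are trend/seasonality components
KEEP = ["ds", "yhat", "yhat_lower", "yhat_upper"]


def combine_forecasts(forecasts):
    """Stack forecast frames into one, adding a series_id for each input's position."""
    frames = [f[KEEP].assign(series_id=i) for i, f in enumerate(forecasts)]
    return pd.concat(frames, ignore_index=True)
```

With 500 series and 90 forecast days each, the combined frame would have 45,000 rows, which is easier to store or plot than a list of separate frames.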

## Running 500 time-series

Now let's add a timer and run Prophet on all 500 time series without any kind of multiprocessing. I'm using `tqdm` so I can check the progress:

```python
import time
from tqdm import tqdm

start_time = time.time()

# Fit and forecast each series one after another in this single process
result = list(map(run_prophet, tqdm(series)))

print("--- %s seconds ---" % (time.time() - start_time))
```
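`time.time()` works for this, but the Python documentation recommends `time.perf_counter()` for measuring durations, since it is monotonic and has the highest available resolution. A minimal sketch of a reusable timing context manager under that choice; the name `timer` and the dict it yields are hypothetical, not something the original code uses.

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label):
    """Measure the wrapped block with perf_counter and print the elapsed seconds."""
    result = {}
    start = time.perf_counter()
    try:
        yield result
    finally:
        # Store the duration so callers can reuse it (e.g. to compute a speedup)
        result["seconds"] = time.perf_counter() - start
        print(f"--- {label}: {result['seconds']:.2f} seconds ---")


if __name__ == "__main__":
    with timer("demo") as t:
        time.sleep(0.1)
```

Wrapping the sequential run would look like `with timer("sequential") as t: result = list(map(run_prophet, tqdm(series)))`, and `t["seconds"]` then holds the elapsed time.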

The sequential run took `12.53 minutes`, and the processor usage looked like this the whole time:

Now let's add `multiprocessing` to our code. The idea is to spread the forecasts over several worker processes, so that `run_prophet` runs in parallel while we map over the list.

For this we are going to use a `Pool` of processes. Quoting the `multiprocessing` documentation:

> [...] the `Pool` object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
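"Distributing the input data across processes" means the input list is cut into chunks that are sent to the workers as tasks. The sketch below mirrors how CPython's `Pool.map` picks a chunk size when none is given; this is an implementation detail of CPython, not a documented guarantee, and `default_map_chunksize` and `chunk` are hypothetical helper names. `Pool.imap`, which we use next, defaults to `chunksize=1`, i.e. one series per task.

```python
def default_map_chunksize(n_items, n_workers):
    """Mirror CPython's default Pool.map chunk size: about 4 chunks per worker."""
    chunksize, extra = divmod(n_items, n_workers * 4)
    if extra:
        chunksize += 1
    return chunksize


def chunk(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


size = default_map_chunksize(500, 8)  # 500 series, 8 workers
chunks = chunk(list(range(500)), size)
print(size, len(chunks))
```

For 500 series on 8 workers this gives chunks of 16 series. With Prophet, where each fit likely takes far longer than sending a small DataFrame to a worker, `imap`'s one-series-per-task default should cost little and keeps the `tqdm` progress bar fine-grained.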

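```python
import time
from multiprocessing import Pool, cpu_count
from tqdm import tqdm

start_time = time.time()  # reset the timer so only the parallel run is measured

p = Pool(cpu_count())  # one worker process per CPU
# imap hands the series to the workers and yields results in input order
predictions = list(tqdm(p.imap(run_prophet, series), total=len(series)))
p.close()  # no more tasks will be submitted
p.join()   # wait for the worker processes to exit

print("--- %s seconds ---" % (time.time() - start_time))
```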

With this code we launch `N` worker processes, one per CPU reported by `cpu_count()`, and `imap` distributes the `run_prophet` calls for all the series among them.
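When moving this pattern from a notebook into a standalone script, two details matter. On platforms where `multiprocessing` starts workers with "spawn" (the default on Windows, and on macOS since Python 3.8), each worker re-imports the main module, so the code that creates the pool must sit under `if __name__ == "__main__":`. Also, the function passed to `imap` must be defined at module top level so it can be pickled; a lambda would fail. A minimal sketch, where `slow_square` is a hypothetical stand-in for `run_prophet` and `parallel_map` is a hypothetical helper with a sequential path for debugging:

```python
import time
from multiprocessing import Pool, cpu_count


def slow_square(x):
    """Stand-in for run_prophet: any top-level (picklable) function works."""
    time.sleep(0.01)  # simulate some work
    return x * x


def parallel_map(func, items, processes=None):
    """Apply func to every item in order; processes=1 runs in the current process."""
    if processes == 1:
        # Sequential path: handy for debugging and for timing comparisons
        return [func(item) for item in items]
    with Pool(processes or cpu_count()) as pool:
        # imap yields results in input order; list() waits for all of them
        return list(pool.imap(func, items))


if __name__ == "__main__":
    # Required with the "spawn" start method, since workers re-import this module
    print(parallel_map(slow_square, range(10))[:3])
```

Using the pool as a context manager calls `terminate()` on exit, which is safe here because `list()` has already collected every result.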

The parallel run took `3.04 minutes`, and the CPU usage over the whole run looked like this:

So we got a speedup of 4.12 (12.53 / 3.04 minutes), which is pretty good! My machine only has 8 CPUs; to run this faster, we could use a machine with more CPUs.
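The speedup is the sequential time divided by the parallel time. Dividing it further by the number of workers gives the parallel efficiency, where 1.0 would mean perfectly linear scaling. A small sketch of that arithmetic using the timings above; the function names are hypothetical.

```python
def speedup(sequential_time, parallel_time):
    """How many times faster the parallel run was."""
    return sequential_time / parallel_time


def efficiency(sequential_time, parallel_time, n_workers):
    """Speedup divided by worker count: 1.0 would be perfect linear scaling."""
    return speedup(sequential_time, parallel_time) / n_workers


s = speedup(12.53, 3.04)        # minutes, from the two runs above
e = efficiency(12.53, 3.04, 8)  # 8 CPUs on this machine
print(f"speedup {s:.2f}, efficiency {e:.2f}")
```

This prints a speedup of 4.12 and an efficiency of about 0.52, so roughly half of the ideal 8x. Plausible reasons include `cpu_count()` counting logical rather than physical cores and the overhead of starting workers and moving data between processes, though these timings alone don't tell which dominates. Applied to the real-world run in the conclusions (13 hours = 780 minutes down to 45 minutes), the same formula gives a speedup of about 17.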

## Conclusions

We have seen that multiprocessing is a great way to forecast multiple time series faster, and in many other problems it can help reduce the execution time of our code.

In a real-world problem, we reduced the time to forecast `29000` time series from `13` hours to `45` minutes by using multiprocessing on a machine with many CPUs on Google Cloud.