Facebook Prophet forecasts running in Snowflake with Snowpark

Now Facebook Prophet runs inside Snowflake — no need for external functions or dependencies. Let’s see how.

“Crystal ball at the Snowpark” — image generated by AI

Previously I showed how to run Facebook Prophet as an external function, and then I wrote about running pmdarima inside Snowflake as a Python UDF — thanks to Snowpark and Anaconda. Today I’m happy to announce that Facebook Prophet runs inside Snowflake — without any additional hassles or compilation steps. Let’s check how:

Update 2023–08–29

If the code below gets you an error like:

in __init__ raise TypeError( TypeError: This is a python-holidays
entity loader class. For entity inheritance purposes please import a class you want
to derive from directly: e.g., `from holidays.countries import Entity` or `from
holidays.financial import Entity`. in function PROPHETIZE_DAILY with handler X'
packages = ('holidays==0.18','prophet')

That’s because holidays keeps changing its API — and you can fix this by anchoring the package to a specific version with:

packages = ('holidays==0.18','prophet')

(Thanks Mauricio Rojas)

A simple UDF

This is a simple Python UDF that you can create to run Facebook Prophet inside Snowflake:

create or replace function simple_prophet_udf(
dates array, vals array, periods number)
returns variant
language python
runtime_version = 3.8
packages = ('prophet')
handler = 'x'
as $$
import pandas as pd
from prophet import Prophet
def x(dates, vals, periods):
df = pd.DataFrame(list(zip(dates, vals)),
columns =['ds', 'y'])
model = Prophet()
model.fit(df)
future_df = model.make_future_dataframe(
periods=periods,
include_history=False)
forecast = model.predict(future_df)
return [[x.yhat, x.ds] for x in forecast.itertuples()]
$$
;

The basic elements:

  • It receives an array of dates, an array of numbers, and a request of how many periods to predict.
  • To write a Python UDF in Snowflake, just tag it tag it withlanguage python.
  • Then any package provided by Anaconda can be brought in without further steps — in this case we ask for Prophet with packages=(‘prophet’). Every other compilation step and complex dependencies have been already resolved — including the binary dependencies.
  • Then the UDF follows the basic Prophet steps, and returns an array of predicted dates and values.

Let’s see it in action, called within a SQL query:

select value[1]::date d, value[0] prediction
from table(flatten(
simple_prophet_udf(['2020-01-01', '2020-01-02'], [1, 2], 4)
))

Here we are giving Prophet the values “1” and “2”, and we ask it to predict the next 4 values in this series. Prophet doesn’t disappoint:

Prophet correctly predicts a linear growth during the 4 following days

A more complex UDTF

The UDF above is simple, but we probably want to work with different series in tables, and receive tables when running multiple predictions.

The following UDTF uses the same principles above, but it receives rows and outputs rows too:

create or replace function prophetize_daily(id string, x float, y date, periods number)
returns table(id string, y float, d string)
language python
runtime_version = 3.8
packages = ('prophet')
handler = 'X'
as $$
import pandas as pd
from prophet import Prophet
class X:
def __init__(self):
self._dates = []
self._values = []

def process(self, id, y, d, periods):
self._dates.append(d)
self._values.append(y)
self._id = id
self._periods = periods
return ((id, y, d), )
def end_partition(self):

df = pd.DataFrame(list(zip(self._dates, self._values)),
columns =['ds', 'y'])

model = Prophet()
model.fit(df)
future_df = model.make_future_dataframe(
periods=self._periods,
include_history=False)
forecast = model.predict(future_df)
for row in forecast.itertuples():
yield(self._id, row.yhat, row.ds)
yield(self._id + '_lower', row.yhat_lower, row.ds)
yield(self._id + '_upper', row.yhat_upper, row.ds)
$$;

Note that I chose to output 3 rows for each prediction: One with the actual prediction, and 2 rows with the lower and upper bound for it. I could have chosen to output only 1 row with these values as columns — but this way I got an easier time when visualizing each in Snowsight.

Prophet UDTF in action

To put this in action, let’s predict currency values. In this case I’m taking the past values of the Japanese Yen and the Mexican Peso, and projecting them a year into the future:

with data as (
select "Currency", "Units", "Currency Description", "Date", "Value"
from knoema_economy_data_atlas.economy.exratescc2018
where "Indicator Name"= 'Close'
and "Currency Exchange" = 'Real-time FX'
and "Frequency" = 'D'
and "Date" > '2021-08-01'
and "Currency Description" in (
'US Dollar Mexican Peso'
, 'US Dollar Japanese Yen')
)
select p.*
from data
, table(prophetize_daily("Currency", "Value", "Date", 360)
over (partition by "Currency")) p

The results look good, including a lower and upper boundary range for the predictions:

Predictions for the Japanese Yen and Mexican Peso

We can visualize these predictions with Snowsight too:

Predictions for the Japanese Yen and Mexican Peso

You can see above that the Mexican Peso is predicted to be more stable than the Japanese Yen, that exhibits a larger confidence interval as time goes on.

By the way — to get daily updated currencies in Snowflake just check out the data shared by Knoema.

Want more?

I’m Felipe Hoffa, Data Cloud Advocate for Snowflake. Thanks for joining me on this adventure. You can follow me on Twitter and LinkedIn. And subscribe to reddit.com/r/snowflake for the most interesting Snowflake news.

--

--

Felipe Hoffa
Felipe Hoffa

Written by Felipe Hoffa

Data Cloud Advocate at Snowflake ❄️. Originally from Chile, now in San Francisco and around the world. Previously at Google. Let’s talk data.

Responses (2)