The simplest way to add weather data to your forecast!

Fred Hoffman
bytehub-ai
4 min read · Oct 13, 2020


Many real-world problems involve time-series forecasting, and most forecasts can be improved with the right data. But getting that data can be a challenge. Even once you have it, putting it into the right format to start building forecasts presents new challenges and can be very time-consuming.

In this blog post we show the simplest way to access data and use it to build a time-series forecast. We forecast UK electricity demand and show that simply adding a temperature variable reduces mean absolute error (MAE) by 13% on average.

Our method uses ByteHub AI’s minimalist Python API, which is based on the concept of a feature store, to make time-series access and preparation simple.

Access

Using a ByteHub feature store preloaded with weather data feeds (from the Met Office) and UK electricity demand data feeds (from BMRS), we can access the right data for our forecasting problem in a few lines of code.

Load the feature store,

>> import bytehub as bh
>> fs = bh.FeatureStore()

Check that the data is preloaded,

>> fs.list_features()
['bmrs.feature.indo',
 'met-office.data.daily-temperature']

Energy demand and temperature data are now at our fingertips! Let’s explore the preloaded data. What are the date ranges of the two time-series?

>> fs.get_first('bmrs.feature.indo')
   time                       entity  value
---------------------------------------------
0  2013-01-01 00:00:00+00:00  None    30572
>> fs.get_first('met-office.data.daily-temperature')
   time                       entity  value
---------------------------------------------
0  2010-01-01 00:00:00+00:00  None    -0.5

The energy demand time-series starts in January 2013, and the temperature series starts in January 2010.

>> fs.get_last('bmrs.feature.indo')
   time                       entity  value
---------------------------------------------
0  2020-10-11 22:30:00+00:00  None    23194
>> fs.get_last('met-office.data.daily-temperature')
   time                       entity  value
---------------------------------------------
0  2020-09-30 00:00:00+00:00  None    11.6

Demand runs to October 2020 and temperature runs to September 2020.

On what frequency are the two time-series sampled?

>> fs.get_freq('bmrs.feature.indo')
'30T'
>> fs.get_freq('met-office.data.daily-temperature')
'D'

The get_freq function returns frequency using the simple convention of Pandas offset aliases. Demand is sampled at a half-hourly frequency and temperature is sampled daily.
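To make the offset aliases concrete, here is a small sketch using pandas alone (it does not touch the feature store). It builds a half-hourly and a daily timestamp grid and checks the spacing of each. Note that recent pandas releases prefer the spelling '30min' over the older '30T' alias returned above; the dates are arbitrary examples.

```python
import pandas as pd

# Half-hourly and daily grids built from pandas offset aliases.
# '30min' is the current spelling of the older '30T' alias.
half_hourly = pd.date_range("2020-10-11 00:00", periods=4, freq="30min", tz="UTC")
daily = pd.date_range("2020-09-28", periods=3, freq="D", tz="UTC")

# Spacing between consecutive timestamps on each grid.
step_half_hourly = half_hourly[1] - half_hourly[0]
step_daily = daily[1] - daily[0]
print(step_half_hourly, step_daily)
```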

Preparation

Let’s build a model of daily electricity demand between 2013 and 2020. For this we must bring both series onto a common daily frequency. In the ByteHub API this takes a single call,

>> df = fs.get_timeseries(
       ['met-office.data.daily-temperature', 'bmrs.feature.indo'],
       from_date='2013-01-01 00:00:00+00:00',
       to_date='2020-01-01 00:00:00+00:00',
       freq='D'
   )
>> df.head(5)
   time                       entity  daily-temperature  indo
-------------------------------------------------------------------
0  2013-01-01 00:00:00+00:00  None    5.3                30572.0
1  2013-01-02 00:00:00+00:00  None    7.0                28574.0
2  2013-01-03 00:00:00+00:00  None    8.3                30687.0
3  2013-01-04 00:00:00+00:00  None    8.9                30874.0
4  2013-01-05 00:00:00+00:00  None    9.0                30894.0

Compare this with the work needed without the feature store. We would need to download and extract the data using complicated APIs and reformat it into tidy dataframes. Then we would need to align the timestamps on each series and deal with any gaps and data quality issues, before finally joining all of the variables together. All of this creates a complex pipeline of code before we even start any machine learning.
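For a sense of what that manual pipeline involves, here is a minimal pandas sketch of the align, fill, and join steps. It runs on synthetic, made-up data rather than the real feeds, and it skips the download step entirely. Averaging the half-hourly values into a daily figure and linearly interpolating short gaps are assumptions chosen for illustration; they are not necessarily what the feature store does internally.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for the two feeds (all values are made up).
half_hours = pd.date_range("2013-01-01", "2013-01-07 23:30", freq="30min", tz="UTC")
demand = pd.Series(
    30000.0 + (np.arange(len(half_hours)) % 48) * 50.0,
    index=half_hours,
    name="indo",
)
days = pd.date_range("2013-01-01", "2013-01-07", freq="D", tz="UTC")
temperature = pd.Series(
    [4.0, 6.0, np.nan, 8.0, 7.0, 5.0, 3.0],  # one missing day
    index=days,
    name="daily-temperature",
)

# 1. Bring the half-hourly series onto the daily grid (mean is an assumed choice).
daily_demand = demand.resample("D").mean()

# 2. Fill short gaps in the temperature series by linear interpolation.
temperature_filled = temperature.interpolate(limit=2)

# 3. Join on the shared daily timestamps and tidy into one dataframe.
df_manual = (
    pd.concat([temperature_filled, daily_demand], axis=1, join="inner")
    .rename_axis("time")
    .reset_index()
)
print(df_manual.head())
```

Even this toy version needs decisions about aggregation, gap handling, and join semantics, which is the work the single get_timeseries call hides.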

Forecasting

But with the feature store we are now ready to forecast demand straight away!

Our dataframe of time-series can now simply be fed into a forecasting package like Facebook Prophet.

First let’s build a basic model without temperature,

>> from prophet import Prophet  # 'fbprophet' in releases before 1.0
>> df.rename(columns={'time': 'ds', 'indo': 'y'}, inplace=True)
>> model = Prophet()
>> model.fit(df)

We rename the target column to y and the time column to ds, the names Prophet expects, and then fit.
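One practical detail worth sketching: Prophet expects timezone-naive timestamps in ds, while the timestamps shown above carry a UTC offset. The snippet below shows one way to prepare such a frame with pandas. It uses a small synthetic dataframe (made-up values, hypothetical name df_example) in the same shape as the feature-store output; whether the original notebook performs this step is not shown in the post.

```python
import pandas as pd

# Synthetic frame in the same shape as the feature-store output (made-up values).
df_example = pd.DataFrame({
    "time": pd.date_range("2013-01-01", periods=3, freq="D", tz="UTC"),
    "entity": [None, None, None],
    "daily-temperature": [5.0, 6.0, 7.0],
    "indo": [30000.0, 29000.0, 31000.0],
})

# Rename to the column names Prophet expects.
prophet_df = df_example.rename(columns={"time": "ds", "indo": "y"})
# Drop the UTC offset so that 'ds' is timezone-naive.
prophet_df["ds"] = prophet_df["ds"].dt.tz_localize(None)
# Keep only the columns the model will use.
prophet_df = prophet_df[["ds", "y", "daily-temperature"]]
print(prophet_df.dtypes)
```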

Next let’s build a more advanced model that includes temperature,

>> model = Prophet()
>> model.add_regressor('daily-temperature')
>> model.fit(df)

Performing a rolling 14-day-ahead forecast every 7 days for the two models demonstrates the power of adding a weather variable.

Plotting mean absolute error (MAE) against forecast horizon in days shows that the forecasting model with weather is considerably more accurate. In fact, adding temperature reduces MAE by 13% on average.
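The evaluation scheme described above (a 14-day-ahead forecast repeated every 7 days, scored by MAE per horizon day) can be sketched without Prophet as follows. This is an illustration of the procedure only, not the code behind the 13% figure: the function rolling_mae_by_horizon, the two naive forecasters standing in for the two models, the 56-day initial window, and the synthetic series are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

def rolling_mae_by_horizon(y, forecast_fn, initial=56, horizon=14, step=7):
    """Mean absolute error per horizon day over rolling forecast origins.

    forecast_fn(history, horizon) must return `horizon` predictions.
    """
    errors = []
    for cutoff in range(initial, len(y) - horizon + 1, step):
        history = y[:cutoff]
        actual = y[cutoff:cutoff + horizon]
        predicted = np.asarray(forecast_fn(history, horizon), dtype=float)
        errors.append(np.abs(actual - predicted))
    return pd.Series(np.mean(errors, axis=0), index=np.arange(1, horizon + 1), name="mae")

def last_value(history, horizon):
    # Baseline stand-in: repeat the final observation.
    return np.repeat(history[-1], horizon)

def seasonal_naive(history, horizon, season=7):
    # Stand-in for a better model: repeat the last full week.
    return np.resize(history[-season:], horizon)

# Synthetic daily series with a weekly pattern (made-up values).
rng = np.random.default_rng(0)
t = np.arange(364)
y = 30000 + 2000 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 100, size=t.size)

mae_baseline = rolling_mae_by_horizon(y, last_value)
mae_improved = rolling_mae_by_horizon(y, seasonal_naive)

# Average relative reduction in MAE across the 14 horizon days.
improvement_pct = 100 * (mae_baseline.mean() - mae_improved.mean()) / mae_baseline.mean()
print(mae_baseline.round(1), mae_improved.round(1), round(improvement_pct, 1))
```

With Prophet itself, the cross_validation helper in prophet.diagnostics takes horizon and period arguments and can produce the same kind of rolling evaluation for a fitted model.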

The full code is available in a Jupyter notebook on GitHub. The feature store requires login credentials; get in touch for a free trial.

Conclusion

We think that feature stores with simple, easy-to-understand APIs will allow data scientists to be much more productive when tackling time-series problems and will make it much easier to move models into production-ready systems. So next time you start a data science project, invest in a feature store.

At ByteHub AI we build tools to make it easier to integrate time-series data with your ML applications and analytics. Get in touch if you’d like to know more about this topic.
