What is the simplest way to load and query time-series?

Published in bytehub-ai, 3 min read, Sep 29, 2020

Many real-world problems involve time-series prediction, but before we can start building predictive models we need time-series in a format that fits our problem. Ideally, we also want to get time-series in different formats for different parts of the development process: for example, batches of time-series for development and real-time time-series for production. We might also want to transform time-series to make them more useful to our application, or experiment with different temporal resolutions. All of this development can be time-consuming, error-prone, difficult to organise and computationally slow if not designed well.

At ByteHub AI we’ve developed a fast, minimalist Python API to make time-series preparation simple, repeatable and easy to deploy. Our time-series API is based on the concept of a feature store:

Transformed time-series are stored as features, ready to serve to predictive models;
The code to compute features is organised and run on the feature store, so this logic is kept separate from modelling code, which improves reusability, maintainability and extensibility;
The same features can be served either in batches (for training) or in real-time (for production ML models), making it easier to deploy ML systems;
Features can also be shared across different teams in an organisation, so the effort spent on feature engineering can be re-used across multiple ML models.
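
To make the idea concrete, here is a minimal in-memory sketch of the feature store concept. It is not the ByteHub API: the class `TinyFeatureStore`, its methods and the feature name `temperature.kelvin` are all hypothetical, and the data is synthetic. It only illustrates keeping transformation logic inside the store and serving the same feature in batch or real-time form.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

import pandas as pd


@dataclass
class TinyFeatureStore:
    """Hypothetical in-memory feature store (illustration only, not ByteHub)."""

    raw: Dict[str, pd.Series] = field(default_factory=dict)
    transforms: Dict[str, Callable[[pd.Series], pd.Series]] = field(default_factory=dict)

    def register(self, name, series, transform=None):
        # The feature logic lives in the store, away from modelling code.
        self.raw[name] = series.sort_index()
        self.transforms[name] = transform or (lambda s: s)

    def get_batch(self, name, start, end):
        # Batch serving: a historical window, e.g. for model training.
        feature = self.transforms[name](self.raw[name])
        return feature.loc[start:end]

    def get_latest(self, name):
        # Real-time serving: the most recent value, e.g. for production scoring.
        feature = self.transforms[name](self.raw[name])
        return feature.iloc[-1]


# Synthetic temperature readings in degrees Celsius, every 30 minutes.
index = pd.date_range("2020-01-01", periods=6, freq="30min")
celsius = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], index=index)

store = TinyFeatureStore()
store.register("temperature.kelvin", celsius, transform=lambda s: s + 273.15)

batch = store.get_batch(
    "temperature.kelvin",
    pd.Timestamp("2020-01-01 00:30"),
    pd.Timestamp("2020-01-01 01:30"),
)
latest = store.get_latest("temperature.kelvin")
```

Because both serving paths apply the same registered transform, training and production code see identical feature values.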

To make our feature store performant, the database backend is Timescale, a SQL database optimised for fast time-series querying. The ByteHub feature store also comes integrated with useful time-series data feeds such as hyperlocal global weather data, energy market data or financial market data.

Let’s see an example: a typical application might want to pull in weather data and market data in order to predict energy demand on different time scales.

What are the challenges?

Access and prepare the data sources;
Ensure the time stamps are coherent and aligned across the data sources when querying; and
Ensure that changes in temporal frequency fill in time-series values logically.
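
As a rough illustration of the alignment challenge, the sketch below uses plain pandas (not ByteHub) on synthetic data. The series names and values are made up, and averaging each 30-minute bin is just one reasonable choice of aggregation.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two sources sampled at different rates.
temp_index = pd.date_range("2020-01-01", periods=12, freq="10min")
demand_index = pd.date_range("2020-01-01", periods=4, freq="30min")
temperature = pd.Series(np.arange(12, dtype=float), index=temp_index, name="temperature")
demand = pd.Series([100.0, 110.0, 120.0, 130.0], index=demand_index, name="demand")

# Bring the finer series onto the coarser 30-minute grid by averaging each bin.
temperature_30min = temperature.resample("30min").mean()

# Join on the timestamp index so every row refers to the same interval.
aligned = pd.concat([temperature_30min, demand], axis=1)
```

Doing this by hand for every data source and every target resolution is exactly the repetitive work a feature store is meant to absorb.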

This is how easy it is in ByteHub:

Create a new feature store:

>> import bytehub as bh
>> fs = bh.FeatureStore()

This feature store comes preloaded with two prepared features: UK land surface temperature, named temperature.actuals, and UK electricity demand, named demand.actuals. All of the complexities of access and preparation are abstracted away.

We can list the features in our feature store:

>> fs.list_features()
['temperature.actuals', 'demand.actuals']

We can retrieve the temporal resolution of the underlying time-series:

>> fs.get_freq(["temperature.actuals", "demand.actuals"])
['10T', '30T']

So temperature is fed in at a 10-minute resolution and demand at a half-hourly resolution.

Let’s pull in time-series from the prepared features. If we want the 10 years between 2011 and 2021 at monthly resolution, we simply do:

>> fs.get_timeseries(["temperature.actuals", "demand.actuals"],    
           from_date="2011/01/01", to_date="2021/01/01", freq="1M")

The differences in resolution are resolved: all time stamps and values are aligned, and gaps are sensibly back-filled where required.
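
For intuition, one way such a resolution change could be done is sketched below in plain pandas. This is an assumption about the general technique, not ByteHub's actual implementation: the helper `resample_and_align` is hypothetical, the data is synthetic, and mean aggregation followed by back-filling is only one possible policy.

```python
import pandas as pd


def resample_and_align(series_by_name, freq):
    """Hypothetical helper: put every series on one shared time grid."""
    columns = {}
    for name, series in series_by_name.items():
        # Averaging handles downsampling; bins with no data (upsampling) become NaN.
        resampled = series.resample(freq).mean()
        # Back-fill empty bins with the next observed value.
        columns[name] = resampled.bfill()
    # Building the frame from a dict aligns all columns on the timestamp index.
    return pd.DataFrame(columns)


# Synthetic data: temperature every 10 minutes, demand every 30 minutes.
temperature = pd.Series(
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    index=pd.date_range("2020-01-01", periods=7, freq="10min"),
)
demand = pd.Series(
    [100.0, 130.0, 160.0],
    index=pd.date_range("2020-01-01", periods=3, freq="30min"),
)
sources = {"temperature": temperature, "demand": demand}

fine = resample_and_align(sources, "10min")    # demand is upsampled and back-filled
coarse = resample_and_align(sources, "30min")  # temperature is averaged per bin
```

Whether to back-fill, forward-fill or interpolate depends on the data; back-filling would leak future values into the past for a forecasting model, so it is worth checking which policy a given tool applies.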

If we now want hourly resolution, it’s as simple as:

>> fs.get_timeseries(["temperature.actuals", "demand.actuals"],    
           from_date="2011/01/01", to_date="2021/01/01", freq="1H")

The data is now loaded and ready for forecasting electricity demand. Integrating time-series data in this way can lead to large uplifts in model accuracy. We found that adding temperature to a demand forecasting model improves accuracy by 10 percent.

We think that feature stores with simple, easy-to-understand APIs will allow data scientists to be much more productive when tackling time-series problems, and will make it much easier to move models into production-ready systems. So next time you start a data science project, invest in a feature store.

At ByteHub AI we build tools to make it easier to integrate time-series data with your ML applications and analytics. Get in touch if you’d like to know more about this topic.

Written by Fred Hoffman