Forecasting household energy consumption

Maria Jacob
The Centre for Net Zero Tech Blog
Feb 2, 2024

The increasing prevalence of smart meters unlocks the possibility of forecasting future electricity demand at every level of granularity, from the demand of individual assets (the lowest level of aggregation) to national demand (the highest level of aggregation).

Fig. 1: applications of demand forecasting in the energy sector. Reference: Hong, Tao and Fan, Shu (2016)

One application where forecasts are necessary is demand response, which typically requires forecasting at horizons from real-time to day-ahead (fig. 1).

Domestic demand response

Demand response refers to changes made in electricity consumption (or generation) in response to changes in the supply (and demand) of electricity, ultimately contributing to grid stability, efficiency and energy security. It can be used to address various challenges in the energy industry, e.g. energy balancing, maintaining network frequency, managing transmission constraints, etc. (fig. 2). Depending on the challenge, the solution, or flexibility service, will have an appropriate notice period, duration of response and required predictability.

Fig.2: examples of demand response services in the energy sector. Ref: table 1 CrowdFlex Discovery technical report

Historically, residential or domestic customers have not been contributors to demand response, or flexibility, due to the belief that household load was mostly inflexible. However, both this belief and the domestic demand response contribution are changing, as the increasing penetration of low carbon technologies (LCTs) such as electric vehicles, heat pumps and smart thermostats allows for more automation, scheduling and remote control of consumption.

Services like the Demand Flexibility Service (DFS), run by National Grid Electricity System Operator (NGESO) in winters 2022/23 and 2023/24, are examples of services where domestic households were called upon to reduce their electricity consumption during periods of high demand. Households were usually given day-ahead notice and were remunerated for any demand reduction by an aggregator (usually their energy supplier).

However, assessing demand reduction requires a baseline, which is non-trivial to calculate. We define a baseline to be an estimate of the energy consumption in the absence of any demand response instruction or flexibility service. For this reason, a baseline can also be thought of as a counterfactual. The suitable choice of baseline is closely linked to the flexibility service; see Centre for Net Zero's (CNZ) recent white paper on baselining. Once a baseline is calculated, the flexibility provided is simply the difference between the baseline and the actual consumption.
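As a concrete sketch, suppose we already have a baseline and metered consumption for each half-hourly settlement period of an event. The column names, values and the clip-at-zero rule below are illustrative assumptions, not the schema or settlement rules of any specific service:

```python
import pandas as pd

# Hypothetical half-hourly readings for one household during a flexibility
# event; figures are made up for illustration.
readings = pd.DataFrame(
    {
        "baseline_kwh": [0.50, 0.48, 0.52, 0.49],  # counterfactual estimate
        "actual_kwh": [0.20, 0.15, 0.18, 0.22],    # metered consumption
    },
    index=pd.date_range("2023-01-24 17:00", periods=4, freq="30min"),
)

# Flexibility delivered in each period is baseline minus actual; here we
# clip negative values (consumption above baseline) to zero, though real
# settlement rules vary by service.
readings["flexibility_kwh"] = (
    readings["baseline_kwh"] - readings["actual_kwh"]
).clip(lower=0)

total_kwh = readings["flexibility_kwh"].sum()
print(round(total_kwh, 2))  # 1.24 kWh delivered over the event
```

The household would then be remunerated for the 1.24 kWh of reduction, which is why the quality of the baseline estimate directly affects payments.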

Why household baselining is hard but (sometimes) necessary

The electricity demand of individual households is generally very variable (fig.3, left). Not only does one household differ from another due to the appliances in that household, but also due to how occupants interact with said appliances. For example, the general pattern of a household with overnight storage heaters looks very different to the general pattern of a household with gas central heating.

However, households deviate from their general patterns all the time; for example, people might choose to cook an hour earlier or later due to changes in their day-to-day lives. Less predictably, there could be sharper deviations from general patterns, for example, due to a change in household occupancy, people going on holidays, etc.

Fig. 3: (left) variability in load at differing levels of aggregation; (right) examples of general patterns of household load profiles. Ref: Jacob, Maria, Neves, Claudia and Greetham, Danica (2020)

The general pattern, which may have daily, weekly and/or seasonal cycles (fig. 3, right), emerges when consumption is aggregated. Aggregated consumption is therefore much less variable and easier to predict. However, use cases exist, such as remunerating individual households, where forecasting unaggregated demand is unavoidable. This use case is where CNZ focused our first work on baselining.
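The smoothing effect of aggregation can be illustrated with a toy simulation; the household count, load shape and noise level below are arbitrary choices, not drawn from real smart meter data:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: 1,000 households sharing a smooth daily shape, each with
# large independent household-level noise. All numbers are illustrative.
n_households, n_periods = 1000, 48
daily_shape = 0.4 + 0.3 * np.sin(np.linspace(0, 2 * np.pi, n_periods))
loads = daily_shape + rng.normal(0.0, 0.3, size=(n_households, n_periods))

def cv(profile: np.ndarray) -> float:
    """Coefficient of variation: std relative to the mean level."""
    return profile.std() / profile.mean()

single_household = cv(loads[0])     # noisy individual profile
aggregate = cv(loads.mean(axis=0))  # much smoother aggregate profile
print(single_household > aggregate)
```

Because the independent household noise averages out across the portfolio, the aggregate profile's relative variability is much lower than any individual household's, which is exactly why aggregate forecasts are easier.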

CNZ work

Our analysis aims to establish good principles for baselining residential unaggregated demand, for the purpose of remuneration. It also gives an indication of how various algorithms perform in different scenarios:

  • How do different simple recent history baselining algorithms (from various energy markets) perform when applied to domestic consumption?
  • How does the accuracy change with time of year, type of day, time of day, and LCT ownership?
  • How do more complex (machine learning) algorithms stack up against the simpler ones?
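To make "recent history baselining" concrete, here is one simple member of that family: baseline each half-hour slot as the mean of the same slot over the previous N days. Real market rules differ in detail (e.g. how many days are used and whether event days are excluded), so treat this as an illustrative sketch rather than any specific market's method:

```python
import pandas as pd

def recent_history_baseline(consumption: pd.Series, n_days: int = 10) -> pd.Series:
    """Baseline each half-hour slot as the mean of the same slot over the
    previous ``n_days`` days, excluding the current day.

    One simple member of the 'recent history' family used in several energy
    markets; exact rules (window length, excluded days) vary per market.
    """
    def per_slot(slot_series: pd.Series) -> pd.Series:
        # Rolling mean over previous days; shift(1) excludes the current day.
        return slot_series.rolling(n_days, min_periods=1).mean().shift(1)

    grouped = consumption.groupby(consumption.index.time, group_keys=False)
    return grouped.apply(per_slot).sort_index()

# Toy check: a flat 0.5 kWh per half-hour over 14 days should baseline to 0.5.
idx = pd.date_range("2023-01-01", periods=14 * 48, freq="30min")
flat = pd.Series(0.5, index=idx)
baseline = recent_history_baseline(flat)
print(round(baseline.dropna().iloc[-1], 2))
```

Note that the first day has no history and therefore no baseline, which hints at the data-availability trade-off discussed below.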

The full results from this analysis can be found on the CNZ website.

Emerging findings

The emerging findings from this work point to some interesting characteristics for baselining household consumption:

  • Increasing the amount of historical data used can improve the accuracy but puts additional constraints on the amount of good quality data required. Thus, there is a trade-off between accuracy and barrier to calculation. Generally, using two weeks of data was found to be sufficient for reasonable accuracy.
  • There are many choices of baselines from various energy markets. When applied to household demand, they perform very similarly. The accuracy of the forecast depends more strongly on the time of day and time of year for which the forecast is created, and on the types of appliances and LCTs in the household.
  • In general, for remuneration, simple recent history baselines outperformed ML algorithms at the unaggregated level. While additional improvements are possible for ML algorithms, the added complexity both in terms of interpretability and implementation may not be worth the small gains in accuracy.
  • There are some types of households where using ML algorithms may outperform the simpler baselines, e.g. those with batteries and heat pumps. While unexplored, it may be possible to gain even more in accuracy when technology specific features are used as inputs for these complex algorithms.

Making ML algorithms generalisable

To make the results from the ML algorithms more generalisable, we made several design choices.

Firstly, two years' worth of smart meter data (2021 and 2022) were used. Models were trained and validated on 2021 data, and predictions were made on 2022 data.

Secondly, models were trained and validated on 15% of (randomly sampled) households. These are the in-sample households; the remaining 85% constitute the out-of-sample households. Predictions were made on both, but error metrics were calculated for each group separately to monitor whether the model performed equally well on previously unseen households.
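The split itself can be sketched as follows; the household IDs and count are hypothetical, and only the 15%/85% proportions come from the description above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical household IDs; only the 15%/85% in-sample/out-of-sample
# proportions mirror the approach described above.
household_ids = pd.Index([f"hh_{i:04d}" for i in range(1000)], name="household_id")
n_in_sample = int(0.15 * len(household_ids))
in_sample = pd.Index(rng.choice(household_ids, size=n_in_sample, replace=False))
out_of_sample = household_ids.difference(in_sample)

# Error metrics would then be computed separately per group, e.g.
# errors.loc[errors.index.isin(in_sample)].abs().mean() vs. the same
# for out_of_sample, to spot any gap on previously unseen households.
print(len(in_sample), len(out_of_sample))
```

Keeping the two groups disjoint is the key point: a model that scores well in-sample but degrades out-of-sample is memorising households rather than learning transferable load patterns.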

Thirdly, we wanted the models to be trained in a similar way to how they might be used if productionised. As an example, to calculate a baseline for a day in February 2022, we used the model trained on data including and up to January 2022 to make predictions on both the in-sample households and the out-of-sample households.

To assess whether this approach remained consistent over the course of a year, we did some back-testing using the in-sample households and 2021 data: we trained the model on January 2021 data and validated on February 2021; retrained on all data up to and including February 2021 and validated on March 2021; retrained on all data up to and including March 2021 and validated on April 2021; and so on. In production, we would therefore expect the model to be fine-tuned every month and monitored for drift and larger performance decay on an annual basis.
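The expanding-window schedule above can be sketched as a loop over monthly folds; `fit` and `evaluate` are placeholders, not CNZ's actual pipeline:

```python
import pandas as pd

# Expanding-window back-test: train on all data up to and including month m,
# validate on month m + 1, then roll forward through 2021.
months = pd.period_range("2021-01", "2021-12", freq="M")

folds = []
for train_end, validate_on in zip(months[:-1], months[1:]):
    # model = fit(data[data.index.to_period("M") <= train_end])
    # score = evaluate(model, data[data.index.to_period("M") == validate_on])
    folds.append((str(train_end), str(validate_on)))

print(len(folds), folds[0], folds[-1])
```

Each fold's training window grows by one month, mimicking how a productionised model would be retrained monthly, while the sequence of validation scores reveals any drift over the year.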

Next steps

The above outlines the first phase of the baselining work but there is, as always, plenty of scope for improvement and further work.

Arguably, the most important next step is to recreate this analysis on a set of households that is representative of the UK population. The analysis to date samples households from Octopus customers in 2021 and 2022, which is potentially unrepresentative of the average UK household (in terms of LCT ownership, tariff types, etc.). The Octopus customer base may already have changed as a result of the acquisition of Bulb and Shell customers. Any future analysis should be recreated on a set of households representative of the various archetypes in the national population, not just one provider's customers.

Another valuable area of research is to compare the recent history baselines considered in our analysis to control group baselines. In a randomised controlled trial, the treatment group is invited to participate in flexibility services whereas the control group is not. If randomised well, the control group provides an ideal counterfactual/baseline. However, establishing control groups is not always feasible in practice (as discussed in the baselining white paper). Using data from previous randomised controlled trials, we could compare the accuracy of control group baselines to recent history baselines, to assess whether recent history baselines provide good enough estimates of the counterfactual.

Of course, improving the ML algorithms by considering different inputs, training schedules, and additional feature engineering is also a potential area for progression. Focusing this effort on households with automation and LCTs may add the most value.

This list is not exhaustive. Baselining is one of the major challenges in unlocking flexibility, as it is central to understanding and valuing the amount of flexibility provided. Creating a survey of approaches, and establishing the scenarios in which each is best applied, is therefore a natural gap for the literature to fill.
