By William Martin
Disaggregation is the process of taking an aggregate signal, such as household energy consumption sampled at a regular interval, and decomposing it into its individual components, such as kettle or toaster usage. Energy disaggregation is a complex task: a single household may have tens of appliances running concurrently, each operating at a distinct power level or, worse, at overlapping power levels. Fortunately, many household appliances have distinctive operating components such as motors, heating elements and compressors, and for these it is possible to train classifiers to detect their operating signatures.
Figure 1 demonstrates how the aggregate electric power load might look over a period of 40 minutes, with a refrigerator and an oven operating concurrently. It also shows that the signal is often complex, and that appliances always operate on top of an underlying background signature caused by lights and appliances on standby.
In this article I review some of the most popular approaches to tracking household state, which for the purpose of this article is defined as: a means of modelling individual appliances’ contributions to an aggregate electric power signal over time.
Finite State Machines
A popular approach, proposed by Hart in 1992, was to break energy disaggregation into stages: event detection followed by on-off edge matching and state tracking. Hart detected on-off appliance switching events (onsets and offsets) and tracked the household with a Finite State Machine (FSM), attempting to minimise the difference between the energy explained by the FSM and the aggregate house load.
With this approach, we attempt to minimise the difference at any given time between the aggregate power and the sum of the states of our appliances. For example, a 3kW kettle being on at the same time as a 1kW toaster comes closer to explaining a 4kW aggregate load than just the kettle alone. A change in the aggregate load from 4kW to 1kW would best be explained by the kettle turning off, and this is encapsulated in the FSM model.
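The matching idea above can be sketched as a small combinatorial search: enumerate on/off combinations and pick the one whose summed power best explains the aggregate reading. The appliance names and wattages here are illustrative, not from any real dataset.

```python
from itertools import product

# Hypothetical appliance power ratings in watts (illustrative only).
appliances = {"kettle": 3000, "toaster": 1000, "fridge": 150}

def best_state(aggregate_watts):
    """Return the on/off combination whose summed power best
    explains the aggregate reading."""
    best, best_err = None, float("inf")
    for states in product([0, 1], repeat=len(appliances)):
        total = sum(s * p for s, p in zip(states, appliances.values()))
        err = abs(aggregate_watts - total)
        if err < best_err:
            best, best_err = dict(zip(appliances, states)), err
    return best

# A 4kW aggregate load is best explained by kettle and toaster both on.
print(best_state(4000))
```

This brute-force search is exponential in the number of appliances; Hart's FSM formulation avoids re-solving from scratch by tracking state and explaining each observed edge incrementally.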
Hidden Markov Models
A Hidden Markov Model (HMM) builds on the FSM by introducing the concept that a state for a variable in the model depends on its previous state, as well as potentially the states of other variables in the model. This means that a kettle state changing from off to on depends on whether it was previously on or off (this seems rather obvious, but it’s possible to make predictions without encapsulating this information in the model), in addition to depending on whether the toaster is on or off, for example.
In general, the hidden structure of an HMM can be arbitrarily complex and therefore intractable to solve, however it is possible to make some reasonable assumptions to simplify things. One common assumption is that appliances are used independently of one another. This assumption is often violated in practice (we typically use the dryer after the washing machine), but it makes computation tractable: each appliance is modelled as its own small Markov chain over a known set of states, and the chains evolve independently. An HMM factored this way is called a Factorial HMM (FHMM).
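A minimal FHMM sketch, with entirely made-up transition probabilities and power levels: two appliances each follow their own two-state Markov chain, the joint transition probability is the product of the per-appliance probabilities (the independence assumption), and a Viterbi pass recovers the most likely joint on/off sequence from noisy aggregate readings.

```python
import math
from itertools import product

# Illustrative two-appliance FHMM; all numbers are assumptions.
powers = {"kettle": 3000.0, "fridge": 150.0}
# Per-appliance transition matrices P(next | current), indexed [cur][nxt].
trans = {
    "kettle": [[0.95, 0.05], [0.30, 0.70]],
    "fridge": [[0.90, 0.10], [0.10, 0.90]],
}
SIGMA = 100.0  # assumed noise std-dev on the aggregate reading

def log_emission(joint, watts):
    """Log-likelihood of a reading given a joint (kettle, fridge) state."""
    mean = sum(powers[a] * s for a, s in zip(powers, joint))
    return -((watts - mean) ** 2) / (2 * SIGMA ** 2)

def viterbi(readings):
    """Most likely joint on/off sequence under the independence assumption."""
    states = list(product([0, 1], repeat=len(powers)))
    # Assume everything starts off.
    scores = {s: (0.0 if s == (0, 0) else -1e9) + log_emission(s, readings[0])
              for s in states}
    paths = {s: [s] for s in states}
    for watts in readings[1:]:
        new_scores, new_paths = {}, {}
        for nxt in states:
            def move(cur):
                # Joint transition log-prob = sum of per-appliance log-probs.
                lp = sum(math.log(trans[a][c][n])
                         for a, c, n in zip(powers, cur, nxt))
                return scores[cur] + lp
            best_cur = max(states, key=move)
            new_scores[nxt] = move(best_cur) + log_emission(nxt, watts)
            new_paths[nxt] = paths[best_cur] + [nxt]
        scores, paths = new_scores, new_paths
    return paths[max(scores, key=scores.get)]

# Aggregate trace: fridge running, then a kettle switches on and off again.
trace = [140, 160, 3150, 3160, 150]
print(viterbi(trace))
```

Note how the joint state space is the product of the per-appliance state spaces; the independence assumption keeps the transition model small, but the joint space still grows exponentially with the number of appliances, which is why practical FHMM inference uses approximate methods.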
In fact, many NILM approaches use a Factorial Hidden Semi-Markov Model (FHSMM), which also incorporates time into the dependency graph (semi because the time spent in a state is modelled explicitly, rather than being implied by repeated self-transitions), allowing the duration spent in a state to affect the likelihood of a transition. This leads to power and state duration distributions describing how likely an appliance is to transition on or off at any given power level and after any duration. Such tracking approaches can ‘change their mind’, for example: a 3kW onset is most likely to be described as a kettle, but after a certain duration has elapsed with no change in the aggregate load, e.g. 5 minutes, its chances of being a kettle rapidly diminish, and it might now best be explained as an oven or a low-power electric shower.
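The kettle-versus-oven example can be illustrated crudely with duration distributions alone. This sketch scores a still-running 3kW load under a Gaussian duration density for each candidate; real semi-Markov models would use proper survival probabilities, and all the numbers here are assumptions.

```python
import math

# Assumed duration models (mean, std in seconds) for how long each
# appliance typically stays on at ~3kW. Illustrative numbers only.
duration_models = {"kettle": (120, 40), "oven": (1800, 600)}

def gaussian_logpdf(x, mean, std):
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def most_likely_appliance(elapsed_seconds):
    """Which appliance best explains a 3kW load still on after this long?"""
    return max(duration_models,
               key=lambda a: gaussian_logpdf(elapsed_seconds, *duration_models[a]))

print(most_likely_appliance(60))   # shortly after the onset
print(most_likely_appliance(600))  # ten minutes in
```

At 60 seconds the kettle's duration model dominates; by 600 seconds the kettle density has collapsed and the oven becomes the better explanation, which is exactly the ‘change of mind’ described above.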
Clustering

Clustering approaches associate groups of similarly sized onset or offset transitions and then label them. The advantage of this approach is that it is computationally efficient, and can be trained unsupervised, on the fly. For example, there may be tight clusters of 3kW onsets and 3kW offsets labelled as the kettle. Now suppose we see a set of 2kW onsets and offsets in rapid succession: there isn’t a labelled cluster for these, so one is created and called ‘Unknown Appliance 1’. Future transitions of this size are labelled as such, until such time as a human can relabel the cluster.
Zhao et al. took the approach of decomposing an aggregate signal (P) into delta edges (ΔP). These edges are then clustered by magnitude, onset clusters are associated with similarly sized offset clusters, and each pair is labelled as an appliance.
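The edge-extraction and magnitude-clustering steps can be sketched as follows. This is not Zhao et al.'s algorithm, just a minimal greedy illustration of the idea; the thresholds and trace values are made up.

```python
# Tolerance in watts: edges this close in magnitude share a cluster (assumed).
TOLERANCE = 200

def extract_edges(trace, threshold=100):
    """Return (index, delta) for every significant step in the trace."""
    return [(i, b - a) for i, (a, b) in enumerate(zip(trace, trace[1:]))
            if abs(b - a) >= threshold]

def cluster_edges(edges):
    """Greedy 1-D clustering of edge magnitudes."""
    clusters = []  # each cluster is [representative_delta, [member edges]]
    for idx, delta in edges:
        for c in clusters:
            if abs(c[0] - delta) <= TOLERANCE:
                c[1].append((idx, delta))
                break
        else:
            clusters.append([delta, [(idx, delta)]])
    return clusters

# Illustrative trace: a 3kW appliance cycles, then a 1kW appliance cycles.
trace = [100, 100, 3100, 3100, 100, 1100, 1100, 100]
edges = extract_edges(trace)
print(edges)
print(cluster_edges(edges))
```

Pairing the +3000W onset cluster with the −3000W offset cluster would then yield one candidate appliance, and the ±1000W pair another.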
Barsim et al. took the approach of assuming that appliance usage is surrounded by a so-called ground state: the identical power level observed before and after the appliance’s use. Detecting this ground state makes it possible to label an individual usage with a single appliance, and to pair onsets with offsets easily.
Neural Networks

Convolutional Neural Network (CNN) approaches take a windowed time series of the input power / current / voltage as input, and utilise a directed graph of layers of neurons to perform convolutions (transformation), pooling (aggregation) and non-linear activations for classification downstream. Recurrent approaches such as the LSTM retain some memory of past inputs in order to better inform classification.
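The convolution → activation → pooling pipeline can be shown on a single power window. This toy example uses one hand-picked edge-detecting kernel rather than learned weights, so it only illustrates the mechanics, not a trained network.

```python
# Minimal sketch of the conv -> non-linearity -> pool pipeline on a
# window of aggregate power readings; the kernel is hand-made, not trained.
def conv1d(window, kernel):
    """Valid (no-padding) 1-D convolution."""
    n = len(kernel)
    return [sum(window[i + k] * kernel[k] for k in range(n))
            for i in range(len(window) - n + 1)]

def relu(xs):
    """Non-linear activation: clamp negatives to zero."""
    return [max(0.0, x) for x in xs]

def max_pool(xs, size=2):
    """Aggregate neighbouring activations by taking their maximum."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]

# A difference kernel fires on sharp power steps (e.g. a kettle onset).
window = [100, 100, 100, 3100, 3100, 3100]
features = max_pool(relu(conv1d(window, [-1.0, 1.0])))
print(features)
```

In a real CNN many such kernels are learned from data, and the pooled feature maps feed further layers that classify which appliance produced the step.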
Kelly et al. ’15 experimented with convolutional and recurrent approaches, finding that LSTMs were able to equal the performance of HMMs for the purpose of energy disaggregation. These approaches have been built upon by a number of authors. For brevity, we will not list them all here, but point interested readers to “A Survey on Non-Intrusive Load Monitoring Methodologies and Techniques for Energy Disaggregation Problem”, Faustine et al. 2017.
Lange et al. ’16 and Barsim et al. ’16 utilise binary ensembles of neural networks to provide state-of-the-art disaggregation performance.
Martinez et al. ’18 use a neural network architecture that feeds the pooled (aggregated) embedding of a CNN directly into multiple LSTM layers to perform disaggregation. This is similar to several other approaches that feed such an embedding into an HMM, simplifying training into a single stage.
Sparse Coding

Sparse coding approaches attempt to represent the input data with a sparse vector drawn from a (high-dimensional) set of over-complete classes. The approach is modelled after neural activity in the brain, and can be seen as analogous to de-convolved CNN filters (Learning Deconvolution Network for Semantic Segmentation, Noh et al., 2015).
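A sparse representation can be built greedily with matching pursuit: repeatedly pick the dictionary ‘atom’ most correlated with the residual and subtract its contribution. The dictionary below is hand-made for illustration; in a real sparse coding system the atoms would be learned from data.

```python
import math

# Hand-made dictionary atoms: short power-window templates (assumed values).
atoms = {
    "kettle": [0, 3000, 3000, 0],
    "fridge": [150, 150, 150, 150],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matching_pursuit(signal, n_atoms=2):
    """Greedily express the signal as a sparse combination of atoms."""
    residual = list(signal)
    code = {}
    for _ in range(n_atoms):
        # Pick the atom whose normalised correlation with the residual is largest.
        name = max(atoms, key=lambda a: abs(dot(residual, atoms[a]))
                                        / math.sqrt(dot(atoms[a], atoms[a])))
        coeff = dot(residual, atoms[name]) / dot(atoms[name], atoms[name])
        code[name] = code.get(name, 0.0) + coeff
        # Remove this atom's contribution before the next pass.
        residual = [r - coeff * x for r, x in zip(residual, atoms[name])]
    return code, residual

signal = [150, 3150, 3150, 150]  # fridge baseline plus a kettle burst
code, residual = matching_pursuit(signal)
print(code)
```

The resulting code is sparse: only the few atoms needed to explain the window carry non-zero coefficients, which is what makes the representation interpretable as per-appliance activity.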
This review covers just some of the many pieces of research being done in the NILM field, more of which can be found at http://wiki.nilm.eu/. Each year, researchers in the field gather to collaborate at the NILM Workshop, which aims to bring together researchers working on energy disaggregation in both academia and industry. If you’d like to find out more about this event, visit http://www.nilm.eu/.