Predict Building Energy consumption using Regression Analysis

Published in Analytics Vidhya, Dec 28, 2019

Objective:

Provide an approach for building a machine learning model that predicts a building's electrical energy consumption. The prediction can also be used for anomaly detection, by finding times when the building's energy usage does not behave the way it has in the past under similar conditions.

Other benefits of energy prediction may include:

Understand changes in a facility’s electricity consumption patterns from one time period to the next (e.g., due to seasonal variations).
Analyze & quantify the effectiveness of energy efficiency strategies (if any) implemented for the building.

Approach:

The approach is based on a white paper written by researchers at the Ernest Orlando Lawrence Berkeley National Laboratory. The white paper contains detailed information on how the algorithms were derived and how they work.

White paper link:

https://drrc.lbl.gov/sites/default/files/LBNL-4944E.pdf

The algorithm looks at three pieces of data:

(a) The time of day
(b) Outdoor air temperature
(c) Occupancy of the building.

Taken together, these variables tend to have the largest influence on building energy usage.

Detailed Information:

A facility’s electric load is usually a function of both temperature and time-of-week. Therefore, both temperature and time-of-week are considered in the regression model.

The model includes two features:

Time of Week Indicator variable (αᵢ)
Piece-wise linear & continuous outdoor air temperature dependence (βⱼ)

Below is an overview of how these variables are utilized for model creation:

Time of Week Indicator variable: Divide the work week (Monday-Friday) into 15-minute intervals: the first interval runs from midnight to 12:15 a.m. on Monday morning, the second from 12:15 a.m. to 12:30 a.m., and so on. A separate regression coefficient (αᵢ) for each interval allows each time of week to have a different predicted load.
Piece-wise linear & continuous outdoor air temperature dependence: When the outdoor air temperature is high, cooling load tends to increase with temperature. When the outdoor temperature is low, heating load tends to increase as temperature decreases (even when electricity is not the heat source, it is still required to run pumps and fans while the building is heating). For some range of moderate temperatures, the load may be insensitive to temperature because neither cooling nor heating is needed (the temperature is said to be in the “dead-band”). Sometimes the outdoor air temperature is so high that the cooling capacity cannot achieve the desired indoor temperature set point, at which point the load is at the maximum possible air conditioning load (maxed-out).
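The time-of-week binning described above can be sketched in a few lines of Python. This is only an illustration: the function name time_of_week_index, the 0-based numbering, and the choice to return None for weekends are assumptions made here, not something specified in the white paper.

```python
from datetime import datetime

INTERVAL_MINUTES = 15
INTERVALS_PER_DAY = 24 * 60 // INTERVAL_MINUTES  # 96 bins per day


def time_of_week_index(ts):
    """Return the 0-based 15-minute bin of a weekday timestamp (None on weekends)."""
    if ts.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return None
    minutes_into_day = ts.hour * 60 + ts.minute
    return ts.weekday() * INTERVALS_PER_DAY + minutes_into_day // INTERVAL_MINUTES


print(time_of_week_index(datetime(2019, 12, 23, 0, 5)))    # Monday 00:05 -> 0
print(time_of_week_index(datetime(2019, 12, 27, 23, 50)))  # Friday 23:50 -> 479
```

With this binning a Monday-Friday week has 5 x 96 = 480 intervals, so the model would carry 480 αᵢ coefficients.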

Diagram illustrating the above is shown below for reference:

This nonlinear temperature effect can be modeled with a piece-wise linear and continuous temperature dependent load model as follows:

For each facility, we divide the outdoor air temperatures experienced by that facility into six equally-sized temperature intervals.
Example: If the minimum temperature experienced by the facility is 50 deg F and the maximum is 110 deg F, dividing this range into six equal intervals gives 50–60 deg F, 60–70 deg F, 70–80 deg F, 80–90 deg F, 90–100 deg F, and 100–110 deg F.
A temperature parameter, βⱼ with j = 1…6, is assigned to each outdoor air temperature interval.
To achieve piece-wise linearity and continuity, the outside air temperature at time t (which occurs in time-of-week interval i), T(ti), is broken into six component temperatures, Tc,j(ti) with j = 1…6. Each Tc,j(ti) is multiplied by βⱼ, and the products are summed to determine the temperature-dependent load.
Let Bk (k = 1…5) be the bounds of the temperature intervals. Component temperatures are computed using the following algorithm:

If T(ti) > B1, then Tc,1(ti) = B1. Otherwise, Tc,1(ti) = T(ti) and Tc,m(ti) = 0 for m = 2…6, and the algorithm ends.
For n = 2…4, if T(ti) > Bn, then Tc,n(ti) = Bn − Bn−1. Otherwise, Tc,n(ti) = T(ti) − Bn−1 and Tc,m(ti) = 0 for m = (n + 1)…6, and the algorithm ends.
If T(ti) > B5, then Tc,5(ti) = B5 − B4 and Tc,6(ti) = T(ti) − B5. Otherwise, Tc,5(ti) = T(ti) − B4 and Tc,6(ti) = 0.

Let us assume the minimum and maximum outside air temperatures are as given in the previous example (i.e., the 50 deg F to 110 deg F range is divided into 6 equal intervals, each 10 deg F wide). In this case, B1 = 60, B2 = 70, B3 = 80, B4 = 90, B5 = 100.
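As a quick check of these bounds, here is a minimal sketch that derives them from the minimum and maximum temperature. The helper name interval_bounds is made up for this illustration.

```python
def interval_bounds(t_min, t_max, n_intervals=6):
    """Return the inner bounds B1..B(n-1) of n equally sized temperature intervals."""
    width = (t_max - t_min) / n_intervals
    return [t_min + k * width for k in range(1, n_intervals)]


print(interval_bounds(50, 110))  # [60.0, 70.0, 80.0, 90.0, 100.0]
```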

Now, if the instantaneous outside air temperature T(ti) is 87 deg F, the algorithm above distributes this value across the individual temperature components as follows (Table 1): Tc,1 = 60, Tc,2 = 10, Tc,3 = 10, Tc,4 = 7, Tc,5 = 0, Tc,6 = 0.
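The component-temperature algorithm can be written as a short function. The sketch below follows the three steps described earlier, with one small liberty: a single loop handles both the middle intervals and the fifth one. The name component_temperatures is an assumption for this example.

```python
def component_temperatures(temp, bounds):
    """Split temp into len(bounds) + 1 component temperatures that sum to temp."""
    components = [0.0] * (len(bounds) + 1)
    # Step 1: the first component is capped at B1.
    if temp <= bounds[0]:
        components[0] = temp
        return components
    components[0] = bounds[0]
    # Steps 2-3: fill each later interval until temp is used up.
    for k in range(1, len(bounds)):
        if temp <= bounds[k]:
            components[k] = temp - bounds[k - 1]
            return components
        components[k] = bounds[k] - bounds[k - 1]
    # temp lies above the last bound: the remainder goes to the final component.
    components[-1] = temp - bounds[-1]
    return components


print(component_temperatures(87, [60, 70, 80, 90, 100]))
# [60, 10, 10, 7, 0.0, 0.0]
```

The components always add up to the original temperature (60 + 10 + 10 + 7 = 87), which is what keeps the fitted curve continuous at the interval bounds.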

Energy prediction using the above approach:

The predicted building energy consumption is the sum of a time-of-week term (coefficient αᵢ) and a piece-wise linear outdoor air temperature term (coefficients βⱼ).

For occupied mode, the occupied load, Lo, is estimated as follows:

Lo(ti, T(ti)) = αᵢ + Σ βⱼ · Tc,j(ti), summed over j = 1…6

Illustration: Let us say αᵢ = 300 and the instantaneous outside air temperature is 87 deg F, so the temperature components Tc,1 to Tc,6 are the same as in Table 1 above.
Assume the corresponding βⱼ for the individual temperature components are as follows: β1 = 0.1; β2 = 0.2; β3 = 0.3; β4 = 0.4; β5 = 0.5; β6 = 0.6.

In this case, the instantaneous energy load is predicted as follows:

Lo = 300 + [(0.1)(60) + (0.2)(10) + (0.3)(10) + (0.4)(7) + (0.5)(0) + (0.6)(0)] = 300 + 13.8 = 313.8
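The same arithmetic in Python, as a small sketch. The function name occupied_load is hypothetical; the numbers are the ones from the illustration above.

```python
def occupied_load(alpha_i, betas, components):
    """Occupied load: time-of-week term plus the beta-weighted temperature components."""
    return alpha_i + sum(b * tc for b, tc in zip(betas, components))


betas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
components = [60, 10, 10, 7, 0, 0]  # component temperatures for 87 deg F
load = occupied_load(300, betas, components)
print(round(load, 1))  # 313.8
```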

To predict load when the building is in unoccupied mode, a single temperature parameter, βu, is used, since most facilities are expected to operate at or near the dead-band at night.
Unoccupied load, Lu, is estimated as follows:

Lu(ti, T(ti)) = αᵢ + βu · T(ti)
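As a small sketch, assuming the unoccupied load takes the form Lu = αᵢ + βu · T(ti) (a time-of-week term plus a single temperature slope), the calculation looks like this. The function name and the numeric values below are purely illustrative and do not come from any real building.

```python
def unoccupied_load(alpha_i, beta_u, temp):
    """Unoccupied load: time-of-week term plus one linear temperature term."""
    return alpha_i + beta_u * temp


# Hypothetical values chosen only to show the calculation.
print(unoccupied_load(alpha_i=120, beta_u=0.5, temp=60))  # 150.0
```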

The values of αᵢ and βⱼ above can be learned by fitting the regression model to a sufficient amount of historical data.
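To make the fitting step concrete, here is a minimal sketch using NumPy's least-squares solver on synthetic, noise-free data. Everything in it is an assumption for illustration: only 4 time-of-week bins are used instead of the full week, the "true" coefficients are invented, and the helper names are hypothetical. Each row of the design matrix holds a one-hot time-of-week indicator followed by the six component temperatures.

```python
import numpy as np


def component_temperatures(temp, bounds):
    """Split temp into len(bounds) + 1 piecewise components that sum to temp."""
    comps = [min(temp, bounds[0])]
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        comps.append(min(max(temp - lo, 0.0), hi - lo))
    comps.append(max(temp - bounds[-1], 0.0))
    return comps


def design_matrix(tow_indices, temps, n_bins, bounds):
    """One row per sample: one-hot time-of-week indicator + component temperatures."""
    rows = []
    for i, temp in zip(tow_indices, temps):
        indicator = [0.0] * n_bins
        indicator[i] = 1.0
        rows.append(indicator + component_temperatures(temp, bounds))
    return np.array(rows)


# Synthetic setup (all values invented for this example).
rng = np.random.default_rng(0)
n_bins, bounds = 4, [60, 70, 80, 90, 100]
true_alpha = np.array([300.0, 320.0, 350.0, 310.0])
true_beta = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])

tow = rng.integers(0, n_bins, size=500)
temps = rng.uniform(50, 110, size=500)
X = design_matrix(tow, temps, n_bins, bounds)
y = X @ np.concatenate([true_alpha, true_beta])  # noise-free synthetic load

# Ordinary least squares fit of alpha_i and beta_j.
params, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = params[:n_bins], params[n_bins:]
print(np.round(alpha_hat, 3), np.round(beta_hat, 3))
```

Because the synthetic load has no noise, the fit recovers the invented coefficients almost exactly. Real meter data would include noise, and the estimates would only approximate the underlying values.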

Important criteria for model training, testing & deployment

1. Selecting training data set sample size: The data set used to train the model should be large enough that the model does not overfit and generalizes well to unseen production data. The white paper recommends at least 5 months of training data, so that parameter values are not overly influenced by stochastic variability (noise) in the data.

2. Data preparation & cleaning: We need to ensure that the data set is reliable for modeling and that outliers and missing values have been treated.

3. Training the model: Once the data set is cleaned, we use it to train the model. In this step the model estimates, by ordinary least squares (OLS), the best-fit values of the time-of-week coefficients and of the temperature parameters for the different ranges of outside air temperature.

4. Model evaluation & tuning on validation data set: To test how well the load prediction method works, we can hold out a separate validation data set of historical energy usage data and plot the predictions on top of the actual load data.
Example of such an evaluation shown in below figure:

5. Model deployment: Once the prediction error on the validation data set is satisfactory (for example, judged by the squared residuals that OLS minimizes), we test the model on a test data set (unseen data) to check how well it generalizes before using it on production data.
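The evaluation in steps 4 and 5 can be sketched with a simple error summary. Root-mean-square error (RMSE) is used here as one reasonable choice, not a metric prescribed by the white paper. The residual threshold for flagging possible anomalies and all the numbers are made up for illustration.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted loads."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def flag_anomalies(actual, predicted, threshold):
    """Return indices where the absolute residual exceeds the threshold."""
    return [k for k, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > threshold]


# Hypothetical validation data.
actual = [310.0, 305.0, 320.0, 330.0]
predicted = [313.0, 301.0, 320.0, 330.0]

print(rmse(actual, predicted))                 # 2.5
print(flag_anomalies(actual, predicted, 3.5))  # [1]
```

The same two functions can be run unchanged on the test data set. A large gap between validation and test error would suggest the model is not generalizing well.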