Understanding Energy Consumption for Appliances

Mayur Sand
Published in Analytics Vidhya
Apr 22, 2019

In this time of global uncertainty, the world needs energy in increasing quantities to support economic and social progress and to build a better quality of life, particularly in developing countries. Yet even today there are many places, especially in the developing world, that suffer power outages. These outages are primarily caused by the excess load drawn by household appliances; heating and cooling appliances consume the most power in a house. In this project we analyse appliance usage in a house, gathered via home sensors. All readings are taken at 10-minute intervals over 4.5 months. The goal is to predict energy consumption by appliances.

In the age of smart homes, the ability to predict energy consumption can not only save money for the end user but can also generate money for the user by feeding excess energy back to the grid (in the case of solar panels). Here, regression analysis is used to predict appliance energy usage based on data collected from various sensors. My code is available on GitHub.

Problem Statement

We want to predict appliance energy consumption for a house based on factors such as temperature, humidity and pressure. To achieve this, we develop a supervised learning model using regression algorithms. Regression algorithms are appropriate because the data consists of continuous features and there is no identification of individual appliances in the dataset.

The Data

The dataset can be downloaded from Kaggle.

There are 29 features describing appliance energy use:

1. date : time, in year-month-day hour:minute:second format

2. lights : energy use of light fixtures in the house, in Wh

3. T1 : Temperature in kitchen area, in Celsius

4. T2 : Temperature in living room area, in Celsius

5. T3 : Temperature in laundry room area, in Celsius

6. T4 : Temperature in office room, in Celsius

7. T5 : Temperature in bathroom, in Celsius

8. T6 : Temperature outside the building (north side), in Celsius

9. T7 : Temperature in ironing room, in Celsius

10. T8 : Temperature in teenager room 2, in Celsius

11. T9 : Temperature in parents’ room, in Celsius

12. T_out : Temperature outside (from Chievres weather station), in Celsius

13. Tdewpoint : Dew point temperature (from Chievres weather station), in Celsius

14. RH_1 : Humidity in kitchen area, in %

15. RH_2 : Humidity in living room area, in %

16. RH_3 : Humidity in laundry room area, in %

17. RH_4 : Humidity in office room, in %

18. RH_5 : Humidity in bathroom, in %

19. RH_6 : Humidity outside the building (north side), in %

20. RH_7 : Humidity in ironing room, in %

21. RH_8 : Humidity in teenager room 2, in %

22. RH_9 : Humidity in parents’ room, in %

23. RH_out : Humidity outside (from Chievres weather station), in %

24. Press_mm_hg : Pressure (from Chievres weather station), in mm Hg

25. Windspeed : Wind speed (from Chievres weather station), in m/s

26. Visibility : Visibility (from Chievres weather station), in km

27. rv1 : Random variable 1, non-dimensional

28. rv2 : Random variable 2, non-dimensional

29. Appliances : Total energy used by appliances, in Wh

Appliance Consumption Exploration


import numpy as np                     # linear algebra
import pandas as pd                    # data processing
import matplotlib.pyplot as plt        # plotting
import seaborn as sns                  # statistical visualisation
from sklearn import preprocessing, model_selection, metrics

data = pd.read_csv("../input/KAG_energydata_complete.csv")
data.head()

The dataset was collected by sensors placed inside the house and outside readings came from the nearby weather station. The main attributes are temperature, humidity and pressure readings. Each observation measures electricity in a 10-minute interval. The temperatures and humidity have been averaged for 10-minute intervals.

Independent variables : 28 (date, lights, 11 temperature, 10 humidity, 1 pressure, 1 wind speed, 1 visibility and 2 random variables)

Dependent variable : 1 (Appliances)

Key Observations :

1. The date column is only used for understanding consumption versus date-time behaviour; since this is not a time-series problem, it was removed. I temporarily added one more column (WEEKDAY), indicating whether a day was a weekday or a weekend, in order to check the difference in appliance consumption (sketched in code after this list).

2. The lights column was also removed, as it is a sub-meter reading and we are not focusing on appliance-specific readings.

3. Number of Independent variables at this stage — 26

4. Number of Dependent variable at this stage — 1

5. Total number of rows — 19735

6. The data set will be split 75–25 % between train & test.

7. Total # of rows in training set — 14801

8. Total # of rows in test set — 4934

9. All the features have numerical values. There are no categorical or ordinal features.

10. Number of missing values & null values = 0
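To make points 1, 2 and 6 above concrete, here is a minimal sketch of that preprocessing, assuming the column names of the Kaggle CSV and a seed of 42 (the actual seed used in my code may differ):

import pandas as pd
from sklearn.model_selection import train_test_split

# reload with parsed dates so the temporary WEEKDAY flag can be derived
data = pd.read_csv("../input/KAG_energydata_complete.csv", parse_dates=["date"])

# temporary weekday/weekend flag, used only for exploratory comparison
data["WEEKDAY"] = (data["date"].dt.dayofweek < 5).astype(int)

# drop date, lights and the helper column before modelling
features = data.drop(columns=["date", "lights", "WEEKDAY", "Appliances"])
target = data["Appliances"]

# 75-25 train/test split with a fixed seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    features, target, test_size=0.25, random_state=42)
print(X_train.shape, X_test.shape)   # roughly (14801, 26) and (4934, 26)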

Descriptive Statistics :

Temperature columns

Humidity columns

Weather columns

Appliance column (Dependent variable)

Feature ranges (these can be double-checked with the short snippet after this list)

1. Temperature : -6 to 30 deg

2. Humidity : 1 to 100 %

3. Windspeed : 0 to 14 m/s

4. Visibility : 1 to 66 km

5. Pressure : 729 to 772 mm Hg

6. Appliance Energy Usage : 10 to 1080 Wh
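A quick way to verify these ranges directly from the dataframe (column names assumed from the Kaggle file):

# min/max per column, printed for a few representative features
ranges = data.describe().T[["min", "max"]]
print(ranges.loc[["T_out", "RH_1", "Windspeed", "Visibility",
                  "Press_mm_hg", "Appliances"]])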

Data Visualization

Independent Variable distribution

RH_6, RH_out, Visibility and Windspeed show irregular (non-normal) distributions.

Dependent Variable distribution
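A minimal sketch of how these distribution plots can be produced with seaborn (histplot is used here; older seaborn versions would use distplot), assuming the features and target frames from the preprocessing sketch above:

import matplotlib.pyplot as plt
import seaborn as sns

# one histogram + KDE per independent variable
fig, axes = plt.subplots(nrows=7, ncols=4, figsize=(16, 20))
for ax, col in zip(axes.ravel(), features.columns):
    sns.histplot(features[col], kde=True, ax=ax)
    ax.set_title(col)
plt.tight_layout()
plt.show()

# distribution of the dependent variable
sns.histplot(target, kde=True)
plt.title("Appliances (Wh)")
plt.show()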

Observations based on distribution plot

1. All humidity values except RH_6 and RH_out follow a Normal distribution, i.e., all the readings from sensors inside the home are from a Normal distribution.

2. Similarly, all temperature readings follow a Normal distribution except for T9.

3. Out of the remaining columns, we can see that Visibility, Windspeed and Appliances are skewed.

4. The random variables rv1 and rv2 have more or less the same values for all the recordings.

5. The output variable Appliances has most values below 200 Wh, showing that high energy consumption cases are rare.

6. No column has a distribution like the target variable Appliances.

Hence, there is no independent feature with a linear relationship to the target.

Correlation plot
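A sketch of how such a correlation plot can be drawn (a plain Pearson correlation heatmap; the exact styling used in the original post may differ):

# correlation matrix of all numeric columns, excluding the date and helper columns
corr = data.drop(columns=["date", "WEEKDAY"]).corr()
plt.figure(figsize=(16, 12))
sns.heatmap(corr, cmap="coolwarm", linewidths=0.5)
plt.title("Correlation between sensor readings and Appliances")
plt.show()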

Observations based on correlation plot

1. Temperature — All the temperature variables from T1-T9 and T_out have a positive correlation with the target Appliances. For the indoor temperatures the correlations are high, as expected, since the ventilation is driven by the HRV unit, which minimises air temperature differences between rooms. Four columns (T3, T5, T7 and T8) have a high degree of correlation with T9, and T6 is highly correlated with T_out (both are outdoor temperatures). Hence T6 and T9 can be removed from the training set, since the information they provide is captured by other fields.

2. Weather attributes — Visibility, Tdewpoint, Press_mm_hg have low correlation values

3. Humidity — There are no significantly high correlation cases (> 0.9) for humidity sensors.

4. Random variables have no role to play

5. The random variables rv1, rv2 and Visibility, Tdewpoint, Press_mm_hg have low correlation with the target variable.

Based on the above conclusions, I dropped rv1, rv2, Visibility, T6 and T9.

Number of Input Variables — 21 (reduced from 26)
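Dropping these columns is a one-liner on the train and test splits from earlier (column names assumed as in the Kaggle file):

# remove the redundant / low-signal columns identified above
drop_cols = ["rv1", "rv2", "Visibility", "T6", "T9"]
X_train = X_train.drop(columns=drop_cols)
X_test = X_test.drop(columns=drop_cols)
print(X_train.shape[1])   # 21 input variables remaining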

Modelling Techniques & Benchmarks

This is a regression problem. Regression analysis is a form of predictive modelling that investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables. The regression methods used are:

1. Linear Models

Linear Regression

In linear regression we wish to fit a function of the form Ŷ = β0 + β1X1 + β2X2 + β3X3, where X is the vector of features and β0, β1, β2, β3 are the coefficients we wish to learn. At every step the coefficients β are updated so as to reduce the loss function as much as possible. As a modification of the linear regression model, we can apply regularization techniques that penalize the coefficient values of the features, since higher values generally tend towards overfitting and loss of generalization.

Ridge Regression

This loss function includes two elements: the sum of squared distances between each prediction and its ground truth, and a second term that sums the squared β values and multiplies the sum by a parameter λ. The reason for doing this is to “punish” the loss function for high values of the coefficients β.

It forces the β coefficients to be lower, but it does not force them to be zero. That is, it will not get rid of irrelevant features but rather minimize their impact on the trained model.

Lasso Regression

The only difference from Ridge regression is that the regularization term uses the absolute value of the coefficients rather than their square. This difference has a huge impact on the trade-off: the Lasso method overcomes the disadvantage of Ridge regression by not only punishing high values of the coefficients β but actually setting them to zero if they are not relevant. Therefore, we may end up with fewer features in the model than we started with, which is a big advantage.
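As an illustration of the difference, here is a sketch of both regularized models on the scaled feature matrices (X_train_scaled / X_test_scaled are produced in the Data Scaling step later in the post; the alpha values, which play the role of λ above, are illustrative and not tuned):

from sklearn.linear_model import Ridge, Lasso

ridge = Ridge(alpha=1.0).fit(X_train_scaled, y_train)
lasso = Lasso(alpha=0.01).fit(X_train_scaled, y_train)

print("Ridge R2 (test):", ridge.score(X_test_scaled, y_test))
print("Lasso R2 (test):", lasso.score(X_test_scaled, y_test))
# Lasso can set coefficients exactly to zero; Ridge only shrinks them
print("Features dropped by Lasso:", (lasso.coef_ == 0).sum())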

2. Support Vector Machine

Support vector regression

Support Vector Regression (SVR) uses the same principles as the SVM for classification. In the case of regression, a margin of tolerance (epsilon) is set, within which errors are not penalized; the model then fits the flattest function that keeps most training points inside that margin.

3. Nearest Neighbour Regressor

KNeighborsRegressor

KNeighborsRegressor retrieves the k nearest neighbours of a query point and makes a prediction based on those neighbours: it computes the mean of the nearest neighbours’ target values.

4. Tree-based Regression Models

We divide the predictor space — that is, the set of possible values for X1, . . . , Xp — into J distinct and non-overlapping regions, R1, . . . , RJ . For every observation that falls into the region Rj , we make the same prediction, which is simply the mean of the response values for the training observations in Rj .

Our goal is to find boxes R1, . . . , RJ that minimize the RSS given by

RSS = Σ_{j=1}^{J} Σ_{i ∈ Rj} (yi − ŷRj)²,

where ŷRj is the mean response for the training observations within the jth box. Tree-based models are less affected by outliers than linear models. Given that there is no linear relation between any input and the target variable, it is likely that trees will work better than linear models.

Ensemble methods

Ensemble methods combine several decision trees to produce better predictive performance than a single decision tree. The main principle behind an ensemble model is that a group of weak learners comes together to form a strong learner.

- Bagging : Bagging (Bootstrap Aggregation) is used when our goal is to reduce the variance of a decision tree. The idea is to create several subsets of the training data, chosen randomly with replacement, and to train a decision tree on each subset. The average of all the predictions from the different trees is used, which is more robust than a single decision tree.

- Boosting : Boosting is another ensemble technique for creating a collection of predictors. Learners are trained sequentially, with early learners fitting simple models to the data and later learners analysing the errors. In other words, we fit consecutive trees (on random samples) and, at every step, the goal is to reduce the net error from the prior trees.

Random Forests

A Random Forest is an ensemble technique capable of performing regression tasks using multiple decision trees and a technique called bagging, and it works well on high-dimensional data.

Gradient Boosting Machines

Gradient Boosting is an extension of the boosting method. It uses the gradient descent algorithm, which can optimize any differentiable loss function. An ensemble of trees is built one by one and the individual trees are summed sequentially; each new tree tries to recover the loss left by the previous ones. Gradient Boosting = Gradient Descent + Boosting.

Extremely Randomized trees

The Extra-Trees algorithm builds an ensemble of unpruned decision or regression trees according to the classical top-down procedure. It splits nodes by choosing cut-points fully at random and uses the whole learning sample (rather than a bootstrap replica) to grow the trees.

5. Neural Networks

A multilayer perceptron (MLP) is a deep, artificial neural network. It is composed of more than one perceptron. They are composed of an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and in between those two, an arbitrary number of hidden layers that are the true computational engine of the MLP. MLPs with one hidden layer are capable of approximating any continuous function.

Benchmark

The benchmark is the R2 score of the Gradient Boosting model used by the author in the original research paper. The benchmark numbers are:

1. R2 score on training data: 97%

2. R2 score on test data: 57%

3. RMSE on training data = 17.56

4. RMSE on test data = 66.65

Data Preprocessing & Implementation

Data Scaling

The feature set has data in varying ranges: Temperature (-6 to 30), Humidity (1 to 100), Windspeed (0 to 14), Visibility (1 to 66), Pressure (729 to 772) and Appliance energy usage (10 to 1080). Because of these different ranges, some features could dominate the regression algorithm. To avoid this, all features need to be scaled. Thus the data was scaled to zero mean and unit variance using the StandardScaler class of the sklearn.preprocessing module.
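A minimal sketch of this scaling step, fitting the scaler on the training split only and keeping the dataframe index and column names (as point 4 of the learnings below also recommends):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = pd.DataFrame(scaler.fit_transform(X_train),
                              index=X_train.index, columns=X_train.columns)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             index=X_test.index, columns=X_test.columns)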

Implementation

The following scikit-learn and xgboost estimators were used to test each regression model:

  1. sklearn.linear_model.Ridge
  2. sklearn.linear_model.Lasso
  3. sklearn.ensemble.RandomForestRegressor
  4. sklearn.ensemble.GradientBoostingRegressor
  5. sklearn.ensemble.ExtraTreesRegressor
  6. xgboost.XGBRegressor
  7. sklearn.neighbors.KNeighborsRegressor
  8. sklearn.svm.SVR
  9. sklearn.neural_network.MLPRegressor

Pipeline (a minimal sketch follows this list):

1. Store all the algorithms in a list and iterate over the list.

2. Each regressor’s random_state was initialized with a fixed seed so that the results are the same every time; other parameters were left at their defaults.

3. The regressor was fitted on the training data and scored on both the training and test data.

4. The properties of the regressor (name, timing and score for the training and test sets) were stored in a dictionary as key-value pairs.

5. The dictionary was appended to a global list of dictionaries, which was then converted into a dataframe.
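A minimal sketch of that loop, restricted to the scikit-learn estimators for brevity (xgboost’s XGBRegressor would be appended in the same way; the seed and model list are illustrative):

import time
from sklearn.linear_model import Ridge, Lasso
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              ExtraTreesRegressor)
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import r2_score

models = [Ridge(random_state=42), Lasso(random_state=42),
          KNeighborsRegressor(), SVR(),
          RandomForestRegressor(random_state=42),
          GradientBoostingRegressor(random_state=42),
          ExtraTreesRegressor(random_state=42),
          MLPRegressor(random_state=42)]

results = []
for model in models:
    start = time.time()
    model.fit(X_train_scaled, y_train)          # fit on the training split only
    results.append({
        "model": type(model).__name__,
        "train_time_s": round(time.time() - start, 2),
        "train_r2": r2_score(y_train, model.predict(X_train_scaled)),
        "test_r2": r2_score(y_test, model.predict(X_test_scaled)),
    })
comparison = pd.DataFrame(results).sort_values("test_r2", ascending=False)
print(comparison)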

Result :

As observed from the results, ExtraTreesRegressor performs better than all the other regressors on every metric except training time.

Feature Selection for Improvement

The Extra Trees Regressor performed best with default parameters. I then used grid-search cross-validation via the GridSearchCV class of the sklearn.model_selection module. The parameters that were tuned:

1. n_estimators: The number of trees to be used

2. max_features: The number of features to be considered at each split

3. max_depth : The maximum depth of the tree. If no value is provided, splitting continues until all leaves are pure or contain fewer than min_samples_split samples.

The R2 score improved by roughly 10% (0.57 to 0.63) after using the parameters suggested by GridSearchCV, as sketched below.
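A sketch of that tuning step (the candidate values below are illustrative; the grid actually searched may have been different):

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import ExtraTreesRegressor

param_grid = {
    "n_estimators": [50, 100, 200],
    "max_features": ["sqrt", "log2", None],
    "max_depth": [20, 40, 80, None],
}
search = GridSearchCV(ExtraTreesRegressor(random_state=42),
                      param_grid, cv=3, n_jobs=-1)   # n_jobs=-1 uses all CPUs
search.fit(X_train_scaled, y_train)
print(search.best_params_)
print("Test R2:", search.best_estimator_.score(X_test_scaled, y_test))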

Challenges & Learning gained during project

1. Feature scaling is very important for regression models. I initially tried without it and the results were not good; on Kaggle this is also what other users suggest.

2. Using a seed value helped in reproducing results for the algorithms. Without it, the results were different each time.

3. It is very important to check the intercorrelation between all the variables in order to remove the redundant features with high correlation values.

4. While scaling data, it is useful to maintain separate copies of the dataframe, which can be created using the index and column names of the original dataframe.

5. The pipeline of adding algorithms should be easy to manage

6. Seaborn and pyplot are good libraries to plot various properties of dataframe

7. For performing Exhaustive search or Random search in the hyperparameter space for tuning the model, always parallelize the process since there are a lot of models with different configurations to be fitted. (Set n_jobs parameter with the value -1 to utilize all CPUs)

8. One effective way to check the robustness of the model is to fit it on a reduced feature space in case of high dimensional data. Select the first ‘k’ (usually >= 3) key features for this task.

Results

Model Evaluation & Validation

Hyperparameters of the untuned model:

1. n_estimators : 10

2. max_features : auto

3. max_depth : None

Hyperparameters of the best model after tuning:

1. n_estimators : 200

2. max_features: ‘sqrt’

3. max_depth: 80

Robustness check:

The best model is trained on a reduced feature space having only the 5 highest-ranked features (by importance) instead of 22 features.

· R2 score on test data model with reduced features = 0.47.

· R2 score on test data for untuned model = 0.57.

· Difference = 0.10

· RMSE score on test data model with reduced features = 0.72

· RMSE score on test data for untuned model = 0.65

· Difference = 0.07

Therefore, even though the feature space is reduced drastically (by more than 75%), the relative loss in performance on the test data is small.

Benchmark comparison

Final Model

Training R2 score — 1.0

Testing R2 score —0.63

RMSE on test data — 0.60

Benchmark Model

Training R2 score — 0.97

Testing R2 score — 0.57

RMSE on test data — 0.66

There is an improvement of 10.53% in the R2 score on the test set; with more data and further feature engineering this can be improved further.

Conclusions

According to the best-fit model, we can identify the 5 most and the 5 least important features.

The top 3 important features are humidity attributes, which leads to the conclusion that humidity affects power consumption more than temperature. Windspeed is least important as the speed of wind doesn’t affect power consumption inside the house. So controlling humidity inside the house may lead to energy savings.
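For reference, this ranking can be read off the fitted model’s feature_importances_ attribute; a sketch, assuming the tuned estimator from the grid-search sketch above:

importances = pd.Series(search.best_estimator_.feature_importances_,
                        index=X_train_scaled.columns).sort_values(ascending=False)
print(importances.head(5))   # five most important features
print(importances.tail(5))   # five least important features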

Reflections

This project can be summarized as

1. Looking for an energy-related dataset on the UCI Machine Learning Repository and Kaggle, where benchmark numbers are available.

2. Deciding between Classification and Regression problems.

3. Visualizing the data and preprocessing it, learning from other regression competitions on Kaggle.

4. Preprocessing the data and selecting features; looking for correlation between features.

5. Deciding the regression algorithms to be used to solve the problem.

6. Using GridSearchCV instead of RandomizedSearchCV to create benchmark model.

7. Applying selected algorithms and visualizing the results.

8. Hyper parameter tuning for the best algorithm and reporting the test score of best model.

9. Discussing the importance of the selected features and checking the robustness of the model.

10. Comparing my tuned model against the author’s benchmark result.

Improvements

1. Perform aggressive feature engineering

2. Look for a classification scenario in the dataset and explore those problems.

3. Modifying parameters of the Grid Search parameter space.

a. Add more parameters like min_samples_split, min_impurity_decrease, etc.

b. max_depth can have more values

References:

Related academic research and earlier work:

http://dx.doi.org/10.1016/j.enbuild.2017.01.083

If you have any questions, let me know; I’m happy to help. Follow me on Medium or LinkedIn if you want to receive updates on my blog posts!
