IoT Project: Thermal Cameras and Reinforcement Learning for More Eco-Friendly Houses (Part I)
In this series, I take you on my journey through various IoT projects. I start with some theory in this first part and work through the practical challenges in the following posts.
In this first project, I bring my take on combining reinforcement learning with cheap thermal cameras.
Introduction
Thermal cameras are often used in the final stages of construction work to locate sections with large energy losses. These professional cameras also tend to be relatively expensive, with prices going up to $20,000.
With the advent of the Internet of Things movement, one is tempted to ask two questions.
- What if we replaced these cameras with much cheaper sensors?
- What could we do if we streamed the data continuously?
In this series, I propose to combine cheap thermal cameras with reinforcement learning to design a more eco-friendly thermostat.
Using thermal imaging, our smart thermostat controls the heat flow within the room more efficiently, consuming less energy.
1. What is Reinforcement Learning?
The idea of reinforcement learning is to estimate a decision function, i.e. a policy, which, given some state “x”, tells us which action “a” we should take to maximize our benefit, i.e. the reward.
How we define the policy, states, actions and reward depends on the problem. The literature on this topic is pretty involved, but I hope the following computations are concise enough.
The field of reinforcement learning often brings us to the topic of “Q-learning”, which estimates a so-called Q-function:
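$$Q(x, a) \;=\; \mathbb{E}\Big[\, r(x, a) \;+\; \gamma \max_{a'} Q(x', a') \,\Big],$$

where “x'” is the state reached after taking action “a” in state “x”, and “0 < γ < 1” discounts future benefits.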
This function maps each combination of state and action to some value. This value is related to the reward that the action will bring and to the consecutive benefit of the change of state caused by the action.
Often, the policy “π” is then defined as:
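$$\pi(x) \;=\; \underset{a}{\arg\max}\; Q(x, a).$$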
Hence, if we are in a state “x”, we choose the action “a” with the highest “Q(x,.)”-value.
2. Getting back to our problem…
Borrowing the jargon from reinforcement learning, we have the following set-up:
- States are given by the observed thermal image
- Actions are the acts of “heating up”, “cooling down” or “leaving unchanged” the heating/cooling devices in the room
- Reward is the cost function, which should reflect our wish to keep the room steady at a target temperature while wasting as little energy as possible (a toy version is sketched below)
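As a toy illustration of this set-up (all names and the exact penalty are hypothetical, not taken from the project):

```python
import numpy as np

ACTIONS = ("heat_up", "cool_down", "leave_unchanged")   # the three actions above

def reward(thermal_image: np.ndarray, target: float,
           energy_used: float, alpha: float = 1.0) -> float:
    """Toy reward: penalize deviation from the target temperature
    plus a weighted energy-consumption term."""
    return -abs(float(thermal_image.mean()) - target) - alpha * energy_used
```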
Before going all fancy with deep Q-learning, let us look at the problem in its most rudimentary form, i.e. a simple thermostat.
A thermostat continuously measures the temperature of the room. If the temperature is below the target, the temperature of the heater is raised.
For the sake of simplicity, let us assume that our device works both as a heater and a cooler.
A Q-matrix of our thermostat could then be written as follows:
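One possible such matrix, with rows indexing a coarse state and columns the actions (the values are illustrative; only their ordering matters for the arg-max policy):

$$Q(x, a) \;=\;
\begin{array}{l|ccc}
 & \text{heat up} & \text{cool down} & \text{leave unchanged}\\ \hline
T < T_{\text{target}} & 1 & -1 & 0\\
T > T_{\text{target}} & -1 & 1 & 0
\end{array}$$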
Under this Q-function, the policy defined above chooses the actions of the thermostat exactly as described: if the current temperature of the room is above the target temperature, the device cools down, and it heats up in the opposite case.
Intuitively, we want a Q-function with a similar behavior. Hence, we might just as well reuse this one partially; we can worry about the details later. First, we need to estimate the influence of our actions on the state. This means that some thermodynamics has to come to the playground.
If we neglect the flow in the z-direction, we can restrict ourselves to the two-dimensional heat equation,
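$$\frac{\partial T}{\partial t} \;=\; \alpha\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right),$$

with “T(x, y, t)” the temperature field and “α” the thermal diffusivity of the medium.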
We then consider the devices and other uncontrollable heat-sources/drains, such as windows, as stationary boundary conditions.
3. Now Enters the Deep Q-learning
There exist plenty of problems in reinforcement learning where deep learning is probably a bad idea, due to the instability introduced by its non-linearity. However, in the case of unstructured data, deep learning handily extracts valuable features and has shown itself to outperform more conventional techniques.
In this project, we build our Q-matrix from a series of convolutional layers and then apply the Deep Q-learning algorithm from the Atari paper [1].
The algorithm is described below.
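In rough Python-style pseudocode, and assuming a compiled Keras model “q_model” (one Q-value per action) and an environment “env” exposing reset() and step() (all names and hyperparameters here are hypothetical), one run of Deep Q-learning with experience replay looks like:

```python
import random
from collections import deque

import numpy as np

n_actions = 3                       # heat up, cool down, leave unchanged
n_episodes, max_steps = 500, 200    # illustrative hyperparameters
epsilon, gamma, batch_size = 0.1, 0.99, 32

memory = deque(maxlen=10_000)       # experience-replay buffer

for episode in range(n_episodes):
    state = env.reset()
    for t in range(max_steps):
        # epsilon-greedy exploration
        if random.random() < epsilon:
            action = random.randrange(n_actions)
        else:
            action = int(np.argmax(q_model.predict(state[None], verbose=0)[0]))

        next_state, rew, done = env.step(action)
        memory.append((state, action, rew, next_state, done))
        state = next_state

        # sample a random mini-batch and take one gradient step on it
        batch = random.sample(memory, min(batch_size, len(memory)))
        states = np.array([b[0] for b in batch])
        targets = q_model.predict(states, verbose=0)
        for i, (s, a, r, s2, d) in enumerate(batch):
            bootstrap = 0.0 if d else gamma * float(np.max(
                q_model.predict(s2[None], verbose=0)[0]))
            targets[i, a] = r + bootstrap
        q_model.fit(states, targets, verbose=0)

        if done:
            break
```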
To illustrate our work-process, let us assume that we create a 16-by-16 thermal image of our room from the output of our camera(s).
In the room, we place a window and a radiator, which serve as heat drain and heat source, respectively.
This input to the Q-function is then processed by a series of neural layers.
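A minimal tf.keras sketch of such a stack (the number and size of the layers are illustrative, not necessarily those of the repository):

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(n_actions: int = 3) -> tf.keras.Model:
    """Map a 16x16 single-channel thermal image to one Q-value per action."""
    model = tf.keras.Sequential([
        layers.Input(shape=(16, 16, 1)),
        layers.Conv2D(16, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_actions, activation="linear"),   # Q(x, ·)
    ])
    model.compile(optimizer="adam", loss="mse")
    return model
```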
This Q-function is then used to approximate the value of each action in a given state at some point in time.
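For example, reusing the “build_q_network” sketch above:

```python
import numpy as np

q_network = build_q_network()
state = np.random.rand(16, 16, 1).astype("float32")       # placeholder thermal image
q_values = q_network.predict(state[None], verbose=0)[0]   # one value per action
action = int(np.argmax(q_values))                         # greedy action choice
```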
Now, it only remains for us to propose a good reward function which outperforms the standard thermostat!
4. Next Steps …
In the next part, we will look for various reward functions which outperform the standard thermostat.
The heat-flow from our devices is a first interesting proposal. More precisely, consider the heat-flow “Φ_i” through a device “i” during a period of time.
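Integrating Fourier’s law over the device boundary “∂D_i” and a time window “[t_0, t_1]” gives one standard expression for it (the notation here is an assumption, not necessarily that of the project):

$$\Phi_i \;=\; \int_{t_0}^{t_1} \oint_{\partial D_i} \kappa\, \nabla T \cdot \mathbf{n}\; \mathrm{d}s\, \mathrm{d}t,$$

with “κ” the thermal conductivity of the medium and “n” the outward unit normal.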
Let us then write the following reward function “r(x,a)”, with “f” being some monotonically increasing function.
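A natural candidate of this shape (the exact choice of “f” is deferred to part 2) penalizes the total heat-flow:

$$r(x, a) \;=\; -\,f\Big(\sum_i \Phi_i\Big).$$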
This has a nice interpretation. The weight factor between the Q-function resulting from this reward function and the Q-function of the thermostat can be seen as the maximum acceptable heat-flow. If this heat-flow is exceeded, the device behaves like a regular thermostat. If there is a more efficient heat-flow, then the behavior of the thermostat is overridden and a more eco-friendly decision is taken.
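In code, this override could look like the following sketch (“heat_flow” and “thermostat_action” are hypothetical helpers standing in for the physics and the baseline rule):

```python
import numpy as np

def choose_action(state, q_network, heat_flow, thermostat_action,
                  max_heat_flow: float) -> int:
    """Hybrid policy sketch: fall back to the plain thermostat rule
    whenever the current heat-flow exceeds the acceptable maximum."""
    if heat_flow(state) > max_heat_flow:
        return thermostat_action(state)               # regular thermostat behavior
    q_values = q_network.predict(state[None], verbose=0)[0]
    return int(np.argmax(q_values))                   # eco-friendly learned action
```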
The code of this project in progress is currently available on GitHub [2]. See you in part 2!
More Awesome Stuff I Wrote!
@ Tuning Hyperparameters (part I): SuccessiveHalving
@ Custom Optimizer in TensorFlow
@ Regression Prediction Intervals with XGBOOST
References:
- [1] Mnih, V. et al., “Playing Atari with Deep Reinforcement Learning”, DeepMind Technologies.
- [2] GitHub: https://github.com/benoitdescamps/Thermal_sensors