Handling Concept Drift at the Edge of the Sensor Network
A technique for re-programming formulae in edge sensors without OTA firmware updates.
Consider a scenario where you are using an electric motor in your workshop. You have installed a few sensors on the outer body of this motor, and these sensors continuously send you data over Wi-Fi. An elaborate setup on the cloud then analyses those parameters with an intelligent application program and determines your motor's health status.
This scenario, of course, relies on the patterns of motor vibration and current consumption learned during the training phase. As long as nothing changes, this pattern recognition works like a charm; that is what good machine learning entails.
However, the data generated by an electric motor can change over time. Such a change can degrade analytical results that assume a static relationship between the various parameters and motor health.
The change in data can occur due to various real-life scenarios, such as changes in operating load conditions, ageing of ball bearings or foundation on which the motor is installed, environmental conditions, and several other factors.
Moreover, this problem of data changing over time, and thereby invalidating statically programmed (or assumed) underlying relationships, is common across many other real-life machine learning scenarios. The technical term for it, in the field of machine learning, is "concept drift".
The "concept" in "concept drift" refers to the unknown, hidden relationship between input and output variables.
For example, one concept in weather data may be the season that is not explicitly specified in temperature data but may influence temperature data. Another example may be customer purchasing behaviour over time that may be influenced by the strength of the economy, where the strength of the economy is not explicitly specified in the data. These elements are also called a “hidden context”.
Why is this a problem?
For an utterly static use case, concept drift is not a problem at all. In many use cases, however, the relationship between input parameters (features) and output characteristics changes over time. If your machine learning model assumed the data patterns were static, problems will surface in the future.
It is still relatively easy to handle if you maintain these relationships and formulae on the cloud: you simply update the relationship formula in the cloud application, and everything returns to normal.
However, if your architecture depends on edge compute, such that you push learned models to the edge sensors for faster responses, then this new learning must be transferred to them occasionally. Interestingly, in industrial implementations, edge computing is highly recommended and quite common.
The challenge is this: how do you update these formulae in low-cost sensors that do not have vast memory, or where over-the-air (OTA) firmware updates are not feasible? How can you send and update only the essential formula on a sensor device?
Challenges in dealing with concept drift
The first obvious challenge is to detect when this drift occurs, and I suggest one of two ways to handle it.
- When you finalise a model for deployment, record its baseline performance parameters, such as accuracy and skill level. After deployment, periodically monitor these parameters for change. If the difference is significant, it could indicate concept drift, and you should take action to fix it.
- The other way is to assume drift will occur and, therefore, periodically update the model in the cloud as well as on the sensors or edge network. The challenge, however, is to handle the edge sensor updates without causing downtime.
While the first challenge is relatively easy to manage, the second poses a technical problem. That is why an in-built sensor capability that accepts model (formula) updates, without needing a full firmware update, becomes essential. This is where the dynamic evaluation algorithm comes in handy.
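The first approach, comparing deployed performance against the baseline recorded at release time, can be sketched in a few lines of C. All names here (the `drift_monitor` struct, the 5-point drop threshold) are illustrative assumptions, not part of any standard library.

```c
#include <stdbool.h>
#include <stddef.h>

/* Sliding-window drift check: compare recent model accuracy against the
 * baseline recorded when the model was deployed. The threshold value is
 * an illustrative assumption; tune it per application. */
typedef struct {
    double baseline_accuracy;  /* recorded at deployment time          */
    double drop_threshold;     /* e.g. 0.05 flags a 5-point drop       */
} drift_monitor;

/* Returns true when windowed accuracy has fallen far enough below the
 * baseline to suspect concept drift. */
static bool drift_suspected(const drift_monitor *m,
                            const bool *correct, size_t window)
{
    size_t hits = 0;
    for (size_t i = 0; i < window; i++)
        hits += correct[i];                  /* count correct predictions */
    double recent = (double)hits / (double)window;
    return recent < m->baseline_accuracy - m->drop_threshold;
}
```

In practice the window of prediction outcomes would be collected on the cloud side, where ground truth eventually becomes available, and the flag would trigger retraining.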
Dynamic evaluation algorithm
The underlying notation was first introduced in 1954, and Hewlett-Packard later built its desktop calculators around it in the 1960s. Today, almost all calculators use this methodology to evaluate user input.
The algorithm heavily relies on a specific type of representation of the formula, and it is known as Postfix Notation or Reverse Polish Notation (RPN).
In Postfix Notation the operators follow their operands; for instance, to add 4 and 6, one would write 4 6 + rather than 4 + 6. If there are multiple operations, each operator comes immediately after its second operand; so the expression written 1 - 4 + 6 in conventional notation would be written 1 4 - 6 + in Postfix. That means: first subtract 4 from 1, then add 6 to the result.
The advantage of Postfix is that it removes the need for parentheses. Standard notation requires brackets to resolve ambiguity; Postfix removes the ambiguity from the formula itself. For instance, 1 - 4 * 6 conventionally means 1 - (4 * 6), which is quite different from (1 - 4) * 6. In Postfix, the former is written 1 4 6 * -, which unambiguously means 1 - (4 6 *), whereas the latter is written 1 4 - 6 * or 6 1 4 - *.
In either case, you will notice that operators with lower precedence appear further to the right (the BODMAS rule we learned in our school days). Because the notation is context-free, once we convert an equation into Postfix Notation, a computer can evaluate it with a simple left-to-right, stack-based pass.
Using postfix in a sensor on the edge
While converting standard formulae into Postfix Notation on paper is comfortable, implementing the conversion in software presents several challenges. A typical implementation is two-stepped: first construct an abstract syntax tree, then perform a simple post-order traversal of that tree.
Once the notation is created, it needs to be evaluated, and the left-to-right evaluation requires a stack. This means our processor must have sufficient stack space available in RAM; limited or small RAM therefore means limited computation ability.
The conversion algorithm depends on both a stack and a queue: a stack for storing functions (operators) and a simple queue for holding numbers (operands).
The figure below shows the algorithm flowchart. Because of the way tokens are moved between the queue and the stack, the method is known as the Shunting Yard Algorithm, since the process resembles a railroad shunting yard.
With this flow, we can parse a valid equation into a tokenised sequence.
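The flow above can be condensed into a small C routine. This is a minimal sketch of the shunting-yard conversion, restricted to single-digit operands, the four basic operators, and parentheses; the function name and the space-separated output format are my own choices, and there is no error handling for malformed input.

```c
#include <ctype.h>
#include <stddef.h>

/* Operator precedence: * and / bind tighter than + and -. */
static int prec(char op) { return (op == '+' || op == '-') ? 1 : 2; }

/* Shunting-yard conversion of an infix string (single-digit operands)
 * into a space-separated Postfix token string. */
static void infix_to_postfix(const char *infix, char *out)
{
    char ops[64];                              /* operator stack */
    int top = -1;
    size_t n = 0;
    for (const char *p = infix; *p; p++) {
        char c = *p;
        if (isspace((unsigned char)c)) continue;
        if (isdigit((unsigned char)c)) {       /* operand: emit directly */
            out[n++] = c; out[n++] = ' ';
        } else if (c == '(') {
            ops[++top] = c;
        } else if (c == ')') {                 /* pop until matching '(' */
            while (top >= 0 && ops[top] != '(') {
                out[n++] = ops[top--]; out[n++] = ' ';
            }
            top--;                             /* discard the '(' */
        } else {                               /* operator */
            while (top >= 0 && ops[top] != '(' &&
                   prec(ops[top]) >= prec(c)) {
                out[n++] = ops[top--]; out[n++] = ' ';
            }
            ops[++top] = c;
        }
    }
    while (top >= 0) { out[n++] = ops[top--]; out[n++] = ' '; }
    if (n > 0) n--;                            /* drop trailing space */
    out[n] = '\0';
}
```

Note how higher-precedence operators waiting on the stack are flushed to the output before a lower-precedence operator is pushed; that single rule is what encodes BODMAS into the output ordering.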
Once we finish the token parsing and formulate the notation sequence, the next step is to evaluate the answer. We can do it in the following sequential steps:
1. Initialize an empty stack.
2. Scan the Postfix Notation string from left to right.
3. If the token read is an operand, push it onto the stack and repeat; else, if the token read is an operator, there must be at least two operands already on the stack, so continue to step 4.
4. Pop the top two operands from the stack.
5. Perform the operation as per the operator.
6. Push the results back into the stack.
7. Repeat steps 3 to 6 until the whole equation is scanned.
8. Once the string scan is complete, only one element remains on the stack: the final answer.
For example, an equation like 3 + 2 * 4 / ( 1 - 5 ) will be tokenised in step 1 as 3 2 4 * 1 5 - / + and will evaluate to 1 (the answer) in the evaluation step.
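The eight steps above map almost line for line onto a small C evaluator. This sketch assumes single-digit operands and the four basic operators in a space-separated Postfix string; it omits overflow and malformed-input checks that a production sensor build would need.

```c
#include <ctype.h>

/* Stack-based evaluation of a space-separated Postfix string, following
 * steps 1-8 above: push operands, and on each operator pop two values,
 * apply it, and push the result back. */
static double eval_postfix(const char *rpn)
{
    double stack[32];
    int top = -1;
    for (const char *p = rpn; *p; p++) {
        char c = *p;
        if (isspace((unsigned char)c)) continue;
        if (isdigit((unsigned char)c)) {      /* operand: push (step 3) */
            stack[++top] = c - '0';
        } else {                              /* operator (steps 4-6)   */
            double b = stack[top--];          /* second operand         */
            double a = stack[top--];          /* first operand          */
            switch (c) {
            case '+': stack[++top] = a + b; break;
            case '-': stack[++top] = a - b; break;
            case '*': stack[++top] = a * b; break;
            case '/': stack[++top] = a / b; break;
            }
        }
    }
    return stack[top];                        /* lone element = answer  */
}
```

Running it on the article's example, 3 2 4 * 1 5 - / +, computes 2 * 4 = 8, then 1 - 5 = -4, then 8 / -4 = -2, and finally 3 + (-2) = 1.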
Ongoing management
The edge sensor would need some form of rewritable non-volatile memory (such as EEPROM or an SD card) where it can write the new and updated equation(s), which the sensor can then use for its ongoing operation.
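A minimal storage scheme might look like the following, where a byte array stands in for an EEPROM page; on real hardware the `memcpy` calls would be replaced by the part's page-write and page-read routines. The record layout and XOR checksum are illustrative assumptions, but some integrity check is essential so that a partially written update is never executed.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FORMULA_SLOT_SIZE 64   /* one EEPROM "slot" per stored formula */

/* Simple XOR checksum over the formula bytes. */
static uint8_t checksum(const uint8_t *data, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) sum ^= data[i];
    return sum;
}

/* Record layout: [length][checksum][formula bytes...] */
static bool formula_store(uint8_t *slot, const char *rpn)
{
    size_t len = strlen(rpn);
    if (len + 2 > FORMULA_SLOT_SIZE) return false;
    slot[0] = (uint8_t)len;
    slot[1] = checksum((const uint8_t *)rpn, len);
    memcpy(slot + 2, rpn, len);
    return true;
}

/* Copies the formula out only if the checksum matches, so a corrupted
 * or half-written update is rejected rather than evaluated. */
static bool formula_load(const uint8_t *slot, char *out)
{
    size_t len = slot[0];
    if (len + 2 > FORMULA_SLOT_SIZE) return false;
    if (checksum(slot + 2, len) != slot[1]) return false;
    memcpy(out, slot + 2, len);
    out[len] = '\0';
    return true;
}
```

With this in place, a formula update is just a small payload over the existing data channel: write the new Postfix string to the slot, verify the checksum, and the evaluator picks it up on the next reading, with no firmware image involved.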
Where else can concept drift occur?
At the beginning of this article, I explained how data can change over time in the case of an electric motor. However, this problem is certainly not limited to one use case; several other applications are vulnerable to it.
In credit card spend tracking and fraud detection algorithms, the user's spending pattern can change over time. For a security surveillance application in a public place, the footfall or visitor pattern can show a seasonal or permanent change over time. Retail marketing, advertising, and health applications are equally prone.
If your sensor network is monitoring server room temperature and humidity, a new cabinet or rack addition can also affect the pattern of change of these factors.
It is unrealistic to expect data distributions to stay stable over a long period. The perfect-world assumptions of machine learning fail in most cases because the data changes over time, and this is a growing problem as the use of these tools and methods increases.
Acknowledging that AI and ML do not sit in a black box and must evolve continuously is the key to fixing it.
With the proposed algorithm and technique, we can apply these fixes regularly, in a near real-time environment.
About the Author: I am many things packed inside one person: a serial entrepreneur, an award-winning published author, a prolific keynote speaker, a savvy business advisor, and an intense spiritual seeker. I write boldly, talk deeply, and mentor startups empathetically.