The missing message

How are device heartbeats different from website clicks?

Gonçalo B.
2 min read · Dec 14, 2018

A terminal out in the field will go quiet while you read this.

However, an ordinary time series database plugged into your data stream won’t notice. Such databases are in the business of COUNT()ing things that happened, in an SQL-esque way, not the things that didn’t. Capturing offline machines is the primary reason we, the engineering team at DevicePilot, had to build a query engine of our own, all in the name of enabling the heroes changing the world for the better (our customers!), one connected X at a time.

Enter uptime percentage. Suppose you provide a service that only truly serves when the terminal is powered, not stuck in an infinite loop, responsive to user interaction and connected to your backend. Then uptime is the metric your team’s worth should be pegged to: over the last 7 days, how much time did your devices spend online, normalized to the total number of device hours in that period?

The ‘aha!’ moment around here happened when we realized that, in order to answer these kinds of operational questions, the query engine could not treat the messages received from a device as its first-class citizens. The gap between messages becomes a lot more important than the messages themselves. Whether a device is online or offline is defined by the elapsed time between two consecutive messages. If the gap is greater than your expected heartbeat period then, from a service delivery standpoint, your device is offline and the uptime percentage reflects that by going down.
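To make the gap-based definition concrete, here is a minimal Python sketch. It is not DevicePilot’s query engine; the function names and the rule that each heartbeat keeps a device online for at most one heartbeat period are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def online_intervals(heartbeats, period, window_start, window_end):
    """Merged (start, end) spans in which the device counts as online.

    Assumption: each heartbeat keeps the device online for one `period`;
    whatever remains of a longer gap counts as offline.
    """
    intervals = []
    for ts in sorted(heartbeats):
        # Clip the span covered by this heartbeat to the reporting window.
        start = max(ts, window_start)
        end = min(ts + period, window_end)
        if start >= end:
            continue
        if intervals and start <= intervals[-1][1]:
            # Overlaps or touches the previous span: extend it.
            prev_start, prev_end = intervals[-1]
            intervals[-1] = (prev_start, max(prev_end, end))
        else:
            intervals.append((start, end))
    return intervals


def uptime_percentage(heartbeats, period, window_start, window_end):
    """Share of the window, in percent, that the device spent online."""
    spans = online_intervals(heartbeats, period, window_start, window_end)
    online = sum((end - start for start, end in spans), timedelta())
    return 100.0 * (online / (window_end - window_start))


# Hypothetical device: heartbeats at 0, 10, 20 and 50 minutes into a
# one-hour window, with an expected heartbeat period of 10 minutes.
window_start = datetime(2018, 12, 14, 0, 0)
window_end = window_start + timedelta(hours=1)
beats = [window_start + timedelta(minutes=m) for m in (0, 10, 20, 50)]
uptime = uptime_percentage(beats, timedelta(minutes=10), window_start, window_end)
print(f"uptime: {uptime:.1f}%")  # 40 of 60 minutes online
```

Note that nothing in the input records the outage between minutes 30 and 50. It only shows up because the code reasons about the gap rather than counting messages.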

And it doesn’t stop there. Our customers are pushing the boundaries of what’s possible with IoT today, and we end up feeling the heat. Soon after we offered uptime metrics, they asked for clustering: if you have multiple devices in the same physical location, the question goes up one level. At any given time, what’s the probability you can deliver your core service at a particular site? In that case you no longer care about the availability of a single terminal; instead, you want to know whether at least one of the co-located terminals is ready for the customer.
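The site-level question can be sketched in the same spirit. The example below assumes each terminal’s online spans are already known as (start, end) pairs, here in minutes from the start of the window, and treats the site as available whenever at least one terminal is online. The names and the data are hypothetical, not DevicePilot’s API.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) spans."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def site_availability(online_by_device, window_start, window_end):
    """Percent of the window in which at least one terminal was online."""
    clipped = []
    for spans in online_by_device.values():
        for start, end in spans:
            # Clip every span to the reporting window, dropping empty ones.
            start, end = max(start, window_start), min(end, window_end)
            if start < end:
                clipped.append((start, end))
    # The union of all terminals' spans is the time the site could serve.
    covered = sum(end - start for start, end in merge_intervals(clipped))
    return 100.0 * covered / (window_end - window_start)


# Hypothetical site with two terminals over a 60-minute window.
site = {
    "terminal-a": [(0, 30)],   # online for the first 30 minutes
    "terminal-b": [(20, 45)],  # online from minute 20 to minute 45
}
print(site_availability(site, 0, 60))  # 75.0
```

Taken alone, terminal-a was up 50% of the hour and terminal-b about 42%, yet the site could serve customers 75% of the time, because the two outages only partly overlap.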

What will the next questions be? We’re along for the ride and excited to see our customers become more mature providers of connected services at scale while defining the right business-enhancing analytics.
