IoT is easy … they said
How we tried to make sense of wearable data …
Some context …
We operate a corporate wellness platform that connects wearable devices from different providers to drive employee engagement with their health.
We started with a simple requirement …
Capture when a certain number of steps were taken by a user, with a certain level of granularity, with the ability to identify which device was responsible for capturing those steps.
We had another requirement that appeared obvious: once steps were captured, they could not be changed after the fact. Our platform drives a large set of rules that could mean the difference between winning or losing an award, or reaching a milestone, so we needed a guarantee of immutability.
Because we are perfectionists, we added another element: the ability to manage the velocity of our platform’s rules engine, so it could decide on received step data as quickly as possible, as close to real time as we could get. Making a user wait a long time before learning that a milestone had been reached was a loss of engagement we would not stand for.
Also, no Fitbits on ceiling fans, because no one would do that, right? We needed a way to detect fraud too.
And finally, the Earth kept on insisting on rotating — we had to cover that one too…
While the above sounds easy and obvious (ok maybe not), we knew it was not going to be an easy task to accomplish, but we were ambitious.
In the amazing world of IoT, you are left with having to trust providers to do the right thing — in the end IoT data is only as good as the providers and data sources you are working with.
Now, before diving into our journey, let’s add another small fact …
- Date Processing is Hard
- Date Time Processing is Harder
- Date Time Processing including Time Zones — yes even harder
- Date Time Processing including Time Zones and accounting for Daylight Savings Offset — you guessed it, even harder and don’t give me the “Let’s just skip daylight saving for now”
- Extra Brownie Points for Date Time Processing including Time Zones and accounting for Daylight Savings Offset ON the day when Daylight saving change occurs (I can hear your screams across the internet if you had to deal with this issue before)
Also, there are only 23 hours, 56 minutes and 4 seconds in an actual (sidereal) day, and don’t forget the occasional leap second being added. But I digress.
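To make the daylight saving point concrete, here is a minimal Python sketch that measures how long a local calendar day really is. The function name and the example zone are illustrative assumptions, and it assumes Python 3.9+ with the standard zoneinfo module and a tz database available on the machine.

```python
from datetime import date, datetime, time, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def local_day_length_hours(day: date, zone_name: str) -> float:
    """Return the elapsed hours between local midnight and the next local midnight."""
    zone = ZoneInfo(zone_name)
    start = datetime.combine(day, time.min, tzinfo=zone)
    end = datetime.combine(day + timedelta(days=1), time.min, tzinfo=zone)
    # Convert to UTC before subtracting: subtracting two datetimes that share
    # the same tzinfo compares wall-clock values and would always give 24 hours.
    elapsed = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return elapsed.total_seconds() / 3600


if __name__ == "__main__":
    # A normal day, the US spring-forward day and the US fall-back day in 2018.
    for day in (date(2018, 3, 10), date(2018, 3, 11), date(2018, 11, 4)):
        print(day, local_day_length_hours(day, "America/New_York"))
```

Any "steps per day" rule that quietly assumes 24 hours is off by an hour twice a year in zones like this one.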
Since we have been step-syncing for a while, we learned valuable lessons from prior experiences. To capture the broadest range of use cases we came up with the following test scenarios.
Types of Activities
- Short Bursts, e.g. getting a glass of water, walking over to someone’s desk
- Shorter walks vs longer walks, going to lunch, romantic walk on the beach
- Run differences, full 5K, jogs, walks, intervals, sharing 5K times on Facebook
- Driving in car, biking, treadmill, gym workout, Instagram gym mirror selfies
Warning: Boring serious section …
Phone Location — Phone with device during activities vs Phone left out of reach of device during activities and synced after activities occurred.
Device Sync Delays — as we noticed changes based on where the phone was, we added another dimension to sync processing — tracking activity without syncing device data to phone for longer time frames (and yes we had additional differences)
- 1 day
- 1 week
- 1 month
Provider Request Differences: we took it further and wanted to get more clarity on what happens with data from providers based on when our platform made requests. We set up four fully standalone platform instances for each user account (i.e. we tracked each user with the same activity in four instances) using the following interval periods for data requests to providers.
- 2 min intervals
- 5 min intervals
- Hourly Intervals
- Daily Intervals
Back to Fun Mode …
Device Activity Manipulation
Obviously, we wanted to get an idea of what devices and providers did with data that did not truly represent steps of users — this was the most fun in our testing. Our rules were simple, no rooting of devices, no request intercepts or “hacking of the app” — simply being a clever user of either the device or app to try to trick it into capturing steps.
We started with the simple handshake approach but did not stop there — strapping a device to the wheels of the car was one of the tests as well — after having lost one on the highway we got a bit fancier on our mounting options but anything above 60mph had a short lifetime so we kind of gave up — a tad too costly.
Our winner, though, involved one specific app: while sitting in a chair and messing with the app settings, we went from 2,500 actual steps to 91,510 app-captured steps within an hour. NICE, exactly what we were looking for.
Due to the overwhelming complexity of the use cases and the substantial variances we had observed and continued to observe, we created our own manual logging app to track what each user did and how data providers reacted. Every morning we checked, double checked, reconciled, re-planned, drew conclusions, scratched our heads, changed code and added test cases to ensure we kept our 100% code coverage.
On the topic of coverage, 100% coverage is a fallacy in itself. Yes, the code is covered, but how do you know the implementation is correct? Having 100% does not mean your stuff is going to work. We “simply wanted to sync steps correctly”. A typical software engineering problem: it’s not about knowing how to write code, it’s about understanding complex problems, abstract thinking and making the correct choices to satisfy a requirement technically.
A little Rant on why this was so hard
Warning: Loads of technical jargon here … Recommended background music is JT/Cry me a River or someone playing the world’s smallest violin.
Some providers give you snapshots or aggregates, some give you raw activity, some even give you what appears at first to be raw activities, but are actually aggregates or estimates that can change after the fact.
Some post-process their raw data before making it available on their endpoints; others publish the raw data to endpoints and then apply post-processing rules, thus changing data that our platform may already have requested. This specific issue was probably one of our biggest challenges across the provider ecosystem, as we not only had to deal with data updates and removals, but also with provider-side bugs that appeared under various conditions. For some providers, we even had to go as far as requesting data from multiple endpoints to be able to reconcile steps correctly.
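A simple way to see this after-the-fact rewriting is to request the same time window twice and compare the answers. The sketch below is illustrative only: the dictionary shape (an hour label mapped to a step count) and the names `FetchDiff` and `diff_fetches` are assumptions, not any provider’s API.

```python
from typing import Dict, NamedTuple, Tuple


class FetchDiff(NamedTuple):
    added: Dict[str, int]                # hours that only the newer fetch contains
    changed: Dict[str, Tuple[int, int]]  # hour -> (old steps, new steps)
    removed: Dict[str, int]              # hours that disappeared from the newer fetch


def diff_fetches(previous: Dict[str, int], current: Dict[str, int]) -> FetchDiff:
    """Compare two fetches of the same window from the same provider."""
    added = {hour: steps for hour, steps in current.items() if hour not in previous}
    removed = {hour: steps for hour, steps in previous.items() if hour not in current}
    changed = {
        hour: (previous[hour], current[hour])
        for hour in previous.keys() & current.keys()
        if previous[hour] != current[hour]
    }
    return FetchDiff(added, changed, removed)


if __name__ == "__main__":
    first = {"2018-06-01T09": 420, "2018-06-01T10": 1300, "2018-06-01T11": 75}
    second = {"2018-06-01T09": 420, "2018-06-01T10": 1180, "2018-06-01T12": 610}
    print(diff_fetches(first, second))
```

A non-empty `changed` or `removed` for a window that was already delivered is the signal that a provider post-processes after publishing.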
We also noticed changes from the same providers over time, requiring us to implement a “Provider Sync Algorithm Detection” process smart enough to understand their current approach and to trigger system notifications when it detects a possible change in how data is being provided.
Obviously, everything date/time related should only ever be stored in UTC in your systems, which meant navigating the large variety of time zone formats that providers hand you, and that came with many challenges of its own.
Some providers give you Area/Location time zone names (the Olson/Eggert tz database), some GMT offsets, some just UTC, some the current user time zone (without accounting for time zone changes of the user during specific periods, yay us), and some even give you different time zone flavors across different requests.
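As an illustration of dealing with those flavors, the sketch below normalizes a timestamp to UTC whether it arrives with a tz database name, a numeric offset, an embedded offset, or as bare UTC. The function `to_utc` and its parameters are hypothetical and do not mirror any provider’s payload; it assumes Python 3.9+ and ISO 8601 strings that `datetime.fromisoformat` can parse.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo  # standard library since Python 3.9


def to_utc(
    timestamp: str,
    zone_name: Optional[str] = None,
    offset_minutes: Optional[int] = None,
    assume_utc: bool = False,
) -> datetime:
    """Return an aware UTC datetime for one provider timestamp."""
    parsed = datetime.fromisoformat(timestamp)
    if parsed.tzinfo is None:
        if zone_name is not None:
            # Flavor 1: wall-clock time plus a tz database name.
            parsed = parsed.replace(tzinfo=ZoneInfo(zone_name))
        elif offset_minutes is not None:
            # Flavor 2: wall-clock time plus a fixed offset from GMT.
            parsed = parsed.replace(tzinfo=timezone(timedelta(minutes=offset_minutes)))
        elif assume_utc:
            # Flavor 3: the provider documents the value as UTC.
            parsed = parsed.replace(tzinfo=timezone.utc)
        else:
            # Refuse to guess: a silent default is how wrong hours get stored.
            raise ValueError("timestamp has no time zone information")
    return parsed.astimezone(timezone.utc)


if __name__ == "__main__":
    print(to_utc("2018-03-10T08:30:00", zone_name="America/New_York"))
    print(to_utc("2018-03-10T08:30:00", offset_minutes=-300))
    print(to_utc("2018-03-10T13:30:00", assume_utc=True))
```

Raising on a naive timestamp instead of defaulting to UTC is a deliberate choice in this sketch.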
Another nuance was related to differences between the user time zone and the data time zone when making requests, something we called the “Data Request UTC Gap”. While some providers return activity data in UTC, the request to get the data is based on the current user time zone (so imagine the time zone of the user changing on the same day). Under certain conditions this resulted in UTC gaps, requiring you to query two user-time-zone-based days to get the equivalent of a single UTC day.
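The gap can be sketched as a small calculation: given one UTC day and every time zone the user was in that day, which local dates have to be requested to cover it? The function name and inputs below are assumptions for illustration; fixed offsets are used only to keep the example self-contained.

```python
from datetime import date, datetime, time, timedelta, timezone, tzinfo
from typing import Iterable, List


def local_dates_covering_utc_day(utc_day: date, user_zones: Iterable[tzinfo]) -> List[date]:
    """Return the local calendar dates to request so one full UTC day is covered."""
    start = datetime.combine(utc_day, time.min, tzinfo=timezone.utc)
    last_instant = start + timedelta(days=1) - timedelta(microseconds=1)
    dates = set()
    for zone in user_zones:
        first_date = start.astimezone(zone).date()
        last_date = last_instant.astimezone(zone).date()
        # Add every local date between the two ends, inclusive.
        current = first_date
        while current <= last_date:
            dates.add(current)
            current += timedelta(days=1)
    return sorted(dates)


if __name__ == "__main__":
    miami = timezone(timedelta(hours=-4))          # assumption: fixed summer offset
    san_francisco = timezone(timedelta(hours=-7))  # assumption: fixed summer offset
    print(local_dates_covering_utc_day(date(2018, 6, 1), [miami, san_francisco]))
```

For any zone that is not UTC itself, a single UTC day always touches at least two local dates, which is exactly the "2 days for 1 day" effect described above.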
Not so easy after all …
Withings — Special Highlight
A super special thanks to Withings (now Nokia Health). I cannot praise this company enough.
We were extremely lucky to have contacted Withings at the perfect time. We were looking to swap over to an OAuth 2.0 connector and asked if there were any plans for one, and they were already working on it!!! We got directly connected to the developer and kept in contact via Skype, Miami to Paris, working together for a couple of weeks. They were so incredible that they not only included our feedback for some changes, but also ended up creating a custom endpoint to satisfy our requirements. Since this is the Internet I do not want to use names, so John Smith in Paris: you are amazing, and your help and support will forever be remembered.
Based on what we had to work with, we feel we extracted maximum value — immutable steps by provider with time zone references and broken down to granular levels.
It is quite difficult to put a scale on the effort required, but we can share one measure. Our final infrastructure to process steps for providers requires a total of almost 40 different tables. Yes, almost 40 tables to create 1 final table that contains steps, by user, in hourly time buckets, in UTC, with user time zone, and device … FYI it’s web-scale too, don’t worry.
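The shape of that final table can be illustrated with a few lines of Python. This is a sketch of hourly UTC bucketing only, not the pipeline behind those tables; the tuple layout and the name `bucket_steps_hourly` are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, str, datetime, int]  # (user id, device id, aware timestamp, steps)


def bucket_steps_hourly(samples: Iterable[Sample]) -> Dict[Tuple[str, str, datetime], int]:
    """Sum steps per (user, device, UTC hour). Timestamps must be time zone aware."""
    buckets: Dict[Tuple[str, str, datetime], int] = defaultdict(int)
    for user_id, device_id, taken_at, steps in samples:
        if taken_at.tzinfo is None:
            raise ValueError("naive timestamps cannot be bucketed safely")
        # Convert first, truncate second: truncating the local hour would put
        # half-hour offsets such as UTC+05:30 into the wrong bucket.
        hour_utc = taken_at.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
        buckets[(user_id, device_id, hour_utc)] += steps
    return dict(buckets)


if __name__ == "__main__":
    india = timezone(timedelta(hours=5, minutes=30))
    samples = [
        ("user-1", "device-a", datetime(2018, 6, 1, 10, 15, tzinfo=india), 120),
        ("user-1", "device-a", datetime(2018, 6, 1, 4, 50, tzinfo=timezone.utc), 30),
    ]
    for key, total in bucket_steps_hourly(samples).items():
        print(key, total)
```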
Obviously, we did a few other things too. Our time zone selection by user is based on weighted scores giving us multiple sources of time zone for a given user and date with different confidence scores (we capture up to 6 possible sources for time zone data per user and per instance of time).
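A weighted selection like this can be sketched as follows. The source names, the weights and the confidence formula (winning weight divided by total weight) are assumptions made for illustration, not the scoring we actually run.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Candidate = Tuple[str, str, float]  # (source name, tz database zone name, weight)


def pick_time_zone(candidates: Iterable[Candidate]) -> Tuple[str, float]:
    """Return the zone with the highest summed weight and its share of all weight."""
    totals: Dict[str, float] = defaultdict(float)
    for _source, zone_name, weight in candidates:
        totals[zone_name] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("no usable time zone candidates")
    best_zone = max(totals, key=lambda zone: totals[zone])
    return best_zone, totals[best_zone] / total_weight


if __name__ == "__main__":
    # Hypothetical sources for one user at one point in time.
    candidates = [
        ("device_payload", "America/New_York", 0.5),
        ("user_profile", "America/Los_Angeles", 0.25),
        ("last_login_ip", "America/New_York", 0.25),
    ]
    print(pick_time_zone(candidates))
```

Keeping the confidence next to the chosen zone lets later stages treat a narrow win differently from a unanimous one.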
The other, of course, is our configurable statistical engine to detect step fraud. It is also score based: it analyzes patterns of activity, detects deviations from prior user activity and cross-references them against humanly achievable models (Usain Bolt may not be a good fit for us though).
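To give a feel for score-based detection, here is a deliberately small sketch that combines a hard human ceiling with a deviation from the user’s own history. Both constants and the z-score approach are assumptions chosen for the example; a real engine would use far richer models.

```python
from statistics import mean, pstdev
from typing import Sequence

# Assumption: a generous ceiling for sustained human steps in one hour.
MAX_HUMAN_STEPS_PER_HOUR = 20_000
# Assumption: this many standard deviations above normal counts as certain fraud.
Z_SCORE_CEILING = 6.0


def fraud_score(hourly_history: Sequence[int], candidate_steps: int) -> float:
    """Return a score from 0.0 (looks normal) to 1.0 (almost certainly not walking)."""
    if candidate_steps > MAX_HUMAN_STEPS_PER_HOUR:
        return 1.0
    if len(hourly_history) < 2:
        return 0.0  # not enough history to call anything a deviation
    # Guard against a perfectly flat history, which has zero spread.
    spread = pstdev(hourly_history) or 1.0
    z_score = (candidate_steps - mean(hourly_history)) / spread
    # Only unusually high counts are suspicious; low counts score zero.
    return min(max(z_score, 0.0) / Z_SCORE_CEILING, 1.0)


if __name__ == "__main__":
    history = [800, 1200, 1000, 900, 1100]
    for steps in (1000, 1400, 91_510):
        print(steps, round(fraud_score(history, steps), 3))
```

The 91,510 steps in an hour from our app-settings trick trips the hard ceiling immediately, while a merely busy hour only raises the score.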
But Wait there is more !!!
At this point our user story “As a User I want to track my steps” appeared to have been accomplished.
But since we don’t scrum and rather use good architecture, honor and pride as a means to accomplish Business Requirements, we went back to the drawing board to challenge ourselves on what was missing. We went over all our logs, data, notes and prior discussions and now included our last variable … The Rotation of the Earth.
If you are still with us and properly understand time zone processing, you will likely conclude one thing: traveling against time (east, into the future) is not an issue. But traveling WITH time (west, into the past) brings another set of problems, as you could find yourself taking steps at the same local time at which you had already taken steps in another time zone’s local time, or even in UTC for that matter, depending on how fast you traveled.
We now had to account for the MTTD, the “Maximum Time Travel Delta”, meaning how fast a human can travel back in time with currently available technologies (excluding space travel, sorry NASA employees on the ISS, your “steps” don’t count). In the end, MTTD was a requirement to influence our “Step Rules Engine Velocity”, i.e. on detection of certain conditions in the data, our velocity would slow down on step rules execution until “time has caught up”.
Battle hardened, confident, and with 100% code coverage, we set up our action plan, as we had no idea what each of the providers would do with this use case. We packed our bags, took all our devices with us, booked a flight from Miami to San Francisco (with aisle seats, of course, so we could move in-flight to track steps) and went on our way. But we are going to leave this for another article. Stay tuned.
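One possible reading of “until time has caught up” is sketched below: when a user’s UTC offset drops (westward travel), the local clock rewinds, so local-time rules are held until the new local clock passes the latest local time already seen. The function name, its parameters and the idea of rejecting rewinds larger than a configured MTTD are all assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def rules_release_time(
    last_seen_utc: datetime,
    previous_offset: timedelta,
    current_offset: timedelta,
    mttd: timedelta,
) -> datetime:
    """Return the UTC instant from which local-time rules may run again."""
    # How far the local wall clock jumped back when the offset changed.
    rewind = previous_offset - current_offset
    if rewind <= timedelta(0):
        return last_seen_utc  # eastward or no travel: nothing to wait for
    if rewind > mttd:
        # Hypothetical policy: a rewind beyond the MTTD is treated as bad data.
        raise ValueError("local clock rewound further than the configured MTTD")
    return last_seen_utc + rewind


if __name__ == "__main__":
    last_seen = datetime(2018, 6, 1, 16, 0, tzinfo=timezone.utc)
    miami = timedelta(hours=-4)          # assumption: summer offset
    san_francisco = timedelta(hours=-7)  # assumption: summer offset
    print(rules_release_time(last_seen, miami, san_francisco, mttd=timedelta(hours=12)))
```

For the Miami to San Francisco case the hold is three hours, the size of the offset change, regardless of how long the flight itself took.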
TLDR: syncing steps is hard, mmmk