Iterating Your Internet of Things With Over The Air Updates (OTA)

Devon Bleibtrey
Auklet IoT
14 min read · Jul 24, 2018

The age of IoT is upon us. The last decade has seen the advent of the most complex consumer-facing IoT application out there: your vehicle. We’ve gone from a completely disconnected form of transportation with four wheels and a steering column to an array of sensors and modules that communicate over an ever-growing suite of networks, ranging from industry-centric buses like automotive CAN and LIN to more familiar networks like 4G, WiFi, and Bluetooth Low Energy. Many vehicles today have over a hundred highly customized computers that can funnel data through a gateway and up to the cloud. As in other industries, this data isn’t sent in one unified way. There is a plethora of methods to serialize, compress, and transmit the data to reduce latency, minimize data usage, and increase security. Then, once you have the data, you have to figure out what to do with it. Are there patterns that can help you make your product better? How about new revenue opportunities? Each question requires a different visualization technique or machine learning algorithm to answer.

As the IoT market explodes and the data points multiply, we have the opportunity to create an entirely different world from the one we live in today. Unfortunately, diving into IoT can still be scary. There’s an endless number of communication protocols, countless hardware options, and a set of problems unlike those we’ve tackled in mobile and web development. And there are still, as Jeff Bezos likes to call them, “divinely discontent” customers you are striving to satisfy. Our goal at Auklet is to help entrepreneurs, as well as established companies, accelerate their IoT aspirations, replacing fear with excitement and making it easier than ever to build an awesome IoT product!

Over the Air Updates (OTA)

Most of us take it for granted, but updating a web application or mobile app is a well-defined process, and even the smallest company can quickly build a deployment pipeline that uses continuous deployment to roll out an update with every approved commit. For our customers, this happens without their knowledge or at the tap of a button on their smartphone. Almost never does a mobile app update brick a phone, or a website refresh show a white screen of death instead of your e-commerce selection. Seamless over-the-air updates were something I took for granted until I started developing at a startup focused on digital out-of-home (DOOH) marketing, and I was blown away all over again to find that they were not standard practice at every major automotive manufacturer. For both kinds of edge devices, deploying updates wasn’t easy and, for the most part, the capability didn’t exist.

Early on at the startup, we had devices at multiple restaurants and bars, each of which, of course, had spotty WiFi and hard-to-reach TVs and projectors. We had a small embedded device that plugged into the backs of these displays and played videos and images set by the store owner or third-party advertisers. These environments made it extremely difficult to replicate specific issues in our local development environment, where we had excellent WiFi, a consistent 70-degree temperature, and limited humidity. So during our beta, we started finding a variety of errors: memory leaks that only appeared after weeks of execution, hardware bricking because of the humidity and heat of the bar, and video jitter caused by poorly performing functions and spotty network connectivity, to name only a few. All this was happening while we needed to build new features requested by our beta customers.

As you can guess, our devices didn’t have a marketplace or process for deploying updates, so any time we wanted to do a release we had to go directly to the customer’s location, climb ladders (looking like fools), and either swap out the hardware or flash the existing pieces. I’m all about doing things that don’t scale at a startup, but this was absolutely ridiculous. After immense frustration, I built a web application that allowed us to distribute updates to our fleet. This solution worked relatively well, but it took time and energy to maintain that would have been better spent on the actual DOOH application.

After leaving the startup, I went to work for one of the major North America-based automotive companies. It didn’t take long for my mind to be blown by the fact that the problems I had faced with a handful of devices at my previous job weren’t being solved at this multibillion-dollar company. They had hundreds of test vehicles at different release levels spread around the world. Every time a new software release came out for any one of the roughly one hundred modules on the vehicle, an engineer would have to go to the car, flash the software over what’s called an OBDII port, and hope with every fiber of their being that nothing went wrong and bricked the module or, worse, the vehicle. This process was even more frustrating for infotainment systems and clusters (the things that do navigation and show you how fast you’re going), which had a lot of graphics files and could take up to an hour per vehicle.

Unfortunately, the perfect storm hit while I was flashing one of the clusters with a new release: I bricked it 45 minutes into the update, about an hour before one of the VPs was set to take the vehicle home for the weekend… This was not only embarrassing but expensive. Even for a billion-dollar company, a new module, whether for development or production, isn’t cheap. Luckily we had a spare available, and I was able to tear apart the dash, replace the module, and update it. Props to the VP overseeing the program at the time, who was very gracious about the whole thing.

Most engineers in the IoT space can relate to stories like these because, until recently, it was difficult to push updates to edge devices that more than likely had limited network connectivity and very constrained resources. Fortunately, data plans have become more cost-effective and hardware costs have continued to decline. Going forward, it is critical to plan OTA support into any IoT application you develop, not only to push feature updates and bug fixes but also to apply security patches consistently across your fleet. Each domain within IoT is becoming more competitive by the day, and being able to deploy your latest innovation is essential to making a fantastic product for your customers.

Picking an OTA Solution

A growing number of IoT frameworks, such as AWS Greengrass and Resin.io, support OTA out of the box in one form or another. Other tools fit more smoothly into projects that can’t or don’t want to use one of these frameworks. Mender, for example, is an open source project that can either be used out of the box or serve as a good starting point for a custom implementation. With the options expanding constantly, here are a few things to keep in mind as you select the right one for your team.

Rolling Updates

  • Update Sections of Fleet
  • Validate Health of Updated Devices Before Proceeding
  • Schedule When Updates Occur
  • Manage Offline Devices Automatically
  • Status of Update Rollout

If you’ve done web development in the past five to ten years, you’ve probably started to take this fantastic feature for granted. Remember the days when you had to FTP into your Linux server and swap out the HTML, PHP, and JS files manually? Remember when you could SSH into a server and hack away live on production (of course no one has EVER done thaaaat…)? With the proliferation of PaaSes, serverless architecture, and container solutions such as Docker, most of us now make a commit and simply watch as it’s shipped off to a farm of servers. These clusters automatically scale up while rolling out the update, verify that everything looks healthy, and then decide whether to roll back the update or shift incoming traffic to the new deployment. These advancements have made it possible for one-person shops to maintain nearly 100% uptime on their web applications and scale to relatively large deployments without growing their IT staff.

The same cannot be said for IoT devices. Until the last few years, there wasn’t really any off-the-shelf solution for over-the-air updates, much less one that could orchestrate deployment to thousands of devices that may or may not be online at the time and might disconnect midway through downloading the update. The vast majority of IoT products paired major software updates with hardware refreshes, were updated in store or at dealerships, or relied on homegrown solutions that let them distribute some level of updates to their devices.
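Coping with a device that drops off the network halfway through a download mostly comes down to resuming instead of restarting. The sketch below assumes a hypothetical fetch_range(offset, length) transport and keeps the partially downloaded image in a buffer that stands in for a file in flash; it resumes from the last byte received and accepts the image only if its SHA-256 digest matches the one published with the release. It illustrates the idea under those assumptions and is not how any particular OTA framework implements it.

```python
import hashlib
from typing import Callable

# Hypothetical transport: fetch_range(offset, length) returns up to `length` bytes
# starting at `offset`, or raises ConnectionError when the link drops. On a real
# device this might wrap HTTP Range requests or a block-wise transfer protocol.
FetchRange = Callable[[int, int], bytes]


def resumable_download(fetch_range: FetchRange, total_size: int, expected_sha256: str,
                       buffer: bytearray, chunk_size: int = 4096,
                       max_consecutive_failures: int = 10) -> bool:
    """Fill `buffer` with the update image, resuming from len(buffer) after each drop.

    `buffer` stands in for a partially written file, so a later call (for example
    after a reboot) picks up where the previous one stopped. Returns True only if
    the complete image matches the published SHA-256 digest.
    """
    failures = 0
    while len(buffer) < total_size:
        offset = len(buffer)
        length = min(chunk_size, total_size - offset)
        try:
            chunk = fetch_range(offset, length)
            if not chunk:
                raise ConnectionError("empty read")
        except ConnectionError:
            failures += 1
            if failures >= max_consecutive_failures:
                return False  # give up for now; the bytes already received are kept
            continue
        failures = 0
        buffer.extend(chunk[:length])  # never write past the requested range
    return hashlib.sha256(bytes(buffer)).hexdigest() == expected_sha256


# Simulated flaky link: every third request fails.
firmware = bytes(range(256)) * 100  # 25,600-byte stand-in for an update image
digest = hashlib.sha256(firmware).hexdigest()
calls = {"n": 0}


def flaky_fetch(offset: int, length: int) -> bytes:
    calls["n"] += 1
    if calls["n"] % 3 == 0:
        raise ConnectionError("link dropped")
    return firmware[offset:offset + length]


buf = bytearray()
ok = resumable_download(flaky_fetch, len(firmware), digest, buf)
print(ok, len(buf))
```

Verifying the digest of the whole image, rather than trusting that every chunk arrived intact, is what makes it safe to stitch a download together across several connections.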

Now you may be thinking: what’s the harm in blasting an update to all the devices at once? The uptime concerns of web applications don’t apply to IoT apps, right? You’re correct that, unlike with web services, rolling an update out over portions of your user base won’t protect other users from downtime. What it does protect you from is an update that suddenly starts bricking devices or causing unexpected behavior on specific hardware revisions: only the devices in the early batches are affected, not your entire fleet. A rolling update should let you monitor the health of the devices that have already been updated and set criteria that must be met before the deployment continues.
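To make health-gated rolling updates concrete, here is a minimal sketch that pushes a release to cumulative waves of the fleet and halts when too few of the newly updated devices report healthy. push_update and is_healthy are hypothetical hooks standing in for your OTA service and your device telemetry; the wave sizes and the 95% threshold are illustrative, not recommendations.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class RolloutResult:
    updated: List[str]
    halted: bool
    failed_wave: Optional[int] = None


def staged_rollout(devices: Sequence[str],
                   wave_percents: Sequence[int],
                   push_update: Callable[[str], None],
                   is_healthy: Callable[[str], bool],
                   min_healthy_ratio: float = 0.95) -> RolloutResult:
    """Update the fleet in cumulative waves (e.g. 5%, 25%, 100%).

    After each wave, check the devices it touched; if too few are healthy,
    stop so the rest of the fleet never receives the release.
    """
    n = len(devices)
    updated: List[str] = []
    start = 0
    for wave, percent in enumerate(wave_percents):
        end = min(n, math.ceil(n * percent / 100))
        batch = list(devices[start:end])
        if not batch:
            continue
        for device in batch:
            push_update(device)
        updated.extend(batch)
        # A real rollout would let each wave "soak" for hours or days before this check.
        healthy = sum(1 for device in batch if is_healthy(device))
        if healthy / len(batch) < min_healthy_ratio:
            return RolloutResult(updated, halted=True, failed_wave=wave)
        start = end
    return RolloutResult(updated, halted=False)


# Example: 100 devices; every device whose number ends in 7 is a (hypothetical)
# hardware revision that the new release breaks.
fleet = [f"dev-{i:03d}" for i in range(100)]
pushed: List[str] = []
result = staged_rollout(
    fleet,
    wave_percents=[5, 25, 100],
    push_update=pushed.append,
    is_healthy=lambda d: int(d.split("-")[1]) % 10 != 7,
)
print(result.halted, result.failed_wave, len(result.updated))
```

In this example the first wave happens to contain no affected devices, the second wave trips the check, and 75 of the 100 devices never receive the faulty release.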

Version Management

  • Define Version Currently Deployed
  • Revert To a Previous Release
  • A/B Testing Capabilities

Another element of web development many of us have become accustomed to is the ability to manage which version of an application is active on our server clusters. Many frameworks support rudimentary version management, such as selecting a release for your entire fleet or pinning a specific release to a given device. More advanced solutions let you dictate which sections of your fleet get a given release and make A/B testing easy.
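A rough sketch of how those layers of version management can fit together: a fleet-wide default, pins for sections of the fleet (keyed here by a hypothetical group name such as a hardware revision), per-device pins, and a history that makes reverting the fleet default a single call. The VersionPlan class and its field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionPlan:
    """Hypothetical model of which release each device should run."""
    fleet_default: str
    group_pins: Dict[str, str] = field(default_factory=dict)   # section of fleet -> release
    device_pins: Dict[str, str] = field(default_factory=dict)  # device ID -> release
    history: List[str] = field(default_factory=list)           # earlier fleet defaults, newest last

    def target_for(self, device_id: str, group: Optional[str] = None) -> str:
        """Most specific rule wins: device pin, then group pin, then fleet default."""
        if device_id in self.device_pins:
            return self.device_pins[device_id]
        if group is not None and group in self.group_pins:
            return self.group_pins[group]
        return self.fleet_default

    def promote(self, version: str) -> None:
        """Make `version` the fleet default, remembering the previous one."""
        self.history.append(self.fleet_default)
        self.fleet_default = version

    def revert(self) -> str:
        """Roll the fleet default back to the previous release."""
        if not self.history:
            raise RuntimeError("no earlier release to revert to")
        self.fleet_default = self.history.pop()
        return self.fleet_default


plan = VersionPlan(fleet_default="1.0.0")
plan.promote("1.1.0")
plan.group_pins["hw-rev-b"] = "1.0.0"       # hold back one hardware revision
plan.device_pins["dev-042"] = "1.2.0-beta"  # a single device on a beta build
print(plan.target_for("dev-001"), plan.target_for("dev-007", "hw-rev-b"),
      plan.target_for("dev-042", "hw-rev-b"))
plan.revert()
print(plan.target_for("dev-001"))
```

Resolving the most specific rule first keeps fleet-wide promotions and reverts from silently overriding devices that were deliberately pinned.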

I’ve only seen this kind of A/B testing in OTA solutions specifically engineered for the automotive industry. Keep an eye out for it, though, as it will undoubtedly make its way into many more offerings, limited only by hardware restrictions. The benefit of this feature is that it lets developers feed sensor data into two variants of their application simultaneously and record how each performs in real-world situations. Unlike previous solutions, this allows companies to push an update to a vehicle, validate the software, and then either transition to it completely or keep using the existing release while they push out another update.
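Feeding the same sensor data into two variants is often called shadow testing. In the minimal sketch below, which assumes each variant can be expressed as a function from a reading to an output, only the active variant’s output is used, while the candidate’s output is compared against it and every disagreement is recorded for later review. The Celsius-to-Fahrenheit functions are toy stand-ins for real application logic.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple


@dataclass
class ShadowReport:
    samples: int = 0
    # Each mismatch is (input, active output, candidate output).
    mismatches: List[Tuple[float, float, float]] = field(default_factory=list)


def run_shadow(readings: Sequence[float],
               active: Callable[[float], float],
               candidate: Callable[[float], float],
               tolerance: float = 1e-6) -> Tuple[List[float], ShadowReport]:
    """Drive the device with `active`; evaluate `candidate` on the same inputs for comparison only."""
    report = ShadowReport()
    outputs: List[float] = []
    for reading in readings:
        a = active(reading)
        outputs.append(a)  # only the active variant's output is acted on
        try:
            c = candidate(reading)
        except Exception:
            c = math.nan  # a crash in the candidate must never affect the device
        report.samples += 1
        if not abs(a - c) <= tolerance:  # NaN compares False, so it counts as a mismatch
            report.mismatches.append((reading, a, c))
    return outputs, report


def active_c_to_f(c: float) -> float:
    return c * 9 / 5 + 32


def candidate_c_to_f(c: float) -> float:
    return max(c, 0.0) * 9 / 5 + 32  # toy bug: clamps readings below freezing


outputs, report = run_shadow([-10.0, 0.0, 25.0], active_c_to_f, candidate_c_to_f)
print(outputs, report.samples, report.mismatches)
```

Because the candidate never drives the device, a team can leave it running across real-world conditions and only switch over once the mismatch log is clean.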

Environment Variable Management

  • Set Environment Variables Across All Units
  • Notify When Update Has Completed Across All Units

Environment variables are a proven way to store configuration settings and secret credentials that you don’t want exposed to nefarious people. The ability to set environment variables is a must for any good IoT framework and fundamental to enabling true OTA capabilities. Without the ability to configure environment variables across your entire fleet, you can find yourself in some unfortunate circumstances. Basically, you end up resorting to one of the following:

  • Manually provisioning each device
  • Rolling your own solution you have to maintain
  • Hardcoding the values into your codebase

Hardcoding the values always seems like the easy way out, but keep in mind the cost: any time you want to update even one of those variables, you have to do a full release of your codebase again. That isn’t the worst thing in the world of web and mobile, but in IoT every update uses precious data and power. These updates usually place a burden on your customer and will more than likely cause some amount of downtime for them.
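As an illustration of the alternative to hardcoding, here is a minimal sketch of a device reading its configuration from environment variables at startup, failing loudly if a required value is missing and keeping the secret out of its repr so it doesn’t end up in logs. The variable names TELEMETRY_URL, API_TOKEN, and UPLOAD_INTERVAL_S are hypothetical; in practice your OTA or fleet-management service would be responsible for setting them on each unit.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

REQUIRED = ("TELEMETRY_URL", "API_TOKEN")  # hypothetical variable names


@dataclass(frozen=True)
class DeviceConfig:
    telemetry_url: str
    upload_interval_s: int
    api_token: str = field(repr=False)  # secret: kept out of logs and crash reports


def load_config(env: Mapping[str, str] = os.environ) -> DeviceConfig:
    """Build the device's configuration from environment variables at startup."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        # Fail loudly instead of silently falling back to a baked-in value.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return DeviceConfig(
        telemetry_url=env["TELEMETRY_URL"],
        upload_interval_s=int(env.get("UPLOAD_INTERVAL_S", "60")),  # optional, with default
        api_token=env["API_TOKEN"],
    )


cfg = load_config({"TELEMETRY_URL": "https://telemetry.example.com/ingest",
                   "API_TOKEN": "not-a-real-token"})
print(cfg)
```

With this shape, changing the upload interval across the fleet becomes a configuration push plus a restart rather than a new firmware release.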

Data Management

  • Management of Network Connectivity Loss
  • Automatic Compression of Release
  • Differential Management

In automotive we’ve always kept an eye on data usage, whether it’s the car itself or an after-market add-on that needs its own cellular plan. Customers don’t want to pay for extra data plans, and the less data you have to pay for, the better your bottom line. That’s why it’s so valuable to have an OTA solution that prioritizes reducing the overhead of pushing out an update. This can come in many forms, including:

  • Compressing your update automatically before shipping it out
  • Utilizing one or multiple network solutions to reduce the cellular data consumption
  • Determining the differences between your latest release and what’s already on a given device and only sending the pieces that have changed
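The first and third approaches can be sketched together. The example below is a deliberately naive fixed-block diff: it sends only the 4 KiB blocks that changed, compresses the patch with zlib, and rebuilds the new image on the device from the image it already has. It assumes the device holds an exact copy of the old image, and a real update would also verify the rebuilt image against a published hash before installing it. Because it only detects in-place changes, inserting a single byte near the start of an image would shift and resend every later block; purpose-built delta tools such as bsdiff or xdelta handle that case far better.

```python
import os
import struct
import zlib

BLOCK_SIZE = 4096
HEADER = struct.Struct("<II")  # two little-endian unsigned 32-bit integers


def make_delta(old: bytes, new: bytes, block: int = BLOCK_SIZE) -> bytes:
    """Compressed patch containing only the blocks of `new` that differ from `old`."""
    parts = [HEADER.pack(len(new), block)]
    for offset in range(0, len(new), block):
        new_block = new[offset:offset + block]
        if old[offset:offset + block] != new_block:
            parts.append(HEADER.pack(offset // block, len(new_block)))
            parts.append(new_block)
    return zlib.compress(b"".join(parts), 9)


def apply_delta(old: bytes, patch: bytes) -> bytes:
    """Rebuild `new` on the device from the image it already has plus the patch."""
    raw = zlib.decompress(patch)
    new_len, block = HEADER.unpack_from(raw, 0)
    out = bytearray(old[:new_len].ljust(new_len, b"\0"))  # start from the current image
    pos = HEADER.size
    while pos < len(raw):
        index, length = HEADER.unpack_from(raw, pos)
        pos += HEADER.size
        out[index * block:index * block + length] = raw[pos:pos + length]
        pos += length
    return bytes(out)


# Random bytes stand in for already-compressed content that zlib cannot shrink further.
old_image = os.urandom(4 * BLOCK_SIZE)
new_image = bytearray(old_image)
new_image[5000:5010] = b"PATCHED!!!"  # small fix inside block 1
new_image += b"new tail"              # the image grew a little
new_image = bytes(new_image)

patch = make_delta(old_image, new_image)
print(len(zlib.compress(new_image, 9)), len(patch))  # full compressed image vs. delta
```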

Enterprise Solutions

Are you in an industry with a myriad of compliance requirements? Automotive is notorious for this, and for good reason: companies are putting a two-ton metal machine in the hands of anyone who wants one. Depending on your industry, this may be important or completely irrelevant. If you haven’t already done so, research the regulations mandated by your industry or internally by your own company. Based on this information, ensure that any OTA provider you choose meets all the requirements or is capable of customizing their solution to meet your needs. If you’re in automotive, I’d recommend taking a look at Airbiquity’s solution :).

Customer Experience

  • How will the OTA update affect your customer?
  • Does the provider have a way for your customer to accept or skip the update?
  • Does the OTA solution require your device to reboot?
  • Does the solution interrupt a customer’s usage if you roll out an update?
  • Can you schedule update windows?
  • How does the solution recover from a failure?

Automotive has some extreme use cases that don’t apply to everyone, but there are two that I love discussing which relate to a customer’s experience.

You’re a new car company, and to stay ahead of the curve you’ve made your flagship vehicle capable of OTA updates. You do your due diligence. You have your customers set update windows that limit what time of day the car will be updated. You add an interface that stops updates from happening unless the customer first approves them. With this foundation in place, you roll out your first update. The notification comes up on one of your customers’ infotainment screens, and he accepts it. The car then waits until 2 AM, when the update window opens, and starts to update. Great! Unfortunately for you, this customer’s daughter has asthma, and it just so happens that tonight she’s having an attack. At 2:05 AM he straps her into the vehicle to rush her to the hospital, only to find that the car is updating and won’t be available for another 10 minutes. You’d best believe you’ve lost that family as a customer forever. Luckily, this situation applies to very few areas of IoT, and in this particular case OTA providers for automotive have painstakingly accounted for these exact types of use cases. But I’m sure your industry has similar edge cases that, across your whole customer base, happen more often than you might expect.
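Update windows and consent checks are easy to get subtly wrong, for example with windows that cross midnight. Below is a minimal sketch of the gate a device might evaluate before starting an install: customer approval, an idle device, and enough time left in the window to finish. device_in_use is a hypothetical signal (ignition, door, or key-fob state in a car), and minute resolution is an assumption. Checks like these only decide when an install starts; they cannot anticipate a 2:05 AM emergency.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def _minutes(t: time) -> int:
    return t.hour * 60 + t.minute  # minute resolution is assumed to be enough here


def minutes_left_in_window(now: time, start: time, end: time) -> int:
    """Minutes until the window closes, or 0 if `now` is outside it.

    Handles windows that cross midnight (e.g. 23:00-04:00). A window whose
    start equals its end is treated as empty.
    """
    length = (_minutes(end) - _minutes(start)) % MINUTES_PER_DAY
    elapsed = (_minutes(now) - _minutes(start)) % MINUTES_PER_DAY
    return length - elapsed if elapsed < length else 0


def may_start_install(now: time, window_start: time, window_end: time,
                      approved: bool, device_in_use: bool,
                      estimated_minutes: int) -> bool:
    """Start only with consent, when idle, and if the install fits in the remaining window."""
    if not approved or device_in_use:
        return False
    left = minutes_left_in_window(now, window_start, window_end)
    return left > 0 and estimated_minutes <= left


window = (time(1, 0), time(5, 0))  # customer-chosen window: 1 AM to 5 AM
print(may_start_install(time(2, 0), *window, approved=True, device_in_use=False, estimated_minutes=30))
print(may_start_install(time(4, 50), *window, approved=True, device_in_use=False, estimated_minutes=30))
```

Requiring the whole install to fit inside the window, rather than merely starting inside it, avoids an update that begins at 4:50 AM and is still running when the customer expects the device back.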

A more widely applicable situation: your customer is driving down the road and a security update needs to go out ASAP. You can’t require every driver to pull over to the side of the road. Does the vehicle need to be stopped to receive the update? What happens if the update bricks one of the systems in the vehicle? These types of questions apply to drones, wearables, smart transportation such as bikes, and a host of other applications. List out the situations your customers could be in when an update rolls out, and make sure you understand how the solution you choose will handle each of them on your behalf.
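One common answer to both questions is the dual-slot, or A/B, scheme used by, for example, Android’s seamless updates and Mender: the new image is written to the idle slot while the device keeps running, the bootloader tries the new slot on a limited number of boots, and it falls back to the old slot unless the new firmware confirms it is healthy. The toy model below simulates that state machine; a real implementation lives in the bootloader and persists this state in flash, and the three-trial-boot limit is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ABSlots:
    """Toy model of dual-slot (A/B) firmware with boot counting and automatic rollback."""
    slots: Dict[str, str] = field(default_factory=lambda: {"A": "1.0.0", "B": ""})
    active: str = "A"              # last known-good slot
    pending: Optional[str] = None  # freshly written slot awaiting trial boots
    max_trial_boots: int = 3
    trial_boots_left: int = 0

    def inactive(self) -> str:
        return "B" if self.active == "A" else "A"

    def stage(self, version: str) -> None:
        """Write the update into the idle slot while the device keeps running from the active one."""
        target = self.inactive()
        self.slots[target] = version
        self.pending = target
        self.trial_boots_left = self.max_trial_boots

    def select_boot_slot(self) -> str:
        """What the bootloader does on each power-up."""
        if self.pending is not None:
            if self.trial_boots_left > 0:
                self.trial_boots_left -= 1
                return self.pending
            self.pending = None  # the new image never confirmed itself: roll back
        return self.active

    def confirm_healthy(self) -> None:
        """Called by the new firmware (and only by it) after its own self-test passes."""
        if self.pending is not None:
            self.active = self.pending
            self.pending = None


healthy_unit = ABSlots()
healthy_unit.stage("1.1.0")
print(healthy_unit.select_boot_slot())  # boots the new slot on trial
healthy_unit.confirm_healthy()
print(healthy_unit.active, healthy_unit.slots[healthy_unit.active])

broken_unit = ABSlots()
broken_unit.stage("1.1.0-broken")  # this image never confirms itself
print([broken_unit.select_boot_slot() for _ in range(4)], broken_unit.active)
```

Because the old slot is never touched during the install, a failed update costs a few reboots instead of a bricked module.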

Restrictions

  • Does their OTA solution lock you into a platform?
  • Do they support your target hardware?
  • How has their history been for hardware / operating system version support?

Many IoT platforms are all-in-one solutions that don’t play well with other products. This can be a double-edged sword: a single platform often lets you get off the ground quickly, but completely proprietary systems and services tend to lock you in. That lock-in commonly keeps you from pivoting as freely as you’d like or easily changing providers if pricing becomes less competitive. Vendor lock-in can be especially hazardous for IoT developers because of the hardware needs of almost every application. There are so many hardware variations and so many optimizations to be made that companies often go through multiple hardware iterations over the lifespan of a product. Many of these occur in the first 12 months, as prototypes are refined, specific pieces of hardware are removed from the bill of materials, and applications prototyped in a scripting language are rewritten in another language to reduce overhead. For these reasons, it is often a better strategy to mix and match providers that focus on a specific niche and give you exit strategies to either bring things in-house or move to another provider as you grow.
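One way to keep those exit strategies open in code is a thin adapter layer: the application talks to a small interface you own, and each vendor gets its own adapter behind it. The OtaProvider interface and its three methods below are hypothetical and intentionally minimal; the in-memory adapter stands in for one that would wrap a real vendor’s SDK or API.

```python
from typing import Dict, List, Protocol


class OtaProvider(Protocol):
    """The only OTA surface the application depends on (a hypothetical interface)."""

    def publish_release(self, version: str, image: bytes) -> None: ...

    def assign(self, device_ids: List[str], version: str) -> None: ...

    def assigned_version(self, device_id: str) -> str: ...


class InMemoryProvider:
    """Stand-in adapter; a real one would wrap a vendor's SDK or REST API."""

    def __init__(self) -> None:
        self.releases: Dict[str, bytes] = {}
        self.assignments: Dict[str, str] = {}

    def publish_release(self, version: str, image: bytes) -> None:
        self.releases[version] = image

    def assign(self, device_ids: List[str], version: str) -> None:
        if version not in self.releases:
            raise KeyError(f"unknown release {version!r}")
        for device_id in device_ids:
            self.assignments[device_id] = version

    def assigned_version(self, device_id: str) -> str:
        return self.assignments.get(device_id, "unassigned")


def ship(provider: OtaProvider, version: str, image: bytes, device_ids: List[str]) -> None:
    """Release logic written once against the interface, not against any vendor."""
    provider.publish_release(version, image)
    provider.assign(device_ids, version)


provider = InMemoryProvider()
ship(provider, "2.0.0", b"...image bytes...", ["dev-001", "dev-002"])
print(provider.assigned_version("dev-001"), provider.assigned_version("dev-003"))
```

Switching vendors then means writing one new adapter rather than touching every call site.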

This isn’t to say there aren’t use cases where a full out-of-the-box platform makes sense. Hologram is an excellent example: they provide a white-glove solution that covers not only the hardware but also the data plan and your cloud solution. These types of solutions are great if you don’t know much about hardware yet and are just trying to get your startup off the ground.

Onward & Upward

Once you’ve integrated a way to deploy updates to your application, you can start tracking how each new release affects your existing hardware as well as how customers engage with your product. Tracking errors that were missed during QA and unit test cycles becomes mandatory. Even automotive companies with million-dollar hardware-in-the-loop (HiL) rigs that run tests for months, and test fleets of hundreds of vehicles, still miss bugs before deploying updates. That’s why at Auklet we’ve built a monitoring solution specifically for IoT that you can use to track each iteration of your app.
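As a rough illustration of release-level error tracking (not a description of how Auklet works), the sketch below computes, for each release, the share of devices that reported at least one error, and flags a candidate release whose rate exceeds the baseline by more than a chosen margin. The (device_id, release) event format and the one-percentage-point threshold are assumptions.

```python
from typing import Dict, Iterable, Set, Tuple


def error_rate_by_release(error_events: Iterable[Tuple[str, str]],
                          devices_per_release: Dict[str, int]) -> Dict[str, float]:
    """Share of devices on each release that reported at least one error.

    Counting distinct devices, rather than raw events, keeps one chatty
    device from making a whole release look broken.
    """
    affected: Dict[str, Set[str]] = {}
    for device_id, release in error_events:
        affected.setdefault(release, set()).add(device_id)
    return {release: len(affected.get(release, set())) / count
            for release, count in devices_per_release.items() if count > 0}


def is_regression(rates: Dict[str, float], baseline: str, candidate: str,
                  max_increase: float = 0.01) -> bool:
    """Flag the candidate if its error rate exceeds the baseline's by more than `max_increase`."""
    return rates[candidate] - rates[baseline] > max_increase


events = [("dev-1", "1.0"), ("dev-1", "1.0"),
          ("dev-2", "1.1"), ("dev-3", "1.1"), ("dev-3", "1.1")]
rates = error_rate_by_release(events, {"1.0": 100, "1.1": 50})
print(rates, is_regression(rates, "1.0", "1.1"))
```

Normalizing by how many devices run each release matters during a rolling update, when the new release is on only a small slice of the fleet.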

We’d love to hear how your company did updates before OTA was available or how you’re doing them today! Hit us up on Twitter or drop us a note at hello@auklet.io!

Additional Reading

As a side note, I’d never advise skipping your due diligence during validation :). If you’re looking for a more cost-effective solution than something like National Instruments or dSPACE (excellent companies and solutions, but often too expensive for small and medium-sized businesses), I’d recommend checking out Tapster, which is doing awesome things to make HiL testing more attainable.

Connecting teams and their communities. Co-Founder @nextreleaseio Director @esgtechnology.