Everything About Firmware Over-The-Air Updates
In biology, evolution is the change in the characteristics of a species over several generations and relies on the process of natural selection.
Well, you could call it “the most powerful OTA in human history”. Using OTA, we update the features of devices to improve their performance. Another way of saying this is that “a device is evolving over the air”.
In 2018, Consumer Reports found that the braking distance of the Tesla Model 3 was worse than that of a Ford F-150. CEO Elon Musk took the criticism and came up with a solution. A few days later, Consumer Reports changed its review and gave the Model 3 its recommendation.
What changed? Tesla pushed out an OTA update that, according to the carmaker, tweaked the calibration of the vehicle’s anti-lock braking algorithm. That cut the Model 3’s 60 mph stopping distance by an entire 19 feet, to 133 feet, about average for a luxury compact sedan.
The car automatically downloaded some code and instantly made itself safer. Remotely updating the software of an embedded system like this is called a Firmware Over-The-Air update (FOTA, or simply OTA).
Improving your device’s performance without disturbing your customers or recalling the device is a pretty cool concept.😃 Just think about it: you could do many things that were impossible without OTA. Without the ability to keep meeting customer expectations, an IoT product quickly becomes obsolete.
In today’s fast-moving world, it is very important to keep your device up to date and stay competitive with other manufacturers. Why would a customer buy your device over others if you are not offering anything better? Would you buy a smartphone that does not support WhatsApp? No, right?
Do you remember the original Nokia 3310? It was launched in 2000 and was one of the most successful phones. But would you buy one now? No, because you want a phone with an Android, iOS, or Windows operating system that keeps updating itself and bringing new features.
The same goes for your device: without OTA, you may feel handicapped. Sometimes you know about a bug but cannot fix it without physical access to the device. By then, you may have lost reputation, money, and time.
Let’s consider the case of updating car software.
Traditionally, either the customer has to visit a service center or someone from the manufacturer has to visit the customer. Then someone has to open up the car, hook up a laptop, and so on, creating a mess just to update the car’s software.😵
But if the car supports FOTA, a software update is just a click away. Customers and manufacturers can both sit back, relax, and get on with other tasks. There is no need to open the car, and none of the old mess.😁
Benefits of FOTA from the manufacturer’s perspective:
There’s a quote by Mark Twain: “Continuous improvement is better than delayed perfection.”
- Manufacturers can stay compliant with evolving industry standards, which extends product lifetime.
- It reduces warranty and recall costs by reducing or eliminating service-center visits and help-desk calls for vehicles and IoT devices. Rolling out new updates quickly also reduces cost and complexity.
- Because issues can be resolved remotely, technicians don’t have to waste valuable time traveling on-site to fix bugs or other problems.
- It speeds up upgrades, since many devices can be updated at the same time instead of one installation after another.
At Carnot, we have developed firmware for various IoT devices, and we built an OTA mechanism for a number of microcontrollers (ARM Cortex-M0 to M4).
So we thought it would be useful to share some best practices for supporting OTA updates, along with some of the awesome things we have built at Carnot around our OTA stack, which we have used to push around 1 million successful updates so far!
Well, the first question that comes to mind is: how, and where, do you start building an OTA mechanism?
Building Blocks of a FOTA Mechanism:
A successful OTA update requires complex coordination between IoT hardware, device firmware, network connectivity, and an IoT device cloud.
Bootloader: The Bootloader is the most important block in a FOTA system. It resides in a section of memory separate from the device’s application firmware, because updating the firmware must never modify the Bootloader code. If the Bootloader goes down, everything goes down, and you won’t be able to do anything. Hence it must be kept safe in its own section of memory.
Generally, the microcontroller manufacturer provides a Bootloader. You can use it directly or build your own.
Consider the following points when deciding:
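To make the “separate section of memory” idea concrete, here is a minimal C sketch of a flash-write guard that update code could call before every erase. The flash layout (a 32 KiB Bootloader at the start of a 256 KiB flash at 0x08000000) and the names `flash_range_is_app_only`, `ota_flash_erase`, and `flash_erase_hw` are hypothetical; take the real addresses from your MCU’s reference manual and linker script. Many MCUs can also write-protect the Bootloader pages in hardware, which is a stronger guarantee than a software check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flash layout: adjust to your MCU and linker script. */
#define FLASH_START      0x08000000u
#define FLASH_SIZE_BYTES (256u * 1024u)
#define BOOTLOADER_SIZE  (32u * 1024u)
#define APP_START        (FLASH_START + BOOTLOADER_SIZE)
#define APP_END          (FLASH_START + FLASH_SIZE_BYTES) /* exclusive */

static_assert(BOOTLOADER_SIZE < FLASH_SIZE_BYTES,
              "Bootloader must leave room for the application");

/* True only if [addr, addr + len) lies entirely inside the application
 * region, so an update can never erase or overwrite the Bootloader. */
bool flash_range_is_app_only(uint32_t addr, uint32_t len)
{
    if (len == 0u || addr < APP_START || addr >= APP_END) {
        return false;
    }
    /* Compare against the remaining space instead of computing addr + len,
     * which could overflow near the top of the address space. */
    return len <= APP_END - addr;
}

/* Stand-in for the real flash-controller driver (hypothetical). */
static int flash_erase_hw(uint32_t addr, uint32_t len)
{
    (void)addr;
    (void)len;
    return 0;
}

/* Every erase requested by the OTA code goes through the guard first. */
int ota_flash_erase(uint32_t addr, uint32_t len)
{
    if (!flash_range_is_app_only(addr, len)) {
        return -1; /* refuse: would touch the Bootloader or run past flash */
    }
    return flash_erase_hw(addr, len);
}
```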
- OTA flow
- Bootloader development cost and time
- Memory issue
- Peripheral access issue
If you feel that developing your own Bootloader is easier than learning the manufacturer’s, you can develop it yourself. You can build a single-stage or multi-stage Bootloader depending on the available memory and peripheral access.
Trigger: You need to decide how and when your system goes into boot mode. The Bootloader always resides in internal memory, but the system still needs some way to hand control to it so that it can proceed with FOTA. For example, your phone or laptop notifies you that a software update has been downloaded and asks when you want to install it. When you click “Restart now”, it starts updating your software. That is the kind of trigger I am talking about.
There are several options, such as:
- Button press
- Software jump
- Condition check on reset
Choose whichever is most convenient for you. If you are using an existing Bootloader, check which condition it uses to enter boot mode and develop the application layer accordingly.
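As one example of the “condition check on reset” and “software jump” options, the application can write a magic value to a RAM word that survives a soft reset (or to a flash flag) and then reset the MCU; the Bootloader checks that value first thing after reset. The C sketch below shows only the Bootloader’s decision logic. The magic value, the name `boot_decide`, and the choice to also enter update mode when the application fails its integrity check are assumptions for illustration, not any specific vendor’s Bootloader protocol.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic word the application writes before resetting into the
 * Bootloader (e.g. into a RAM word that startup code does not zero). */
#define BOOT_REQUEST_MAGIC 0xB007F07Au

static_assert(BOOT_REQUEST_MAGIC != 0u,
              "magic must differ from the cleared value");

typedef enum {
    BOOT_ACTION_RUN_APP,
    BOOT_ACTION_ENTER_OTA
} boot_action_t;

/* Decide, right after reset, whether to stay in the Bootloader.
 *  request_word: shared RAM word / flag written by the application
 *  button_held:  state of a hypothetical "force update" button
 *  app_valid:    result of the application integrity check (e.g. CRC) */
boot_action_t boot_decide(uint32_t *request_word, bool button_held, bool app_valid)
{
    bool requested = (*request_word == BOOT_REQUEST_MAGIC);

    /* Clear the request so the next ordinary reset boots the application. */
    *request_word = 0u;

    if (requested || button_held || !app_valid) {
        return BOOT_ACTION_ENTER_OTA;
    }
    return BOOT_ACTION_RUN_APP;
}
```

On a Cortex-M part, the application side of a software-jump trigger could then be: write `BOOT_REQUEST_MAGIC` to the shared word and call CMSIS’s `NVIC_SystemReset()`. How that word is kept out of startup zeroing depends on your linker script.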
Network: You can use LTE / WiFi / other protocols to get the latest firmware update.
Server: You need a back-end server from which to send firmware updates to your devices.
Storage: New firmware sent to the device has to be stored somewhere first. We send firmware in chunks, and flashing the firmware while it is still being downloaded is very dangerous and not recommended.
Moreover, if there is a separate storage area for OTA, an interrupted download can resume from where it stopped.
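A minimal C sketch of such a resumable download: the device remembers how many bytes are safely stored (`next_offset`) and accepts only the chunk that starts there, so after a dropped connection it can ask the server to continue from that offset instead of starting again. A RAM array stands in for external flash, and the sizes and names (`ota_download_begin`, `ota_store_chunk`) are hypothetical. On real hardware, `next_offset` would be persisted to non-volatile memory after each successful write so that it survives a reset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define STAGING_SIZE 4096u /* hypothetical external-flash area for the image */
#define CHUNK_SIZE   256u  /* hypothetical chunk size used by the server */

static_assert(CHUNK_SIZE <= STAGING_SIZE, "a chunk must fit in staging");

typedef struct {
    uint8_t  storage[STAGING_SIZE]; /* stands in for external flash */
    uint32_t image_size;            /* total size announced by the server */
    uint32_t next_offset;           /* bytes stored so far; persist this */
} ota_download_t;

/* Start a new download (or discard a stale one). */
int ota_download_begin(ota_download_t *dl, uint32_t image_size)
{
    if (image_size == 0u || image_size > STAGING_SIZE) {
        return -1;
    }
    dl->image_size = image_size;
    dl->next_offset = 0u;
    return 0;
}

/* Store one chunk. Only the chunk starting exactly at next_offset is
 * accepted, so after a dropped connection the device requests data from
 * next_offset onward instead of restarting at zero. */
int ota_store_chunk(ota_download_t *dl, uint32_t offset,
                    const uint8_t *data, uint32_t len)
{
    if (offset != dl->next_offset) {
        return -1; /* duplicate or out-of-order chunk */
    }
    if (len == 0u || len > CHUNK_SIZE || len > dl->image_size - offset) {
        return -1; /* bad length, or would run past the announced size */
    }
    memcpy(&dl->storage[offset], data, len);
    dl->next_offset += len; /* on hardware: write to flash, then persist */
    return 0;
}

bool ota_download_complete(const ota_download_t *dl)
{
    return dl->image_size != 0u && dl->next_offset == dl->image_size;
}
```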
A few microcontroller manufacturers provide Bootloaders with a dual-bank feature, in which internal flash memory is divided into two banks. New code is flashed into one of the banks, and if the update succeeds, the Bootloader either runs the application from that location or relocates the firmware.
Hardware: You can list your hardware requirements by considering the points above. For example, if you choose WiFi for downloading firmware, does your current system have WiFi? If not, you need to add that peripheral and develop the required code.
These are the basic building blocks of a FOTA mechanism. But there are other points to consider if you are going to implement FOTA in real-world applications.
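With a dual-bank layout, the Bootloader mainly has to decide which bank to run and which bank to overwrite. The C sketch below assumes each bank carries a small hypothetical metadata record (`bank_info_t`) saying whether its image passed validation and what its version is. It boots the newest valid image and always writes new code into the bank that is not running, so a failed update leaves the running image untouched. Vendor dual-bank Bootloaders implement this in their own ways; treat this as an illustration of the idea only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-bank metadata the Bootloader keeps for each image. */
typedef struct {
    bool     valid;   /* image passed its CRC/signature check */
    uint32_t version; /* increases with every release */
} bank_info_t;

/* Choose the bank to boot: the newest valid image.
 * Returns 0 or 1, or -1 if neither bank is valid (stay in the Bootloader). */
int select_boot_bank(const bank_info_t banks[2])
{
    if (banks[0].valid && banks[1].valid) {
        return banks[1].version > banks[0].version ? 1 : 0;
    }
    if (banks[0].valid) {
        return 0;
    }
    if (banks[1].valid) {
        return 1;
    }
    return -1;
}

/* New firmware is always written to the bank we are NOT running from,
 * so a failed or interrupted update cannot damage the running image. */
int update_target_bank(int running_bank)
{
    assert(running_bank == 0 || running_bank == 1);
    return 1 - running_bank;
}
```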
CRC / Checksum: A cyclic redundancy check (CRC) is an error-detecting code commonly used in digital networks and storage devices to detect accidental changes to data. The firmware file may get corrupted while it is being downloaded, so we need to make sure that the correct firmware is downloaded and flashed, so that the system doesn’t get bricked.
A CRC check helps with that. You can use any CRC algorithm that suits your system; many are available online. In the adjacent image, you can see one such CRC calculation routine.
Security: For security, you can add an encryption layer to the Bootloader as well as to the binary files. Some FOTA systems also use key management and key rolling.
FMEA (Failure Mode and Effects Analysis): You need to think through the cases in which your system may fail and handle them in your firmware.
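For illustration, here is one common choice written in C, the bitwise CRC-32 with the reflected polynomial 0xEDB88320 (the variant used by zlib and Ethernet); it is a generic example rather than the routine in the adjacent image. It can be updated chunk by chunk as the firmware arrives, and the final value is compared against a CRC that the server is assumed to send along with the image. `crc32_update` and `image_crc_ok` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320, init and final XOR
 * 0xFFFFFFFF). Start with crc = 0; to continue over the next chunk, pass in
 * the value returned for the previous chunk. */
uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0u);
    crc = ~crc;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* Accept the downloaded image only if its CRC matches the expected value
 * (assumed to come from the server alongside the firmware). */
bool image_crc_ok(const uint8_t *image, size_t len, uint32_t expected_crc)
{
    return crc32_update(0u, image, len) == expected_crc;
}
```

The CRC-32 of the ASCII string "123456789" is 0xCBF43926, which makes a handy self-test. A table-driven version is faster if the Bootloader has flash to spare.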
Challenges you may face during FOTA development:
- If a software fault knocks the device off the internet, you can’t fix it with an over-the-air update, and no further progress is possible until connectivity is restored some other way.
- With OTA, bricking an entire fleet of devices with a single bad firmware update becomes a real possibility, and that is a drastically worse situation to find yourself in.
- Most importantly, test the firmware locally in every situation you can think of before rolling it out globally, so that the two issues above become less likely.
- Keeping track of the rollout and making sure it goes smoothly can be a major challenge.
- We have to make sure that the new firmware is small enough to fit in the storage area or, with dual-bank flash memory, that it is no larger than half of the device’s internal flash.
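The size constraint can be enforced before a single byte is downloaded. A minimal C sketch, assuming hypothetical sizes (512 KiB internal flash, a 32 KiB Bootloader, 1 MiB of external staging storage) and following the rule above: with dual-bank flash the image must fit in half of the internal flash; otherwise it must fit both the staging area and the application region. The function names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sizes; take the real values from your MCU and storage chip. */
#define INTERNAL_FLASH_SIZE (512u * 1024u)
#define BOOTLOADER_SIZE     (32u * 1024u)
#define EXT_STAGING_SIZE    (1024u * 1024u)

static_assert(BOOTLOADER_SIZE < INTERNAL_FLASH_SIZE,
              "Bootloader must leave room for the application");

/* Largest image this device can accept. */
uint32_t max_image_size(bool dual_bank)
{
    if (dual_bank) {
        /* Each bank is half of the internal flash. */
        return INTERNAL_FLASH_SIZE / 2u;
    }
    /* Single bank: the image is staged externally, then copied into the
     * application region, so it must fit in both. */
    uint32_t app_region = INTERNAL_FLASH_SIZE - BOOTLOADER_SIZE;
    return app_region < EXT_STAGING_SIZE ? app_region : EXT_STAGING_SIZE;
}

/* Reject an update up front if the announced size cannot fit. */
bool image_fits(uint32_t image_size, bool dual_bank)
{
    return image_size > 0u && image_size <= max_image_size(dual_bank);
}
```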
FOTA at Carnot
We have implemented the following FOTA mechanisms:
1. Target Controller downloads, Host Controller updates the Target Controller’s firmware
The latest firmware is downloaded in packets over GSM by the Target microcontroller and stored in external flash. If the download is interrupted by network issues, it resumes from the last downloaded packet instead of starting over. When the download is complete, the Target notifies the Host Controller, which puts the Target Controller into boot mode, reads the latest firmware from external flash, and sends it to the Target Controller’s Bootloader, which then updates the firmware.
2. Target Controller downloads and updates its own firmware
The latest firmware is downloaded in packets over GSM by the Target microcontroller and stored in external flash. If the download is interrupted by network issues, it resumes from the last downloaded packet instead of starting over. When the download is complete, the Target jumps to its Bootloader, which reads the latest firmware from external flash and updates the firmware.
3. Host Controller downloads and updates the Target Controller’s firmware
The latest firmware is downloaded in packets over GSM by the Host microcontroller and stored in external flash. If the download is interrupted by network issues, it resumes from the last downloaded packet instead of starting over. When the download is complete, the Host Controller puts the Target Controller into boot mode and sends the firmware to its Bootloader. The Bootloader, however, writes the new firmware into a separate section of memory and updates the Target Controller’s firmware only if the new application is valid.
4. Phone downloads the latest firmware and sends it to the Host Controller
We update our system’s firmware from a phone over Bluetooth. The phone app downloads the firmware and sends an update request to our device. When the device receives the request, it enters file-download mode, receives the firmware in chunks, and stores it on an SD card. Once the whole file has arrived, it validates the firmware and jumps to the Bootloader only if the firmware is valid. The Bootloader reads the binary from the SD card, flashes it, and jumps to the new firmware. If the firmware is not valid, the device deletes the whole file and restarts the download from the beginning.
On boot, the Bootloader checks for new firmware; if it is available and valid, it updates the firmware. Otherwise, it runs the old firmware.
You can choose any of these FOTA mechanisms according to your needs and the available hardware.
In addition, we keep version control for firmware updates, so we know which devices run the latest firmware and whether each OTA update completed successfully. We track the rollout using our OTA stack: each device reports its firmware version to the server.
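The rule “update only if new firmware is available and valid, otherwise run the old firmware” is at the heart of every mechanism above. Here is a minimal C sketch of one Bootloader pass, assuming a hypothetical 16-byte header (magic, version, size, CRC-32) stored in front of the staged image in external flash or on the SD card. Plain arrays stand in for the staging storage and the application region; on a real MCU the `memcpy` would be a page erase followed by flash programming. The header layout and all names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STAGED_MAGIC 0x4F544131u /* arbitrary marker ("OTA1") for this sketch */

/* Hypothetical header stored in front of the staged image. */
typedef struct {
    uint32_t magic;   /* STAGED_MAGIC when an image is present */
    uint32_t version; /* build number of the staged image */
    uint32_t size;    /* image size in bytes */
    uint32_t crc32;   /* CRC-32 of the image bytes */
} staged_header_t;

static_assert(sizeof(staged_header_t) == 16u,
              "header layout must match what the sender writes");

typedef enum { BOOT_RUN_EXISTING, BOOT_UPDATED } boot_result_t;

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_calc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* One Bootloader pass: install the staged image only if it is present,
 * newer than the running firmware, fits, and passes its CRC check. */
boot_result_t bootloader_check_and_update(const staged_header_t *hdr,
                                          const uint8_t *staged_image,
                                          uint8_t *app_region,
                                          size_t app_region_size,
                                          uint32_t running_version)
{
    if (hdr->magic != STAGED_MAGIC) return BOOT_RUN_EXISTING;      /* nothing staged */
    if (hdr->version <= running_version) return BOOT_RUN_EXISTING; /* not newer */
    if (hdr->size == 0u || hdr->size > app_region_size) return BOOT_RUN_EXISTING;
    if (crc32_calc(staged_image, hdr->size) != hdr->crc32) return BOOT_RUN_EXISTING;

    /* Real hardware: erase the application pages, then program them. */
    memcpy(app_region, staged_image, hdr->size);
    return BOOT_UPDATED;
}
```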
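Version reporting only works if versions compare reliably. One simple approach, sketched here in C, packs a major.minor.patch string into a single integer so that “is this newer?” becomes an integer comparison, and counts how many devices in a fleet report the target version. The 16/8/8-bit packing and the function names are assumptions; on the server side the same logic would normally live in whatever language the back-end uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Pack major.minor.patch as 16/8/8 bits so versions compare as integers. */
#define FW_VERSION(major, minor, patch) \
    (((uint32_t)(major) << 16) | ((uint32_t)(minor) << 8) | (uint32_t)(patch))

/* Parse "1.4.2" into a packed version; rejects missing parts, trailing
 * characters, and components that do not fit their bit field. */
bool fw_version_parse(const char *s, uint32_t *out)
{
    unsigned major, minor, patch;
    char extra;
    if (sscanf(s, "%u.%u.%u%c", &major, &minor, &patch, &extra) != 3) {
        return false;
    }
    if (major > 0xFFFFu || minor > 0xFFu || patch > 0xFFu) {
        return false;
    }
    *out = FW_VERSION(major, minor, patch);
    return true;
}

/* Rollout tracking: how many reported versions match the target release. */
size_t count_on_version(const uint32_t *reported, size_t n, uint32_t target)
{
    assert(reported != NULL || n == 0u);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (reported[i] == target) {
            count++;
        }
    }
    return count;
}
```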
If a new firmware image is buggy, neither the user nor the manufacturer may be able to easily roll back or overwrite the bad image. The OTA mechanism must therefore be fail-safe: automatic recovery from corrupted or interrupted updates is a must.
We have two fail-safe options in case the device fails due to a software bug and loses communication with the server:
- Backup FOTA
- Factory Reset
Backup FOTA: We keep a backup copy of the last successfully updated firmware on the device. If there is an issue, the system goes back to that last updated state, just like a game that restarts from the last checkpoint when you die.
Factory Reset: This option is used when Backup FOTA does not solve the problem. It works just like a factory reset on a mobile phone: it wipes everything and goes back to the initial version you had when you bought the device, so your system is as good as new.
Testing is one of the most important parts of any project. It catches product defects and shows what the product can endure.
We tested our OTA system with the following cases:
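One common way to decide automatically when to fall back is a boot-attempt counter: the Bootloader counts boots of a newly installed image, and the application clears the counter once it is healthy (for example, after it has reached the server). The C sketch below escalates from the new image to the backup and then to a factory reset, mirroring the two options above. The attempt limit, the persistent `update_state_t` record, and all names are assumptions; the actual copying of the backup or factory image is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_BOOT_ATTEMPTS 3u /* assumption: tune for your product */

static_assert(MAX_BOOT_ATTEMPTS > 0u, "need at least one boot attempt");

/* Hypothetical record kept in flash/EEPROM so it survives resets. */
typedef struct {
    bool     pending;       /* running image has not confirmed itself yet */
    bool     on_backup;     /* we already fell back to the backup image */
    uint32_t boot_attempts; /* unconfirmed boots of the current image */
} update_state_t;

typedef enum {
    BOOT_CURRENT,        /* boot whatever is installed */
    BOOT_RESTORE_BACKUP, /* reinstall the last good image, then boot it */
    BOOT_FACTORY_RESET   /* reinstall the factory image and wipe settings */
} recovery_action_t;

/* Called by the Bootloader on every reset. */
recovery_action_t bootloader_on_reset(update_state_t *st)
{
    if (!st->pending) {
        return BOOT_CURRENT;         /* image already confirmed healthy */
    }
    if (st->boot_attempts < MAX_BOOT_ATTEMPTS) {
        st->boot_attempts++;         /* give the image another chance */
        return BOOT_CURRENT;
    }
    st->boot_attempts = 0u;
    if (!st->on_backup) {
        st->on_backup = true;        /* the backup must confirm itself too */
        return BOOT_RESTORE_BACKUP;
    }
    st->pending = false;             /* the backup failed as well */
    st->on_backup = false;
    return BOOT_FACTORY_RESET;
}

/* Called by the application once it is healthy, e.g. connected to the server. */
void app_confirm_update(update_state_t *st)
{
    st->pending = false;
    st->on_backup = false;
    st->boot_attempts = 0u;
}
```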
- Loop FOTA test:
In this test, we leave a device running standalone and send it firmware updates in a loop. On every update, the system downloads and flashes the new firmware and jumps to it. Once it is back in the application, it receives another firmware update request and the cycle continues. We monitor the device over ~300 FOTA cycles to check that every attempt succeeds.
- System reset:
In this test, we forcibly reset the system several times at various stages of the FOTA process to check whether it can still complete the update.
- Invalid file test:
We deliberately send incorrect parameters to check that our error handling works.
For Carnot’s upcoming OTA stack, we are working on a large-scale, robust OTA architecture using a BLE sensor mesh. We believe that, combined with unique edge-connectivity protocols for OTA transfers, this architecture can scale to real-world IoT deployments in remote areas. Another exciting development on this front is automating the OTA pipeline for our firmware teams, through some of our data science initiatives and machine-learning-based anomaly detection on our data monitoring platform. Together, data protocols, deployment automation, and continuous correction through early detection should make our OTA technology a very powerful option for an IoT stack.
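The forced-reset test is easiest to reason about when the copy from staging storage into the application region is resumable. Below is a host-side C sketch of that idea plus a fault-injection loop: progress is committed after every page, a simulated reset can be injected after any page, and the next run resumes from the last committed page. Because the staged image does not change, re-copying the page that was in flight when power was lost is harmless. The page size, image size, and all names are hypothetical; on hardware the state lives in flash and the reset is real.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define IMG_SIZE  64u
#define PAGE_SIZE 16u

static_assert(IMG_SIZE % PAGE_SIZE == 0u, "image must be whole pages");

/* Simulated non-volatile state that survives a "reset". */
typedef struct {
    uint8_t  staged[IMG_SIZE]; /* downloaded image (external flash / SD card) */
    uint8_t  app[IMG_SIZE];    /* application region (internal flash) */
    uint32_t pages_done;       /* progress marker, committed after each page */
} device_t;

/* Copy pages from staging into the application region, resuming at
 * pages_done. If reset_after >= 0, simulate a reset after that many page
 * writes in this run. Returns true once the whole image has been copied. */
bool run_copy(device_t *d, int reset_after)
{
    int written = 0;
    while (d->pages_done < IMG_SIZE / PAGE_SIZE) {
        if (written == reset_after) {
            return false;              /* power lost: persistent state kept */
        }
        uint32_t off = d->pages_done * PAGE_SIZE;
        memcpy(&d->app[off], &d->staged[off], PAGE_SIZE); /* erase + program */
        d->pages_done++;               /* commit progress */
        written++;
    }
    return true;
}

/* Fault injection: cut power after 0, 1, 2, ... pages and check that the
 * next boot still completes the update with an identical image. */
bool reset_test_passes(void)
{
    static device_t d;
    for (int cut = 0; cut < (int)(IMG_SIZE / PAGE_SIZE); cut++) {
        memset(&d, 0, sizeof d);
        for (uint32_t i = 0; i < IMG_SIZE; i++) {
            d.staged[i] = (uint8_t)(i * 7u);
        }
        if (run_copy(&d, cut)) return false;  /* the reset must interrupt */
        if (!run_copy(&d, -1)) return false;  /* the next boot resumes */
        if (memcmp(d.app, d.staged, IMG_SIZE) != 0) return false;
    }
    return true;
}
```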