IoT in .NET update, Azure IoT Hub resiliency and retries

On May 29, 2018, I blogged about my IoT Home Security Project. Today I thought I’d offer a small update on its reliability after a year’s use.

I have to say I am very happy with it, barring some minor issues that were partly the fault of the Azure and AWS SDKs and partly my own. I did update my original post with a “September 2018 Update”. It talks about a .NET Core issue that occasionally crashed the Raspberry Pi, some SDK issues (i.e. not reusing HttpClient), and additional information on increasing the Raspberry Pi’s stability.

However, one issue I had was that after a fairly prolonged internet outage, Azure IoT Hub did not reconnect and stayed disconnected until the app or the Raspberry Pi was restarted. Another issue was that the Raspberry Pi’s CPU sometimes shot up to 100% usage, locking all resources, including the ability to stop and start the app and the Raspberry Pi in a timely manner. Both of these are now resolved.

I never got a concrete explanation of why the Raspberry Pi’s CPU used to shoot up to 100%, beyond it being down to various known bugs in the Azure and AWS SDK NuGet packages. The version updates over the past few months have resolved the high CPU issue without any changes to my code. So that’s a win.

Regarding Azure IoT Hub not reconnecting after a prolonged internet outage, I tried to fix this myself but somehow managed to mess it up even more, and that led to more transient connectivity issues.

After reading a GitHub issue on this (the fix, “DeviceClient internal pipeline redesign”, is now merged to master and out on NuGet), I decided to rewrite my retry logic, and it’s been flawless ever since (even before this fix got merged). I haven’t experienced a real internet outage since my fix, but when I disconnected the internet manually to test it, Azure IoT Hub reconnected successfully once the internet was back, no matter how long the outage was. (When I ran the same test without my code, Azure IoT Hub would not reconnect after a prolonged outage.)

Now I feel way more comfortable that my .NET Azure IoT Hub Security System running on a Raspberry Pi is very resilient, will run for weeks on end, won’t crash out, and will reconnect to the internet when there is an outage.

Most if not all example code on the net is not production ready, so my sample application on GitHub has been updated with this new resiliency code. I have also updated all NuGet packages, with minor refactoring as needed. You can go here to have a look at my sample repo.

Briefly, the way I got Azure IoT Hub to be more resilient was to set a connection status changes handler on the device client.
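For anyone who hasn’t wired one of these up before, here is a minimal sketch of what registering such a handler can look like. It assumes the Microsoft.Azure.Devices.Client NuGet package; the `ConnectionLogging` class and its `Describe` and `Attach` methods are names made up for illustration and are not taken from my sample repo.

```csharp
using System;
using Microsoft.Azure.Devices.Client;

// Hypothetical helper for illustration; only the SDK types are real.
public static class ConnectionLogging
{
    // Formats a status change for the log. Kept separate from the
    // handler so it can be checked without a live connection.
    public static string Describe(ConnectionStatus status, ConnectionStatusChangeReason reason)
    {
        return $"IoT Hub connection: {status} ({reason})";
    }

    // Registers a handler that the SDK invokes every time the
    // connection status of this client changes.
    public static void Attach(DeviceClient client)
    {
        client.SetConnectionStatusChangesHandler(
            (status, reason) => Console.WriteLine(Describe(status, reason)));
    }
}
```

Calling `ConnectionLogging.Attach(deviceClient)` right after creating the client, and before opening it, means the very first connection attempt gets logged too.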

And then, with some information from this GitHub issue, I either let the Azure IoT Hub SDK automatically retry the connection, or I opt to do the retry myself in cases where I know that the Azure IoT Hub SDK might stop retrying entirely.
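The split between “let the SDK retry” and “retry myself” can be sketched roughly as follows. Treat it as an illustration under stated assumptions rather than the exact code in my repo: which status and reason combinations mean the SDK has stopped retrying is my reading and may differ between SDK versions, the `RecoveryAction` and `ResilientHubConnection` names are hypothetical, and the MQTT transport and 30 second delay are arbitrary choices.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Devices.Client;

// Hypothetical names throughout; only the Microsoft.Azure.Devices.Client types are real.
public enum RecoveryAction { None, WaitForSdk, Reconnect, GiveUp }

public sealed class ResilientHubConnection
{
    private readonly string _connectionString;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private DeviceClient _client;
    private int _reconnecting; // 0 = idle, 1 = a reconnect loop is already running

    public ResilientHubConnection(string connectionString)
    {
        _connectionString = connectionString;
    }

    // Assumed mapping from a status change to what the app should do about it.
    public static RecoveryAction Decide(ConnectionStatus status, ConnectionStatusChangeReason reason)
    {
        if (status == ConnectionStatus.Connected) return RecoveryAction.None;
        if (status == ConnectionStatus.Disconnected_Retrying) return RecoveryAction.WaitForSdk;

        // Disconnected or Disabled: assume the SDK is no longer retrying on its own.
        switch (reason)
        {
            case ConnectionStatusChangeReason.Client_Close:
                return RecoveryAction.None;      // we closed or disposed the client ourselves
            case ConnectionStatusChangeReason.Device_Disabled:
            case ConnectionStatusChangeReason.Bad_Credential:
                return RecoveryAction.GiveUp;    // a fresh connection would fail the same way
            default:
                return RecoveryAction.Reconnect; // e.g. Retry_Expired, Communication_Error
        }
    }

    // Throws away the old client (if any) and opens a brand new one.
    public async Task ConnectAsync()
    {
        await _gate.WaitAsync();
        try
        {
            _client?.Dispose();
            _client = DeviceClient.CreateFromConnectionString(_connectionString, TransportType.Mqtt);
            _client.SetConnectionStatusChangesHandler(OnStatusChanged);
            await _client.OpenAsync();
        }
        finally
        {
            _gate.Release();
        }
    }

    private void OnStatusChanged(ConnectionStatus status, ConnectionStatusChangeReason reason)
    {
        Console.WriteLine($"IoT Hub connection: {status} ({reason})");
        if (Decide(status, reason) != RecoveryAction.Reconnect) return;

        // Start at most one reconnect loop, and never block the SDK's callback thread.
        if (Interlocked.CompareExchange(ref _reconnecting, 1, 0) == 0)
        {
            _ = Task.Run(() => ReconnectLoopAsync());
        }
    }

    // Keeps trying until a new client opens, however long the outage lasts.
    private async Task ReconnectLoopAsync()
    {
        try
        {
            while (true)
            {
                try
                {
                    await ConnectAsync();
                    return;
                }
                catch (Exception ex)
                {
                    Console.WriteLine($"Reconnect failed, trying again in 30s: {ex.Message}");
                    await Task.Delay(TimeSpan.FromSeconds(30));
                }
            }
        }
        finally
        {
            Interlocked.Exchange(ref _reconnecting, 0);
        }
    }
}
```

The decision is deliberately a pure static method, so it can be unit tested without a network, and the reconnect loop runs off the callback thread so a long outage never blocks the SDK.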

If you are working on something similar and worried about resiliency, there you go! The Raspberry Pi has never once failed me, locked up or crashed in the year that it’s been in use. So together with these resiliency updates, .NET Core issue fixes, and Azure and AWS SDK fixes, I am more confident in my Azure IoT Hub Home Security System than ever.

Happy IoT’ing!