At Dwelo, we’ve written the code for our IoT gateways in Rust. It’s fast, it’s reliable, and it’s secure. But we didn’t start with Rust, and we didn’t write the code in a weekend. This series is the story of where we were a year ago, how we switched to Rust, and why you might want to think about doing the same. We will cover reasons to use Rust (or not!), threading, hardware communication, fearless concurrency, interacting with C libraries, writing an MQTT library from scratch, and the lessons we’ve learned the hard way.
A year ago, I joined an IoT startup with a mountain of legacy Python code running on a Raspberry Pi. Over the course of several years, this code had grown organically, as most codebases do. The original purpose of this code was to monitor an MQTT command channel, decode and execute commands on a Z-Wave network via a serial port, and report back success or failure via another command channel. In addition, the code was intended to report back information about the overall health of the system and the state of devices connected on the Z-Wave network. Somewhere along the way the software metastasized: it also drove status LEDs, controlled a cellular modem over a separate serial port, and communicated with the onboard release management software.
One of the senior software engineers who interviewed me desperately wanted to rewrite the whole thing, and I was leaving a C++ shop that was comfortably juddering along on the momentum of its prior successes. Spoilers: he wanted to switch the existing Python code over to Rust, and the prospect was so exciting I jumped ship from my old job and moved cities.
But before we get around to talking about why we picked Rust or why that’s exciting, why did we decide to rewrite at all? The conventional wisdom says never to rewrite software when you have something that already works, even partially. But if we followed that logic, we would still be running the Apollo Guidance Computer software on future space missions (hey, the architecture is 1’s complement, but the last bug report was in 1969, so it must be solid). Why rewrite anything, ever?
In short, it boils down to business needs.
- The existing architecture was incapable of scaling to other technologies or changing direction without massive amounts of effort. The business had just been forced to move from another IaaS provider to MQTT because of licensing cost concerns, and the cutover took nearly a year. With new devices released every year across many technologies (e.g. BLE, Wi-Fi, Z-Wave, Zigbee, arbitrary REST APIs), the business wants to be able to change IoT stacks quickly to adapt.
- There were technical debt items nobody understood or was prepared to resolve. (Did I mention that none of the original programmers were still around to fix bugs or answer questions?) Fixing obvious issues in one place often broke the program in completely unrelated parts of the code.
- The program had unit tests in places, but there were no coding standards. Someone’s “very clever” generator-expression state machine drove the serial framing protocol, and it took weeks to figure out why it was broken.
- There was dead code everywhere, but we couldn’t prove it was really dead code.
- Holy cow, the bugs. Did I mention the bugs?
- There were opportunities to replace the error-prone first-party Python Z-Wave handler code with a vendor-supplied reference implementation written in C. It would have been more effort to hack the existing Python implementation around the C library than to just rewrite the thing.
- We wanted to run more customers on cheaper hardware. Improving that ratio directly drives higher profit margins for the business.
So from those business needs we can start to pick apart some of the actual requirements for the language we would choose for our particular project:
- It needs to talk C and run against C libraries.
- There are timing requirements (because of the serial communications).
- We need to be able to run it on a potato (because of cost).
- It has to be able to run both the serial communications and a bunch of command and telemetry traffic at the same time, without bugs.
- String manipulation should be easy, because the commands and responses are all JSON.
- The software must work correctly and deterministically, even though we are not all genius programmers.
- It needs to be secure.
As it happened, Rust fit the bill for all of these needs.
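To make the "serial communications and command traffic at the same time" requirement a little more concrete, here is a minimal sketch of the general shape using only the Rust standard library. It is not our gateway code: `Command`, `Report`, `serial_worker`, and `run_gateway` are hypothetical names invented for illustration, the "serial port" is faked, and the rule that node 0 always fails exists only so the example has a failure path to show.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical command, as it might look after decoding a JSON payload.
#[derive(Debug, Clone, PartialEq)]
enum Command {
    SetSwitch { node: u8, on: bool },
    Shutdown,
}

/// Hypothetical result to publish back on a status channel.
#[derive(Debug, Clone, PartialEq)]
enum Report {
    Success { node: u8, on: bool },
    Failure { node: u8, reason: String },
}

/// Stand-in for the thread that owns the serial port.
/// Assumption for this sketch: node 0 is treated as unreachable.
fn serial_worker(commands: mpsc::Receiver<Command>, reports: mpsc::Sender<Report>) {
    for command in commands {
        match command {
            Command::SetSwitch { node, on } => {
                let report = if node == 0 {
                    Report::Failure { node, reason: "node unreachable".to_string() }
                } else {
                    Report::Success { node, on }
                };
                // If nobody is listening for reports any more, stop working.
                if reports.send(report).is_err() {
                    break;
                }
            }
            Command::Shutdown => break,
        }
    }
}

/// Feeds commands to the worker thread and collects its reports in order.
fn run_gateway(commands: Vec<Command>) -> Vec<Report> {
    let (command_tx, command_rx) = mpsc::channel();
    let (report_tx, report_rx) = mpsc::channel();

    // The worker takes ownership of its channel ends; the compiler rejects
    // any attempt to keep using them from this thread.
    let worker = thread::spawn(move || serial_worker(command_rx, report_tx));

    for command in commands {
        if command_tx.send(command).is_err() {
            break; // the worker has already exited
        }
    }
    let _ = command_tx.send(Command::Shutdown);
    drop(command_tx);

    // The iterator ends once the worker drops its sender on exit.
    let reports: Vec<Report> = report_rx.iter().collect();
    worker.join().expect("serial worker panicked");
    reports
}

fn main() {
    let reports = run_gateway(vec![
        Command::SetSwitch { node: 3, on: true },
        Command::SetSwitch { node: 0, on: false },
    ]);
    for report in &reports {
        println!("{:?}", report);
    }
}
```

The point of the sketch is the ownership story rather than the details: each thread owns its end of a channel, messages are moved rather than shared, and a mistake such as touching the receiver from two threads is a compile error instead of a bug that shows up in the field.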
I’m not going to be one of those evangelists that says everyone must use my new favorite language. There are a lot of great tools out there, and no programming language is perfect for everything. In fact, Rust’s steep learning curve and proximity to bare metal make it a suboptimal tool for many projects.
Why you may not want to use Rust
- If you need fast prototyping and need to do webby user-facing stuff, there are a billion JS frameworks out there. Electron is actually pretty fast these days, if you’ve got enough memory.
- If you need fast prototyping and need to crunch numbers for science, you probably want Python. Stop throwing money at Matlab, and save your grant money for undergrads. Hurry up now, dark energy isn’t going to solve itself.
- If you are trying to write threadable backend software on commodity hardware, consider Go, or maybe Kotlin if you need the JVM library infrastructure.
- Sometimes a shell script is just fine!
However, if you are a human writing code alongside other humans and feel you need to write new code in C/C++ (speed, resource constraints, timing constraints, portability), you should strongly consider writing it in Rust.
In the next chapters, I’m going to go over some of the reasons we didn’t go with C++, and why Rust works for us.