Embedded Software and the pursuit of perfection
“Perfection is the enemy of good”
“It’ll be OK” / “It’s good enough”
“It’ll never happen in the field”
Ever hear any of these phrases? I’m sure you have. Ever hear them in reference to software? Perhaps. But if they were talking about embedded or systems software, whoever said them was gravely wrong.
Why? There are lots of reasons embedded software has to be (near) perfect, but some are simple and obvious. This is the software that underlies everything; it’s the bottom layer in devices. All other software — middleware logic, databases, web servers, user interfaces — eventually relies on the correct operation of that bottom layer.
And in the top layer of software, with all due respect to those who put extraordinary effort into getting it right, it sometimes doesn’t matter if pixels in the user interface are misplaced or the wrong color; that doesn’t particularly affect the operation of the software underneath.
Testing is Boring
There’s a dirty secret in software: a lot of it is not well tested. Parts of it are written in a rush without consideration for the rest of the system. Testing is cursory, or done without full understanding of the system. Developers find testing boring, or they’re afraid of what they might find. QA engineers (those whose job it is to test) sometimes operate without complete specifications, or without the deep understanding the developer had when writing the code (a source of eternal frustration for QA engineers).
And so it goes. Even in the systems/embedded world, this remains true. But we have to be much more careful when making this software. Any slip quickly becomes apparent, for the reasons I outlined earlier. And so there’s a real effort to make things very, very good: we check for errors everywhere, even things that will “never” happen.
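What “check for errors everywhere” looks like in practice is something like the sketch below. The function and the config-file scenario are my own illustration, not from the article; the point is that every return value is tested, including the ones that “never” fail in the lab.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Defensive style: every call that can fail is checked, even the
 * ones that "never" fail on a developer's desk.  Reads the first
 * line of a config file into 'out'; returns 0 on success, -1 on
 * any failure. */
int load_config(const char *path, char *out, size_t out_len)
{
    if (path == NULL || out == NULL || out_len == 0)
        return -1;                  /* "impossible" caller errors */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;                  /* missing media, fd exhaustion... */

    if (fgets(out, (int)out_len, f) == NULL) {
        fclose(f);
        return -1;                  /* empty or unreadable file */
    }
    if (fclose(f) != 0)
        return -1;                  /* yes, even fclose() can fail */
    return 0;
}
```

In the field, every one of those “impossible” branches eventually executes; checking them costs a few lines now instead of a crash report later.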
There are a lot of things you can do to improve code up front, and tools to help; sadly, they’re not widely used. Many developers have never heard of them, or don’t bother to run them. Let’s look at a few (this list is far from exhaustive):
One is a source code analyzer. It can spot logic, style, and other problems often missed by the compiler — sometimes very subtle things that aren’t obvious from reading the code, such as file handle leaks or mismatched free/delete calls.
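As an illustration (the function is my own sketch, not from the article), here’s the kind of subtle file-handle leak such a tool flags: one error path returns without closing the file, and the compiler stays silent about it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads the first line of a file into a heap buffer the caller
 * must free().  The malloc-failure path below forgets to close
 * the file: the compiler doesn't care, but a static analyzer
 * reports the FILE handle leak on that path. */
char *read_first_line(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return NULL;

    char *buf = malloc(256);
    if (buf == NULL)
        return NULL;            /* BUG: 'f' is never fclose()d here */

    if (fgets(buf, 256, f) == NULL) {
        free(buf);
        fclose(f);
        return NULL;
    }
    fclose(f);
    return buf;
}
```

Leaks like this are easy to write and nearly invisible in review, because the function works perfectly on every run where malloc succeeds.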
Like some developers, I turn the warnings in GCC up as far as they’ll go (-Wall -Wextra -Werror) and fix everything. Some of the fixes are tedious, but given how loose the C and C++ languages can be, this pays off in the long run. Especially sign mismatches.
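To show why sign mismatches deserve the attention, here’s a sketch (mine, not from the article) of the classic trap that GCC’s sign-compare warning, enabled by -Wextra, calls out:

```c
#include <assert.h>
#include <stddef.h>

/* 'i' is signed, 'len' is unsigned: in 'i <= len - 1' the int is
 * converted to size_t, and GCC's -Wextra (-Wsign-compare) warns.
 * For good reason: when len == 0, 'len - 1' wraps to SIZE_MAX and
 * the loop reads far past the end of the buffer. */
int sum_bytes_buggy(const unsigned char *buf, size_t len)
{
    int sum = 0;
    for (int i = 0; i <= len - 1; i++)   /* BUG when len == 0 */
        sum += buf[i];
    return sum;
}

/* Fixed: match the index type and compare with '<' so no
 * subtraction can wrap. */
int sum_bytes(const unsigned char *buf, size_t len)
{
    int sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

The buggy version passes every test with a non-empty buffer; the empty-buffer case is exactly the “never happens” input that shows up in the field.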
Another is a run-time memory checker. It’s especially useful for spotting uninitialized memory, memory corruption, and memory leaks. Code runs slower under it, but these kinds of issues can be extraordinarily difficult to spot by hand.
These memory problems can be insidious. They can cause random crashes, or crashes far away from where the problem was created. Uninitialized memory can introduce randomness where none was intended, and memory leaks slowly starve long-running programs. So the value of tools like this should be obvious.
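To make those failure modes concrete, here’s a hedged sketch (the function names are mine): the first version both leaks and reads uninitialized memory — exactly the pair of defects a run-time checker pinpoints — while often appearing to work; the second is a corrected form.

```c
#include <assert.h>
#include <stdlib.h>

/* Buggy: 'vals' is allocated but never initialized or freed.
 * The reads below use whatever garbage is on the heap, and the
 * buffer leaks on return.  Both defects are reported precisely
 * by a run-time memory checker, even when the program "works". */
int average_buggy(size_t n)
{
    int *vals = malloc(n * sizeof *vals);   /* never freed: a leak */
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vals[i];                     /* uninitialized read */
    return (int)(sum / (long)n);
}

/* Corrected: the caller supplies initialized data, so there is
 * nothing to leak and nothing uninitialized to read. */
int average(const int *vals, size_t n)
{
    if (n == 0)
        return 0;
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += vals[i];
    return (int)(sum / (long)n);
}
```

The buggy version may even return plausible numbers on a freshly zeroed heap, which is precisely why these bugs survive testing and surface in the field.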
That will never happen
In one instance, it became clear to me that some legacy software was unstable and would lock up — and more so lately, triggered by recent improvements in the software it interacted with. Between my own observations and various customer reports, it was clear this was going to be a real customer issue.
The full saga is long and unpleasant, but there was a lot of objection to even fixing this, and disbelief that it would ever happen. Of course, on one of my site visits, that’s exactly what did happen. In the end I did fix it, along with nearly 30 similar issues, but it was too late: the customer and contractor involved were so annoyed that it was largely a political disaster.
Fix your embedded software; take the time. It doesn’t have to be perfect — but it had better be damn close.