Code: The Legacy (Cert 18)
Post-Microscaling, I’m taking a little time out to get a legacy project back under control. I’m battling a product that’s 12 years old and horrifyingly arcane (Windows desktop software! Outlook add-ins!! C++!!! VB!!!!!!). What possessed me! No-one’s going to ask me to talk at conferences about that!
But are they right?
The Curse of Legacy Code! — Wait, is that a Microservice?
Qlockwork has been lurking in my basement since 2005; mysterious and ancient times before AWS, smartphones, MacBooks or the modern cloud. I intend to review the whole project and use it as a small-scale experiment in what modern techniques can do for legacy code — what’s sensible to do and what’s not.
As is often the case with legacy applications, Qlockwork is a nice product with a good concept. There are some fabulous ideas in there, not least our early, Microservices-lite architecture. Very tiny Adrian Cockcroft.
How are Microservices possible in a desktop application from 2005??!? Did we go back in time!? No! It’s how the product has basically supported itself for 12 years. This very effective architecture is not as new as all that and I’ll tell you all about it further down…
People literally all over the world buy QW & find it useful every day. I don’t want to screw that up just because it’s old-fashioned but I do want to make sure QW’s not missing out.
So, I’m going to outline how the product works and how it uses a Microservice-ish architecture. Then I’ll review the key technology advances of the past decade to see what I can steal.
Harnessing Evil for Your Own Usecase
In 2005, there were loads of evil folk spying on our activities (mostly marketeers). The data they grabbed was not just useful in aggregate, it was useful on an individual basis. Why should they get all the benefit? So, an old pal and I decided to turn the concept around. We decided to write a kind of self-spyware application that grabbed your own activity data and gave it back to you. We had a vague idea how people would use the data, but it turned out our ideas were mostly wrong (folk overwhelmingly use it for billing; we had guessed time management). We also had a vision of doing some machine learning on the data but it was a bit early for that.
My co-founder is still one of the smartest folk in the industry & works on security for Google these days.
At the time, the largest software market was for Windows desktop applications (the word app had not yet been invented) so that’s what we aimed for.
We had absolutely no experience of building consumer applications. However, we had tonnes of experience building and field-supporting server products that were designed for resilience, scale and remote diagnosis. Surely we could do this?
We architected the product to meet our 9 drop-dead requirements:
1 Zero ramp-up. You had to be able to turn it on and use it immediately without having to learn anything. (This was something we’d learned from Microsoft: make it dead easy and re-use familiar user interfaces.)
2 It had to be trivial to install (1 click) without any technical knowledge.
3 Our users should not need an IT dept (we wanted to sell to one-man-bands too).
4 It had to track all activities, all the time, for every application, on and offline.
5 The data had to be presented back to users in a way they would instantly understand and be able to manipulate for themselves.
6 There had to be almost no support load, but it still had to be easy to support and debug remotely (the bug rate had to be low, but there are always bugs).
7 Users should be able to recover from almost any error with minimal or zero data loss, with minimal help. They had to be their own ops team, but need no ops training.
8 The data had to be secret (ideally never leave their org).
9 The data had to be robustly backed up without the user having to do anything.
Nowadays, I’d use a Cloud Native approach to achieve most of this (you might argue the cloud exists exactly to do this stuff), but in 2005 the cloud was still in its infancy (AWS didn’t launch S3 and EC2 until 2006) and cloud servers didn’t manage themselves. Requirement 6 would have been hard to achieve and, given the sensitivity of the data, so would 8. So, we had to come up with something else.
Can we use Leverage?
The most difficult thing to achieve was secure & robust data backup without data going off-site. Handling local SQL DBs would be problematic without a customer IT team and we wanted to be able to sell to one-person, non-technical, companies. Then we had a brainwave. Microsoft Exchange will store data for you.
Back in the mid-90s I worked on the Microsoft Exchange project in Seattle. One of its lesser-known features is that Exchange’s message store is a distributed NoSQL database that you can access through a set of well-defined client-side APIs (Microsoft have always been the king of APIs IMO). As long as you parcel up your data correctly you can store and retrieve it through a local Outlook client and Exchange will secure the data and back it up in the same way it secures and backs up your emails or meetings. That was requirements 8&9 met.
A decade on, I still like this approach. I think it’s a great idea to store all your highly sensitive data in one location (in this case, emails, meetings and activities). That way you only have to secure one place. It’s the same principle used to foil U-boat attacks on shipping in WW2 — the convoy. Put everything you want to protect together and then guard it. Don’t give yourself multiple places to look after and divide your forces and attention.
The other good thing about this approach was if you store your data this way, Exchange/Outlook will let you view and manipulate it using their standard UIs. We chose to store activities as meetings and display them through the calendar interface with additional .csv exports. Everyone already knew how to use the Outlook calendar and Excel so that was requirements 1&5 met.
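To make the "activities as meetings" idea concrete, here is a minimal Python sketch. It is not Qlockwork's real code (that is VB.Net talking to Outlook); the `Activity` class and the field names are hypothetical, chosen only to show how a tracked activity lines up with the fields a calendar entry already has, and how the same rows fall out as a .csv.

```python
import csv
import io
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class Activity:
    """One tracked span of work (hypothetical shape, not Qlockwork's schema)."""
    app: str
    title: str
    start: datetime
    end: datetime


def to_calendar_item(activity: Activity) -> dict:
    """Map an activity onto the fields a calendar appointment already has."""
    minutes = round((activity.end - activity.start).total_seconds() / 60)
    return {
        "subject": f"{activity.app}: {activity.title}",
        "start": activity.start.isoformat(),
        "end": activity.end.isoformat(),
        "duration_minutes": minutes,
    }


def to_csv(activities: Iterable[Activity]) -> str:
    """Render activities as CSV text that a spreadsheet can open directly."""
    buffer = io.StringIO()
    fields = ["subject", "start", "end", "duration_minutes"]
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for activity in activities:
        writer.writerow(to_calendar_item(activity))
    return buffer.getvalue()
```

The point of the design is that nothing new has to be taught: a subject, a start and an end are all a calendar view needs, and the same three fields are what most people want in a billing spreadsheet.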
The only remaining problem was that to use the interface to the Exchange data storage we had to be an Outlook add-in (there are other ways but they’re not ideal). There were 2 issues with this:
- we needed to record activities all the time, even if Outlook wasn’t running (requirement 4)
- we were going to need to poll the OS a lot to monitor what was going on, and we wanted that to be incredibly lightweight so it didn’t slow up the machine. Add-ins are not that fast.
Our only option was a separate, decoupled, lightweight process that ran all the time, capturing activities and storing them locally in a file-based storage system according to a defined format. Our Outlook add-in would then read this data asynchronously, do some post-processing and store it in the Exchange server where the robustly saved data would become the master.
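Here is a rough Python sketch of that hand-off, under some loud assumptions: the real tracker is C++ and its file format is not described in this post, so the JSON-lines format, the function names and the byte-offset checkpoint are all made up for illustration. What it tries to show is the shape of the contract: one side only ever appends, the other only ever reads, and a half-written line is simply left for the next pass.

```python
import json
import os


def append_record(path: str, record: dict) -> None:
    """Tracker side: append one JSON line and never read the file back."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, sort_keys=True) + "\n")
        f.flush()
        os.fsync(f.fileno())  # push the line to disk before carrying on


def read_new_records(path: str, offset: int = 0):
    """UI side: read complete lines after `offset`.

    Returns (records, new_offset) so the caller can resume later.
    """
    records = []
    if not os.path.exists(path):
        return records, offset  # the tracker has not written anything yet
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file or a half-written line: retry next pass
            records.append(json.loads(line))
            offset = f.tell()
    return records, offset
```

Because the reader keeps its own offset and the writer never looks back, neither side needs the other to be running, which is the whole trick.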
So the overall architecture became 2 small services comprising a stateless process + a DB:
- The Tracker Service: a C++ process for tracking activity plus a pseudo-database (structured flat files, append-only, no read from the tracker, read-only from the UI)
- The UI Service: a VB.Net add-in for Outlook plus a distributed NoSQL database (Exchange message store)
The 2 services were decoupled: they interacted via a defined, asynchronous interface (the structured flat files). Both services supported full error detection, conflict resolution and retry on all interfaces. In particular, each service would continue to run if the other one was unavailable. That was just business as usual.
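For a flavour of what "error detection and retry" can look like on a file-based interface, here is a small Python sketch. It is an illustration only: the post does not say which checks Qlockwork actually uses, so the CRC32 framing, the function names and the backoff numbers are my assumptions.

```python
import time
import zlib


def frame(payload: str) -> str:
    """Prefix a record with its CRC32 so a reader can detect corruption."""
    return f"{zlib.crc32(payload.encode('utf-8')):08x} {payload}"


def unframe(line: str):
    """Return the payload if the checksum matches, else None."""
    checksum, _, payload = line.partition(" ")
    if len(checksum) != 8:
        return None
    try:
        expected = int(checksum, 16)
    except ValueError:
        return None
    if zlib.crc32(payload.encode("utf-8")) != expected:
        return None
    return payload


def with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call `operation` until it succeeds, backing off between failures."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller log it and move on
            sleep(base_delay * 2 ** attempt)
```

A caller would wrap each write to the far side in `with_retry(...)` and treat a final failure as "try again on the next pass" rather than as a crash, which is what lets each service keep running while the other is away.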
Not only did this architecture achieve requirement 4, it helped massively with our support requirements:
- Faster deploy: it allowed us to make UI changes without affecting the decoupled tracker process and vice versa, which reduced dev conflict.
- Diagnosability: by examining the stored flat files, we could retrospectively determine whether any field bug had occurred in the tracker service or the UI service. We could also manually replay any flat files into any UI Service instance for remote debugging of UI issues.
- Disaster recovery: we could manually replay the local flat files in the case of catastrophic data loss on the Exchange server (occasionally it happened!).
- Resilience: it allowed the system to keep basically working even if one of the 2 major components failed, allowing time for automated or manual restarts.
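Replaying files is only safe if doing it twice does no harm. A minimal Python sketch of an idempotent replay follows, assuming one JSON record per line and a plain dict standing in for the Exchange store; both are stand-ins I have invented for illustration, not how Qlockwork actually stores things.

```python
import hashlib
import json


def record_key(record) -> str:
    """Derive a stable ID from a record's content so replays de-duplicate."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def replay(lines, store: dict) -> int:
    """Apply each JSON line to `store` once; return how many were new."""
    added = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip damaged lines rather than abort the whole replay
        key = record_key(record)
        if key not in store:
            store[key] = record
            added += 1
    return added
```

Keying on content means the same file can be fed into a fresh instance for debugging, or back into a recovered store after a disaster, without creating duplicates.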
You may notice that the flat files, although transient, are a potential failure point if the user doesn’t have file backups on their machine and doesn’t run Outlook for a while. That has happened a small number of times. Interestingly though, we had more heartbreak from a customer who lost a month of data because they forgot to install the application on their new laptop.
Did we Over-Engineer our Desktop Application?
Absolutely. But not our decoupled, somewhat Microservices-esque architecture. That has been great and it has continued to evolve from field exposure (we didn’t get all the retrying right first time; loads of the failure cases never occurred to us initially).
It has been great for resilience and diagnosability (scale is less of an issue with desktop apps ;-) This architectural approach is good if you want a stable, supportable product. Note that we also have plenty of tracing and logging in both services all the time. That’s imperative if you want to retrospectively diagnose and fix field problems (you do). BTW I still dogfood QW all the time too — also very helpful.
Main takeaway: you’re never too small for a decoupled, interface-centric architecture. Qlockwork is just a tiny desktop application but we still have 2 fully-defined internal interfaces (one of which was deliberately inherited from Microsoft) and 2 fully decoupled services. This was not over-engineering — it helped our application survive 12 years in the field with all the patches, local issues, workarounds, surprises and bizarre feature requests that go with the field.
So what did we over-engineer? Features!!! We put loads more stuff in than actually gets used, which delayed our initial release and gave us more to test. We should have got the product out sooner and based most features on user requests.
So, now that I’ve reminded myself what I like about Qlockwork, in the next post I’ll look at progress over the past decade and what I should be leveraging.
Oh yeah, one more thing…
You’re Never too Legacy to Kick Ass
(BTW this kick-ass old code has supported me for a decade).