To Monolith or to Microservice

Rich Marshall
Wealth Wizards Engineering
Feb 29, 2020
Team interactions and microservice architectures.

TL;DR

Start with a monolith. Learn to recognise the patterns that indicate when to grow (reduce?) that monolith into microservices. Depending on your circumstances, that might still be straight away. It might also be never.

I was recently contacted by someone who’d been reading my post about why we use the Istio Service Mesh in our Kubernetes platform. They’d made a really pertinent observation: adding the complexity of Istio to the already complex Kubernetes feels like a really high cost for making it easier to run co-existing versions of microservices (co-existence being our use case for Istio). “Why not just build a monolith instead and avoid all the complexity?”, he asked. “That’s a really good question”, I thought, and one I have attempted to answer in the past, but I think I should have another go.

At the same time, at Wealth Wizards, we’ve been adopting the principles taught in the brilliant Team Topologies book by Matthew Skelton and Manuel Pais. I’ve been promising to write something about our experiences to date and the actions we’ve taken from that learning. As I sat down to write the first post, it struck me that these are really the same post. The underlying subject is exactly the same; one helps deal with the consequences of the other.

Always start with the WHY.

Why are we using Microservices? Why didn’t that solve our problem?

At Wealth Wizards we’re building a SaaS platform which will allow us to provide regulated, automated financial advice to everyone, regardless of their circumstances. Financial wellness, which has a huge impact on mental health, should be available to everyone, from those with debt worries all the way through to those without a care in the world. Right now in this country it is typically the preserve of the wealthy: those who can afford to spend hundreds or thousands of pounds to know where they should best invest their money or how to use their pensions. There’s a lot to financial advice: many factors to take into account and literally millions of possible outcomes for given areas of advice, e.g. mortgage, investment or retirement.

When we began our journey we had a handful of developers and we started out working on automating “at retirement” advice — what’s recognised as one of the most complicated and risk-laden areas. We started with a monolith. We had one problem to solve and two people working on solving it. With a monolith it’s (relatively) easy to know where the code is, how functions interact with each other and how and where to deploy the code.

With two developers it’s easy to co-ordinate work that’s taking place on that code base, what changes are occurring, why, and the expected outcome of those changes. There’s still a lot for two developers to consider — retirement advice is notoriously difficult to understand and get right. This is exactly why we started there — to prove that automating complex advice is possible.

As the product grew, along with ambitions and funding (or customers, depending on your business model), we needed to hire more developers: “build faster!”. This meant more people having to co-ordinate around a code base. At four devs, we managed. Then it became necessary to start working on a new area of advice, a new product for us. We hired more people and made what turned out to be a good decision: we forked our code base. One team on Pensions, one on Retirement. Why was this good? Forking is often considered a Bad Thing™.

By forking the code, we were able to reuse some of the first team’s learning but we were also able to give the new team independence from the existing product. We’d unwittingly reduced the cognitive load of each team — they each had one problem to worry about instead of two and there were fewer people to co-ordinate when working on each code base.

We continued in this mode for a while, building two separate products on two code bases. They functioned in a similar way: they were essentially twins, but they slowly grew apart. Over time, our release frequency slowed. The products and their capabilities grew, but each change became more complicated and carried greater risk.

We had a funding round and wanted to grow more. We hired more devs and decided to move to JS (Node and React, full stack), microservices and Kubernetes. It was the latest thing, and microservices solve slow releases, right?

We started to decompose our monoliths. New functions were built as microservices and the teams started releasing more often. Kubernetes took care of the traditional deployment and scaling problems like load balancing, service discovery, etc. This carried on for a year or more. New product lines were started: investment advice, DB (“final salary” pension) transfer advice. Each time, a new, independent team.

Midway through last year we ground to a halt. Releases were taking weeks, sometimes months, to appear and contained 10–15 microservices at a time. The platform we’d built had little ability to hand over from one type of advice to another, something an adviser would do seamlessly. I’m good at pointing out our worst faults but, in truth, it wasn’t all bad. Some teams were still able to release frequently, but they were generally not working on “difficult advice” problems.

What had gone wrong? The truth was, it was hard to know. Conway’s law was itching at the back of my mind: “organisations design systems which mirror their own communication structure”. I had this memory that the real value of microservices lay in being able to model the different functions in your business or process as independent services (and ultimately teams owning those functions), not just in building small code bases which are “easy to reason about”. I also had a fear that we were no longer building a coherent platform, as we liked to describe it, but different products using common technology.

At the same time as this, some of our teams started to recognise and complain about the number of things they were having to work on and, importantly, to think about: we started to understand the term “cognitive load”. When you move from one problem to another you often have to stop for a moment, reset your brain, “load up” the new problem and start again. This context switching is a real performance killer. Anyone who’s worked closely with operating systems will know the same is true of CPU context switches; it’s a valuable metric for system administrators to follow, especially on older single-core CPUs. The teams started to recognise the effect that working on such a complicated set of problems was having on their cognitive load, and that ultimately it was damaging their ability to code, refactor and solve problems quickly and efficiently.

I started to look back at some sales slide decks(!) and diagrams I’d been building of our systems to demonstrate to third parties how we mapped our ‘Fin’ to our ‘Tech’. Slowly ideas started to form: could we map our microservices directly onto these advice areas? We were close in many places, but to date we’d taken an engineer’s approach to breaking down the system and the truth was, we weren’t domain experts.

A few of us had been aware of Matthew Skelton from talks he’d run in the past, and so I happened to follow him on Twitter. We’d been reading books like Designing Delivery and Accelerate, among others, and getting great ideas about what seemed right, but we didn’t really know how to put it into practice. By some chance of fate it turned out that Matthew and Manuel Pais had clearly also been thinking in depth about these problems (not specifically for our benefit, unfortunately) and were about to launch a book helping to solve them.

We had to wait a few weeks (their timing was good, but not impeccable), so we started to scan what they had available online. It made sense immediately, perhaps because our heads were already in the right context. When the book arrived I devoured the audio book. I quickly evangelised the book to the teams and encouraged them to get copies and learn.

Now we could form a model of how to solve not only our immediate cognitive load problems but also our potential scaling problems in future. The methods described in the book have been a huge help in recognising the different team patterns: stream-aligned, enabling, complicated-subsystem and platform teams. We quickly identified the problems we were seeing and the most appropriate solutions to help us reduce cognitive load and deliver faster.

So far so good, but we’re nowhere near done. Armed with these new patterns, we’ve been able to take a step back from our platform and look at the functions and the microservices with a mind to how we _want_ them to work. What structure will give us both the most cohesive yet decoupled platform and the highest-performing, empowered teams?

We’re really still at the beginning of our journey, but Team Topologies and a microservices approach, with the knowledge of _where_ we need to build boundaries, have given us the tools we were looking for. They have helped us build a plan and given us confidence that we know where we’re going and how to get there.

I know there’s more to come in this thread. The whole process has started me thinking about how we build truly empowered and engaged teams. This is no small challenge and has led me down the path of “Teal Organisations”, as proposed by Frederic Laloux. That’s definitely a topic for another post, in the future, when I know where we’re going. In the meantime I’ll stop here, but feel free to comment or ask questions if this sounds familiar to you.
