Microservices vs Monolith: examples and reasoning from our journey

Marco Rosello · Published in Boozt Tech · Feb 28, 2024 · 10 min read

This blog post is based on opinion-battle-style talks that Edvardas Kazlauskas, Director of Engineering, and Marco Rosello, Web Development Director, gave together at developer community meetups in Malmö (Sweden) and Vilnius (Lithuania).

There are many discussions about which architecture model to choose and which one is better for which case. It is a question with no right or wrong answer. Each of us has been working at Boozt for more than seven years, and as the company grew, we tried various approaches to fit our needs best. We have had both successful and less successful experiences, and here we share our practical examples and the reasons behind one decision or another.

Marco:

Let’s look at how we operate. We have 40 tailor-made internal systems maintained by 190 developers at Boozt Platform across 5 different offices. A simplified representation of our system layout would be: a webshop, warehouse, finance, a brand portal, and two small microservices.

The webshop has 2.1 million lines of code and the Stock Service has 15,000. So it seems fair to consider one a microservice and the other a monolith, even though it’s not a single monolith for the whole company. First, let’s focus on the WebShop and the Parcel API, and later on the Stock Service.

WebShop is a big monolith, and there are benefits to that. The monolith contains modules like account, checkout, search, etc. Different teams own these modules while sharing the same codebase and database. We also use GitLab CODEOWNERS, which is quite useful to ensure the owning team reviews all changes to its modules without blocking other teams. So working within a monolith is simple: we have one database, no network calls between modules, and one CI/CD pipeline.
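For a sense of what that looks like in practice, a CODEOWNERS file for a modular monolith might look roughly like this (the paths and group names below are made up for illustration, not our actual layout):

```
# Hypothetical example: each module directory maps to the GitLab group that
# owns it, so that group's review is requested whenever the module changes.
/modules/account/   @boozt/account-team
/modules/checkout/  @boozt/checkout-team
/modules/search/    @boozt/search-team
```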

Now the drawbacks. A single system has scalability limitations; when you have multiple systems, you can scale more. Also, because it is big, many people work on it: 40 frontend and backend developers share the same codebase, CI pipeline and database. We also follow a continuous deployment philosophy with a few very small commits per deployment, so we ship 30–40 changes daily. Each of those changes must go through the whole deployment pipeline (10 min) and the canary deployment (12 min). This slows down overall delivery and becomes a bottleneck.

We have shared ownership, yet we still try to split everything into modules. However, there are common areas, and we have chapters that aim to cover them: all frontend or backend developers, roughly 20 per chapter, meet to discuss solutions to common problems and the overall architecture. It’s not easy to have open discussion and reach agreement among ~20 people. And since we have one database, real ownership takes a lot of work. You can own certain tables, but who owns the CPU of the database? Who is responsible when it goes down? Yeah, this is the reality of the monolith.

Let’s go into an example of a microservice: the Parcel API. It was the first one we separated from the webshop. It tells which pickup locations are available for a given address: PostNord, Instabox or whichever provider we have. In 2017, it looked like a good idea: it was very small, so we could extract and separate it. But later we noticed it was too small. When you don’t actively develop it and only add a new distributor every now and then, such a small system requires too much maintenance. Also, it was mainly used by one service, so it wasn’t scaling independently; it was scaling in parallel with the monolith, and we still had to load test it, upgrade it, maintain its infrastructure, and so on.

We decided to merge it back, and that’s a controversial thing: taking a microservice and putting it back into the monolith. Why would you do that? Yet we did, and the reason was that the granularity was too fine for us. All that maintenance wasn’t worth it: the team was unhappy about upgrading it, and it was always lagging behind. So we merged it back into the main monolith, and we now plan to extract a much larger service (checkout), which will have much more active development: three to five people focusing only on it, with a separate database, CI/CD, etc. That said, we also have some success stories with microservices on our platform.

Edvardas:

One microservice that is a success story is the Stock Service API. A few prerequisites led to its successful creation.

First of all, it had a very clear, well-defined function. Its sole goal is to answer one question: how many items of a given product do we currently have in stock? This question matters because getting it wrong means overselling or underselling: overselling leads to a bad customer experience, and underselling hurts the business.
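As a rough sketch of what a single source of truth for stock can look like, here is a minimal Elixir example; the module name and API are illustrative only, not our actual implementation:

```elixir
defmodule StockService do
  @moduledoc """
  Minimal sketch of a single source of truth for stock counts, keyed by SKU.
  Illustrative only, not Boozt's actual implementation.
  """
  use GenServer

  ## Client API

  def start_link(_opts), do: GenServer.start_link(__MODULE__, %{}, name: __MODULE__)

  # "How many items of this product do we currently have in stock?"
  def available(sku), do: GenServer.call(__MODULE__, {:available, sku})

  # Reserve items for an order; fails instead of overselling.
  def reserve(sku, qty), do: GenServer.call(__MODULE__, {:reserve, sku, qty})

  ## Server callbacks

  @impl true
  def init(state), do: {:ok, state}

  @impl true
  def handle_call({:available, sku}, _from, state),
    do: {:reply, Map.get(state, sku, 0), state}

  def handle_call({:reserve, sku, qty}, _from, state) do
    current = Map.get(state, sku, 0)

    if current >= qty do
      {:reply, :ok, Map.put(state, sku, current - qty)}
    else
      {:reply, {:error, :insufficient_stock}, state}
    end
  end
end
```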

Way back, before this service existed, the responsibility for keeping stock numbers had been shared across multiple systems. It was challenging to tell how many items we currently had in stock, because different systems would get different information from different sources, and maintaining consistency was painful. If one wanted to be absolutely consistent, one would have to physically go to the warehouse, pick a bin, and count how many items were stored there. Of course, I’m exaggerating, we didn’t do that (I hope so 😀). As you understand, keeping consistency across multiple sources of truth is difficult, and you cannot call something the source of truth if you have several of them, right? So how did it go?

We set out to create an actual source of truth: a service we could trust to keep and modify the numbers. We no longer have conflicting numbers for our stock, and that is a huge achievement for our business: overselling or underselling is now a very rare occurrence, if it happens at all.

Some systems still store stock numbers locally, but those copies are not considered a trusted source and are only used in emergencies, for example when the Stock Service has an issue and becomes unavailable. Services that do reporting can also store aggregated information, since they don’t have a critical requirement for the data to be real-time.

It wasn’t only a win for our business. We also felt the positive consequences of defined ownership and higher autonomy. The increased autonomy was especially important, because the technology stack could be tailored to the specific requirements of the service rather than continuing with the standard stack the rest of the platform used back then. The team decided to go with Elixir, a functional programming language built on top of the Erlang VM, a choice well suited for highly available, fault-tolerant, and performant systems.

The resulting design and service matched the elasticity and scalability requirements. Soon after the service was released into production, we managed to “accidentally” test those requirements.

It was accidental in the sense that we didn’t know a test was going to happen. Suddenly we noticed a multiple-fold increase in traffic and slightly increased latency. There were no campaigns around that time, so the increase was difficult to explain. The service handled it gracefully, scaled up, and remained rock solid. Since such an increase wasn’t expected, we started digging into the root cause. It turned out that a webshop deployment had “accidentally” turned off the caching of Stock Service responses, so displaying any product on our site resulted in a separate call to the service. We quickly resolved the issue, but seeing how well the service scaled up and performed was reassuring. A similar accident might not end well in a monolithic architecture, as scaling up a monolith is usually a slow process.
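For context, the caching in question is an ordinary cache-aside pattern around Stock Service calls. The sketch below is hypothetical (including the StockClient module, a stand-in for a network client of the service), but it shows why losing the cache turns every product view into a separate call:

```elixir
defmodule Webshop.StockCache do
  @moduledoc "Hypothetical cache-aside wrapper with a TTL around Stock Service calls."

  @table :stock_cache
  @ttl_ms 30_000

  def init, do: :ets.new(@table, [:named_table, :public, read_concurrency: true])

  def available(sku) do
    now = System.monotonic_time(:millisecond)

    case :ets.lookup(@table, sku) do
      # Fresh cache entry: no call to the Stock Service at all.
      [{^sku, qty, expires_at}] when expires_at > now ->
        {:ok, qty}

      # Miss or expired entry: one network call per SKU, cached for @ttl_ms.
      _ ->
        with {:ok, qty} <- StockClient.available(sku) do
          :ets.insert(@table, {sku, qty, now + @ttl_ms})
          {:ok, qty}
        end
    end
  end
end
```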

Oh, and did I mention the performance part? It’s rock solid and blazingly fast, with sub-millisecond response times excluding network latency. It is important to note that this performance figure does not include the network part; it will become clear later why that matters.

Now, let’s go to the challenges we had to solve.

First of all, we introduced a new programming language. Being a functional language, it differs from what our developers were used to. It was a risky decision because we didn’t have enough experience within our platform to operate this new technology stack. When introducing a new language and technology, it is important to focus on education and on acquiring enough buy-in. Without that, the individuals who introduced it will be stuck with it, and if they eventually decide to move on, your organization will inherit a black box that needs to be refactored into a better-known stack. On the other hand, if you manage to get your colleagues excited about and familiar with the tool, you’ll make things easier for yourself in the long run.

Network issues are a massive headache once you embark on the microservice journey. If you’re designing a microservice without thinking about possible network issues, you’re doing something wrong. We experienced multiple issues: split brain, where instances couldn’t talk to each other and didn’t know who the leader was or which one to trust; lost packets; random latency increases; lost connectivity; and so on. Extra resilience measures should be built into the system to counter potential network issues: rate limiting, load balancing, exponential backoff, and, of course, accounting for service downtime, since it’s not possible to design a service that has 100% uptime and is always available. That was a humbling experience, and we learned these lessons as we went.
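As one concrete example of such a measure, a retry with exponential backoff and jitter around a network call could look like the sketch below (illustrative only; StockClient.available is a hypothetical client function, not our production code):

```elixir
defmodule StockClient.Retry do
  @moduledoc "Sketch: retry a network call with exponential backoff and jitter."

  @max_attempts 5
  @base_delay_ms 100

  def with_backoff(fun, attempt \\ 1) do
    case fun.() do
      {:ok, _} = ok ->
        ok

      {:error, _reason} when attempt < @max_attempts ->
        # Exponential delay (100, 200, 400, ... ms) plus random jitter,
        # so retrying clients don't all hammer the service at once.
        delay = @base_delay_ms * Integer.pow(2, attempt - 1)
        Process.sleep(delay + :rand.uniform(delay))
        with_backoff(fun, attempt + 1)

      {:error, _} = error ->
        error
    end
  end
end

# Usage: StockClient.Retry.with_backoff(fn -> StockClient.available("SKU-123") end)
```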

We went with K8s to orchestrate the service. That allowed us to scale horizontally quite well, but since it was one of the first services on this new K8s infrastructure, we had a bunch of extra issues to iron out. It was a risky decision that cost a lot of effort, sweat, and tears, but it paid off in the long run.

Lastly, the biggest drawback of such an architecture is that it introduces single points of failure, and the Stock Service unfortunately became one. If the Stock Service is unavailable, the rest of the systems can grind to a halt unless fallbacks have been added. The fallback we designed compromises consistency, but that is an acceptable price to pay to still be able to sell items. It was built for the webshop, which keeps “cached”, less accurate stock numbers. The webshop is one of the previously mentioned systems that still store stock numbers locally; this is intentional and used only in emergencies.
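A sketch of that kind of fallback, with the same hypothetical module names as the earlier examples: stale local numbers are used only when the Stock Service cannot be reached.

```elixir
defmodule Webshop.StockWithFallback do
  @moduledoc """
  Hypothetical degradation path: trust the Stock Service, and fall back to a
  locally stored (possibly stale) number only when the service is unreachable.
  StockClient and LocalStock are illustrative modules, not Boozt's real ones.
  """

  def stock_for(sku) do
    case StockClient.available(sku) do
      {:ok, qty} ->
        LocalStock.put(sku, qty)   # keep the emergency copy reasonably fresh
        {:ok, qty}

      {:error, _service_unavailable} ->
        # Degraded mode: a slightly stale number beats halting sales entirely.
        {:degraded, LocalStock.get(sku, 0)}
    end
  end
end
```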

Of course, this decision was taken with business considerations in mind. In your case, a compromise like this might not be possible, and then the only way out is to increase the availability of the service.

You can read more about it in our tech blog: the whole journey, the reasons behind the Stock Service, the language choices, etc.

The conclusions.

Edvardas:

First of all, assess the risks and benefits. There is no silver-bullet architecture. Microservices mean a more significant upfront investment; for smaller organizations and startups, where all the systems can fit in the heads of 2–3 teams, it might be too big an investment, so it’s better to start with a modularised monolith. However, if your organization starts growing exponentially, the critical moment to adopt microservices can easily be missed. Later it will probably be too expensive to split into microservices, because energy will be spent on splitting code instead of working on features. At a very late stage, it might be financially unviable to even start splitting the monolith.

Secondly: scalability and performance are not the same. You will achieve better scalability with microservices, because well-designed microservices can scale independently both horizontally and vertically. However, it would be naive to expect better performance, since you’re introducing network hops and additional latency. Microservice architecture can help tremendously when a monolithic system starts hitting a financial or traffic-serving scale bottleneck, but that comes at the cost of performance.

Thirdly: a hybrid approach is completely fine. If your organization has both monoliths and microservices, that’s ok, and you don’t have to switch fully to either side. Even the “solar” model we have is fine: a couple of bigger monoliths with supporting microservices around them. Find whatever works for your organization. Remember that microservices are most helpful in solving organizational issues, such as organizing your teams to make them more independent; the technological benefits come second.

Marco:

Ownership is very important regardless of whether you have a monolith or a microservice architecture. It is essential because when something is everyone’s responsibility, probably no one will take care of it. When your team grows to a certain size, say 30 or 40 people, this is a question you have to answer. Microservices might help by forcing clear interfaces.

If teams work independently, they will not talk to other teams that much; they will write their own code, regardless of whether you have a monolith or not. So it’s better to separate things as early as possible, with modules first and then microservices, especially if your teams are not colocated. Even for teams in the same location, communication is already challenging, so chances are there will not be enough of it between teams.

About the authors:

Edvardas Kazlauskas, Director of Engineering at Boozt

Marco Rosello, Web Development Director at Boozt

Boozt: Website, Career page, LinkedIn
