How to move from a monolith to an event-based system

At the core of any modern application, you are likely to find an event queue. This is an indication that the solution is based on an event-based architecture. Before you embark on any transformation project to break up your monolith in favor of an event-based architecture, you should understand what that means and the challenges you will likely face.

Martin Hodges
14 min read · Apr 7, 2024
Event-based architecture

Monolithic architecture

Before modern software architecture evolved, applications were built as a single deployment, called a monolith.

Monolithic applications tend to evolve from a simple core and become progressively more complex. This builds up technical debt, which makes the application increasingly expensive to extend and maintain.

You may find yourself in this position and you may be considering moving away from a monolithic structure to a distributed, microservice and event-based architecture.

Monolithic architecture

Although monolithic architectures have advantages, the tight coupling between the features within your monolith eventually hinders throughput and/or the creation and maintenance of features.

If your application needs to scale in terms of throughput or in terms of scope, you may need a different architecture that breaks the tight feature dependencies that prevent both of these things.

You may need an event-based architecture.

Event-based architecture

In an event-based architecture, the key piece of information that underpins the system is the event.

An event marks the fact that something happened at a point in time. Whilst there can be data associated with the event to describe what happened (the event metadata), the event itself is what is important.

As historical facts, events are immutable. They are a record of what happened and when it happened, and that cannot be changed.
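To make this concrete, here is a minimal sketch of an immutable event record in Python. The field names (`event_type`, `payload`, and so on) are illustrative, not from any specific broker; `frozen=True` makes the record read-only, mirroring the idea that an event is an unchangeable historical fact.

```python
from dataclasses import dataclass, field
import time
import uuid

@dataclass(frozen=True)
class Event:
    event_type: str                               # what kind of thing happened
    payload: dict = field(default_factory=dict)   # metadata describing it
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    occurred_at: float = field(default_factory=time.time)

order_placed = Event("OrderPlaced", {"order_id": 42, "total": 19.99})

# Attempting to rewrite history fails with FrozenInstanceError:
try:
    order_placed.event_type = "OrderCancelled"
except Exception:
    pass  # the event remains exactly as it was recorded
```

Note that `frozen=True` only protects the top-level fields; a truly immutable event would also use an immutable payload type rather than a `dict`.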

So how do events solve our scaling and maintenance problems?

Because events are immutable, they can be processed at any time after being generated. This means that the system that produces the event (the producer) creates it without needing to know what other system (the consumer) will process the event or when. It also does not care how many consumers there are or what they do with the event.

This means that there is total separation between the producer and consumers.

Event-based architecture

From the diagram above, you can see the producer creates the event and adds it to a queue. Its job is then over and it takes no further interest in the event.

Consumer decoupling

Also, in the diagram above, you can see there are three consumers that consume the event. Now, when a consumer consumes an event, the event is not deleted*. It remains in the queue for other consumers to consume as well. Each consumer gets to consume the events in sequence (more on that later).

*Note that whilst events are notionally never deleted, resource costs dictate that old events are eventually purged. How long events remain accessible is a solution design decision.

As the events remain on the queue, any of the consumers can consume them whenever they are ready to.

This means that, not only are producers decoupled from the consumers, the consumers are also decoupled from each other.
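This retention-plus-independent-positions behaviour can be sketched with a toy in-memory broker. The class and method names are illustrative only; real brokers such as Kafka achieve the same effect with an append-only log and per-consumer offsets.

```python
from collections import defaultdict

class Broker:
    """Toy broker: consuming an event does not delete it, and each
    consumer tracks its own position in the log independently."""

    def __init__(self):
        self.log = []                      # append-only event log
        self.offsets = defaultdict(int)    # consumer name -> next index to read

    def publish(self, event):
        self.log.append(event)             # events are retained, never edited

    def consume(self, consumer):
        """Return all events this consumer has not yet seen."""
        start = self.offsets[consumer]
        events = self.log[start:]
        self.offsets[consumer] = len(self.log)   # advance this consumer only
        return events

broker = Broker()
broker.publish("OrderPlaced")
broker.publish("OrderShipped")

# Two consumers each see the full sequence, independently of one another.
assert broker.consume("billing") == ["OrderPlaced", "OrderShipped"]
assert broker.consume("shipping") == ["OrderPlaced", "OrderShipped"]
broker.publish("OrderDelivered")
assert broker.consume("billing") == ["OrderDelivered"]   # only the new event
```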

Note that an event-based architecture does not stop features from still accessing each other via synchronous APIs where an immediate response is required. If this is done, it creates a strong dependency so the impact on the architecture has to be carefully considered.

Consumer scaling

In addition to having different consumers processing events in different ways, we can scale each consumer itself.

Scaling consumers

Assuming the consumers are stateless, you can add more copies of a consumer to consume the events. The copies take turns taking events off the queue and act as if they were a single consumer. The events are then processed in parallel, which can mean events are processed out of order.

Unlike a monolithic structure that has to be scaled for all features, we can now decide to only scale those features that need to be scaled.
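The sharing of one event stream across copies of a single consumer can be sketched as a round-robin dispatch. This is illustrative only; real brokers implement it with consumer groups and partitions, and the parallelism is what makes cross-copy ordering unguaranteed.

```python
import itertools

def dispatch(events, workers):
    """Round-robin events across worker copies of one logical consumer."""
    assignments = {w: [] for w in workers}
    for event, worker in zip(events, itertools.cycle(workers)):
        assignments[worker].append(event)
    return assignments

events = [f"event-{i}" for i in range(5)]
work = dispatch(events, ["copy-a", "copy-b"])

assert work["copy-a"] == ["event-0", "event-2", "event-4"]
assert work["copy-b"] == ["event-1", "event-3"]

# Together the copies cover every event exactly once, but because they run
# in parallel, completion order across copies is not guaranteed.
assert sorted(work["copy-a"] + work["copy-b"]) == sorted(events)
```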

Feature expansion

One of the other problems we have in our monolith is that adding or changing features becomes very difficult. There is always the question of whether we will break something, even if we have automated regression testing in place.

Expanding our feature set

Event-based architecture comes to our rescue again.

In the diagram above, we can change, say, Feature 3 knowing that it has no impact on any other feature.

In addition, we have added Feature 5, again, safe in the knowledge that this will not affect any existing features.

So now we have broken our dependencies and allowed parts of our application to be scaled as required. Let’s look at how it works in practice.

Event broker

Before we look at whether you need an event-based architecture, we should first discuss the event broker and its role in the publish/subscribe model.

The event queue I mentioned earlier is actually a component called an event broker, or just a broker. It is the broker that the producer sends its events to. The broker persists the events to storage, and consumers read them from the broker as and when necessary.

Publish-subscribe model

In practice, there can be many producers creating (or publishing) many types of event. It would be inefficient for a consumer to read all of them when it is only interested in one or two specific types.

To only get those it needs to process, each consumer tells the broker which types of event it is interested in. This is known as subscribing to the event types. The broker then only sends events of those types to the consumer.

This is called a publish-subscribe architecture. This is sometimes shortened to pub/sub.

One thing that you need to understand is that the broker maintains a checkpoint (often called an offset) that records where each consumer has got to. This is so that when the consumer asks for more events, the broker can send only the new ones.

This is how it (almost) keeps events in sequence. Failures can disrupt the flow depending on the configuration of the broker.
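The subscribe-then-poll flow can be sketched as follows. The class and method names are illustrative; real brokers model subscriptions with topics or similar constructs, but the essentials are the same: consumers register the event types they want, and the broker delivers only matching events past each consumer's checkpoint.

```python
class PubSubBroker:
    """Toy pub/sub broker: type-based subscriptions plus per-consumer
    checkpoints so each poll returns only new, matching events."""

    def __init__(self):
        self.log = []              # (event_type, payload) pairs, append-only
        self.subscriptions = {}    # consumer -> set of subscribed types
        self.checkpoints = {}      # consumer -> next log index to read

    def subscribe(self, consumer, *event_types):
        self.subscriptions[consumer] = set(event_types)
        self.checkpoints[consumer] = 0

    def publish(self, event_type, payload):
        self.log.append((event_type, payload))

    def poll(self, consumer):
        start = self.checkpoints[consumer]
        wanted = self.subscriptions[consumer]
        self.checkpoints[consumer] = len(self.log)   # advance the checkpoint
        return [e for e in self.log[start:] if e[0] in wanted]

broker = PubSubBroker()
broker.subscribe("invoicing", "OrderPlaced")
broker.subscribe("audit", "OrderPlaced", "UserDeleted")

broker.publish("OrderPlaced", {"id": 1})
broker.publish("UserDeleted", {"id": 9})

assert broker.poll("invoicing") == [("OrderPlaced", {"id": 1})]
assert broker.poll("audit") == [("OrderPlaced", {"id": 1}),
                                ("UserDeleted", {"id": 9})]
assert broker.poll("invoicing") == []   # checkpoint already advanced
```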

So, do you need an event-based architecture?

Now that we know what a pub/sub event-based architecture is, we can look at whether you need one.

Why wouldn’t I want one?

It would seem that the event-based architecture is the proverbial ‘silver bullet’ that solves all of our problems. Everything becomes decoupled, everything is scalable and everything is maintainable.

In practice it may not be the silver bullet we believe it to be.

  • Complexity: The addition of the broker into your architecture introduces another component to be configured and maintained.
  • Reliability: The broker also introduces a significant point of failure in your system. If it fails, your entire system may go offline.
  • Hidden dependencies: Whilst it looks like everything is now decoupled with no dependencies, this is just not true. If the producer stops publishing an event in favour of another, or if the metadata associated with the event changes, you may be forced to change all the other feature components.
  • Data consistency: As different parts of your system process events at different rates, parts of your system may be inconsistent. Only when the events stop does the system become ‘eventually consistent’, which may never actually happen.
  • Asynchronous response: Where part of your system needs a synchronous response (eg: in response to an action that has to be done now), it may need to be blocked, consuming resources for a significant amount of time.
  • Querying data: Not only does data inconsistency give you a problem when querying data across different features but querying across multiple databases also presents a significant challenge.
  • Debugging: When something goes wrong, trying to find out what happened and why is also made more difficult. Whilst it is possible to replay the sequence of events, race conditions can make the attempt to reproduce the issue very difficult.
  • Testing: Testing your system may now require you to stand up all your features together, including the event broker. This could be a significant effort.
  • Anti-patterns: There are several solutions that people build using event-driven architectures that actually undermine the benefits of using it. These are called anti-patterns.
  • Event-sequencing: On the surface it looks like our consumers can just consume the events without having to worry about out-of-sequence events, lost events and performance. The truth is that you need to consider all of these in your software.
  • Data duplication: Each feature has its own view of the world and, should that view become corrupted, it may take a significant data-fix effort to correct it across all features.
  • Security vulnerability: With all your business data now travelling over a single system, it becomes vulnerable to attack.

Most of these challenges do not arise with a monolithic architecture. The monolith evolved because it was relatively much simpler to develop than a distributed set of microservices based on an event-based architecture, which means it is quicker and cheaper to build.

For some (such as start-ups), without the need for large scale processing and a relatively simple problem domain, a monolithic architecture may be a better choice.

Why would I ever consider an event-based architecture?

With its long list of challenges, you are probably wondering why you would ever choose an event-based architecture.

Given its wide-spread adoption, there must be some powerful reasons why you would use it — surely?

Put simply, the advantages of scale both in volumes and feature sets are very high. So high, in fact, that they exceed the cost of the challenges.

Most of the challenges can be overcome:

  • Complexity: Use tried and tested technology solutions, such as Kafka.
  • Reliability: Use tried and tested technology solutions that are highly available, such as Kafka.
  • Hidden dependencies: Instil the concept of version control over your event designs and work with backwards compatibility by one or two versions (n-1, n-2).
  • Data consistency: Determine if this is a problem and, where it is, adopt a synchronous solution, such as APIs.
  • Asynchronous response: As above, where this is a problem, adopt a synchronous solution.
  • Querying data: For some purposes, consider a data warehouse that consumes all events and builds a consistent, current state for your system. Where data is required within your features, consider building the data it needs to query in its own database but try to avoid data sprawl.
  • Debugging: Adopt tools designed for monitoring and managing event-based and distributed systems, such as Zipkin and Jaeger.
  • Testing: Create a framework that allows you to work within a segment of your overall system.
  • Anti-patterns: Understand the anti-patterns and simply avoid them.
  • Event-sequencing: Build a framework into your microservices that can manage out of sequence and duplicate messages. Make design choices based on your business requirements.
  • Data duplication: Consider mechanisms for resynchronising distributed data stores and develop tooling to carry out data fixes.
  • Security vulnerability: Ensure your broker can implement the required level of access control to all events, including those being produced, those being consumed and those at rest in the persistent store.
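The versioning advice for hidden dependencies can be sketched as a consumer that stamps each event with a schema version and can read the current version plus the one before it (n-1). The event name, field names and version shapes below are hypothetical examples, not a real schema.

```python
def read_order_placed(event: dict) -> dict:
    """Normalise any supported version of a hypothetical 'OrderPlaced'
    event to the latest shape, so downstream logic sees one format."""
    version = event.get("version", 1)
    if version == 1:
        # v1 carried a single 'amount' field (assumed to be in pence).
        return {"order_id": event["order_id"], "total_pence": event["amount"]}
    if version == 2:
        # v2 renamed the field and added an optional currency.
        return {"order_id": event["order_id"],
                "total_pence": event["total_pence"],
                "currency": event.get("currency", "GBP")}
    raise ValueError(f"unsupported OrderPlaced version: {version}")

v1 = {"version": 1, "order_id": 7, "amount": 1999}
v2 = {"version": 2, "order_id": 8, "total_pence": 2500, "currency": "EUR"}

assert read_order_placed(v1)["total_pence"] == 1999
assert read_order_placed(v2)["currency"] == "EUR"
```

The key design choice is that the producer can roll out v2 while consumers still accept v1, and old consumers are retired before v3 drops v1 support.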

Don’t get me wrong, each of these is a significant organisational effort but, if your solutions require the benefits of an event-based architecture, then it is worth the investment in time and money.

Monolith migration challenges

Ok, so you have a monolith and you are running into scaling and maintenance problems. You are probably thinking it is worthwhile separating it into microservices and adding an event broker to gain the benefits of an event-based architecture.

If you are at this point, it is worth noting that there are a number of factors that you need to consider with careful planning and design.

1. Distributing your monolith

Rick Page once wrote that ‘hope is not a strategy’.

Similarly, breaking your monolith apart, sticking it back together with an event broker and hoping it will work is not a strategy either.

Doing this will just mean you have gained all the challenges above with few of the benefits of an event-based architecture. You have created a distributed monolith, which is hard to scale and much harder to maintain.

2. Synchronicity in an asynchronous world

As you break apart your synchronous monolith, you will naturally find solutions that replicate that synchronicity across an asynchronous event-based architecture. This is a classic anti-pattern.

You will recognise this anti-pattern when you have processes that issue events and then wait for a response before continuing. This effectively couples your microservices to each other and almost immediately removes the benefit of an event-based architecture.

3. Data duplication — everywhere

Within your monolithic world, every feature has access to all data. Whilst your internal design probably segregates your business entities from each other via a service layer, internal APIs give you ubiquitous access.

When you break your monolith into microservices, you can fall into the trap of thinking that everything now needs its own copy of all the data in the system. This uses your event broker as a poor-person’s database synchronisation tool. Again, everything becomes tightly coupled.

4. Event design

It is tempting to think that the event queue itself is just an infinite bucket that can manage any number of events. In reality, you need to carefully consider the design of your events to ensure that you can maintain scalability of the event broker itself.

You also need to consider design conventions to avoid an explosion of concepts across your events that make it hard to understand what is happening and to diagnose problems.

When considering the design of your events, try to remember the principle that the producer should not know anything about the consumer. This helps to create a system that is loosely coupled and flexible, and that extracts the maximum benefit from the architecture.

5. Optimise your design

I just wrote that the producer should not know anything about the consumer. If you take this as an immutable policy, you end up with a system that is difficult to manage and develop as business logic for a given business area or domain can get distributed and possibly even duplicated.

In these cases, you may want to consider adding logic to the producer to avoid the duplication and to simplify all your consumers. Remember though, only do this within the domain of the producer.

6. Design for the worst

With everything up and running, you will find that the broker provides your consumers with a non-duplicate, correctly sequenced, continuous set of events.

It is easy to think that this is the case all the time and that your consumers can implement their business logic with this assumption.

Unfortunately, this is a bad assumption.

Under failure conditions, given certain configurations, your broker can lose events, duplicate events and/or deliver them out of order. If any of these things are important to your system, you need to understand the failure modes of your producers, brokers and consumers and configure them to ensure that your set of events meet the requirements of your solution.

You need to design for worst case, not best case.
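A consumer hardened against the failure modes above can be sketched like this: duplicates are dropped by event id, and out-of-order events are held back until the missing sequence numbers arrive. All names are illustrative, and a real implementation would also persist this state and bound the buffers.

```python
import heapq

class ResilientConsumer:
    """Toy consumer that tolerates duplicate and out-of-order delivery."""

    def __init__(self):
        self.seen_ids = set()    # for de-duplication
        self.next_seq = 0        # next sequence number we expect
        self.pending = []        # min-heap of early arrivals: (seq, payload)
        self.processed = []      # events applied, in order

    def receive(self, event_id, seq, payload):
        if event_id in self.seen_ids:
            return                          # duplicate delivery: ignore
        self.seen_ids.add(event_id)
        heapq.heappush(self.pending, (seq, payload))
        # Apply everything that is now contiguous with what we have seen.
        while self.pending and self.pending[0][0] == self.next_seq:
            _, p = heapq.heappop(self.pending)
            self.processed.append(p)
            self.next_seq += 1

consumer = ResilientConsumer()
consumer.receive("a", 0, "created")
consumer.receive("c", 2, "shipped")      # arrives early: buffered
consumer.receive("a", 0, "created")      # duplicate: ignored
consumer.receive("b", 1, "paid")         # gap filled: 1 then 2 applied

assert consumer.processed == ["created", "paid", "shipped"]
```

Whether you need this machinery at all depends on your business requirements; some event streams genuinely do not care about ordering or exactly-once processing.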

Planning your migration

Hopefully you have got this far and now understand that you cannot simply wake up one day and start ripping apart your monolith. You need to design your target state, including:

  • Where an event-based architecture is going to help (and where it will not)
  • How you will meet each of the challenges mentioned in this article
  • What conventions you need to keep design consistency across your system
  • How your developers and QA engineers will be able to develop and test the solution

Once you have a target state, you then need to consider how you will go from today’s monolith to the target state.

Rewrites rarely go well. They take time and, in that time, requirements can change, stakeholders can lose interest and business benefit is not realised in time.

You should plan for an incremental implementation. There are several methods for doing this:

Strangler Pattern

In this scenario, you gradually replace parts of the system behind a facade, so that those outside your system notice no difference.

The aim is to reduce the risk of the migration by doing it a small bit at a time until the whole solution has been migrated.

This can lead to a lot of extra work and can also constrain your event-based solution due to the need to provide synchronous responses. It can also delay business benefit.
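The routing heart of the strangler pattern can be sketched in a few lines: a facade sends each request either to the new event-based services or to the legacy monolith, so callers see no difference as features migrate one at a time. The feature names and handler functions here are hypothetical.

```python
MIGRATED = {"orders"}   # features already moved to the new system

def legacy_monolith(feature, request):
    return f"monolith handled {feature}: {request}"

def new_service(feature, request):
    return f"new service handled {feature}: {request}"

def facade(feature, request):
    """Callers always hit this one entry point, never the backends."""
    handler = new_service if feature in MIGRATED else legacy_monolith
    return handler(feature, request)

assert facade("orders", "place").startswith("new service")
assert facade("billing", "invoice").startswith("monolith")

# Migrating another feature is a one-line routing change:
MIGRATED.add("billing")
assert facade("billing", "invoice").startswith("new service")
```

In practice the facade is usually an API gateway or reverse proxy rather than application code, but the principle is the same.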

Incremental Evolution

This technique only introduces event-based architecture for new features.

As old features are deprecated, they are deleted from the system and eventually you achieve the target state.

This is dependent on the rate of new features being introduced and on the rate of deprecation. This can lead to the problem of system count augmentation whereby you end up with more systems to manage and maintain, eroding the original business case for migration.

Problem area first

Where you have a particularly problematic part of your monolith, you may want to consider rewriting that section, integrating back into the monolith only where necessary.

Depending on how integrated this feature is into the monolith, this may become a technically complex build.

Select audience first

In some cases, you may decide that there is a segment of your users (eg: new sign ups, lowest tier plans etc) that might not need the full set of functionality and you may be able to provide them with their own system.

This system would provide a basic feature set that meets the needs of this user segment, giving the new build a reduced scope. Existing users can remain on the monolith.

Based on feedback from the users, you can then decide the order of feature development on the new solution.

Ideally, progression of the monolith is halted and new features are only allowed on the new solution. In reality, as the monolith represents your oldest, most loyal and perhaps your highest paying customers, this may not be entirely possible, leading to system count augmentation.

Others

There are other options for migration that may suit your use case more effectively. The main thing is that you consider how you are going to reach your target state in a way that:

  • Gains and maintains traction across the stakeholders
  • Minimises risk
  • Does not create a ‘rod for your own back’ (ie: does not create something that actually makes life more difficult)
  • Achieves the expected benefits in the timeframe expected

Summary

In this article I have attempted to explain what an event-based architecture is, why you would want one (and why you may not) and what you have to consider when moving from a monolithic to an event-based architecture.

If you do decide to change your solution architecture, you need to ensure that your target architecture reflects your needs in terms of scalability, maintainability, reliability, separation of concerns and cost.

The migration is not a project to be done ‘on the side’ or ‘in your own time’. It is a serious undertaking that requires careful planning, design and execution.

It can be done. Many people have gone on this journey and successfully unlocked the benefits of an event-based architecture.

If you found this article of interest, please give me a clap as that helps me identify what people find useful and what future articles I should write. If you have any suggestions, please add them in the comments section.
