Microservices, Events and CQRS at Klarity

Nov 6, 2017

In a previous blog post I wrote about microservices and a little bit about the challenges and benefits you can expect when going for a microservice-based system. Today I’m going to try to explain why we at Klarity chose to go micro, how we use things like event sourcing and command query responsibility segregation, and why we sometimes regret it.

What we do and what we might do

Klarity is a nonprofit organization that wants to help citizens reduce petty corruption in their local community. And we want to do it with technology. Sounds a bit vague? It is, even though we of course have a clearer vision than that.

What we know is that we want to provide a service where people can safely share their stories and experiences of petty corruption. They will typically do this using their cell phone, and the story they tell will become public on our site instantly. We don’t want to be a gossip and slander site, so users of our service will need to back up their claim with some kind of evidence, e.g. a video of the incident. We want to grow an online community of engaged citizens, activists, journalists etc. that will pick up on the stories and take actions that increase the likelihood of petty corruption actually decreasing. This can be done in a number of ways, among them:

Enriching the story with more information (location, names of people and organizations involved etc.).
Spreading the word by sharing the story on social media.
Separating good content (clearly showing some wrongdoing) from bad content (perhaps violating our ToS).

This list can and will grow very long and is something that we will iteratively experiment with. We have a lot of ideas, but the honest truth is that we don’t yet know how people can decrease corruption with our help. We can’t hide from that; instead we need to embrace the fact that everything we build will likely change, be replaced or discarded.

So how do you design a system when the only thing you really know is that you know very little? Well, there is no single correct answer to this. We could have built it as a monolith, focusing on reducing time to market as much as possible. We have the luxury of having enough funding to allow us to be a bit more long-term minded, so we chose a different path. We are building the system as a set of loosely coupled components, each having a single responsibility. We know that many of these components will be thrown away, so we want the coupling between them to be as loose as possible. Enter microservices, events and CQRS.

Event Sourcing

Most systems and applications need to keep track of some kind of state (e.g. an order with order lines, billing and shipping information, payment information etc.). Often this is accomplished by having a couple of entities that are updated over time with the latest information. This way you always have the current state stored. But what about history? If you are lucky there is also some kind of transaction log that you can dig into when you need to debug something.

Event sourcing flips this around and makes the history the source of truth. So instead of continuously updating an order entity, a billing information entity etc., you just store an ordered log of events: ItemAddedToCartEvent -> BillingInformationAddedEvent -> … -> OrderPlacedEvent -> ShippingInformationChangedEvent -> OrderShippedEvent -> …

You get a transaction log for free, and you can get the current state by replaying all the events for a given order. On top of that, you can now also get the order state at any given point in time (just stop the replay at that point).
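To make the replay idea concrete, here is a minimal sketch in Python. It is not how any particular system does it: the `Event` class, the `replay` function, the integer `timestamp` and the shape of the state dictionary are all assumptions made for illustration. Only the event names come from the example log above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    order_id: str
    kind: str        # event name, e.g. "OrderPlacedEvent"
    timestamp: int   # hypothetical logical clock used to order the log
    data: dict = field(default_factory=dict)


def replay(events, until=None):
    """Fold an order's events into its state, optionally stopping at `until`."""
    state = {"items": [], "status": "cart", "shipping_address": None}
    for event in sorted(events, key=lambda e: e.timestamp):
        if until is not None and event.timestamp > until:
            break  # point-in-time view: ignore everything after `until`
        if event.kind == "ItemAddedToCartEvent":
            state["items"].append(event.data["item"])
        elif event.kind == "OrderPlacedEvent":
            state["status"] = "placed"
        elif event.kind == "ShippingInformationChangedEvent":
            state["shipping_address"] = event.data["address"]
        elif event.kind == "OrderShippedEvent":
            state["status"] = "shipped"
    return state


log = [
    Event("order-1", "ItemAddedToCartEvent", 1, {"item": "book"}),
    Event("order-1", "OrderPlacedEvent", 2),
    Event("order-1", "ShippingInformationChangedEvent", 3, {"address": "Main St 1"}),
    Event("order-1", "OrderShippedEvent", 4),
]

current = replay(log)            # full replay gives the current state
as_of_2 = replay(log, until=2)   # replay stopped right after the order was placed
```

Here `current` ends up with status "shipped", while `as_of_2` still has status "placed" and no shipping address, because the later events were never applied.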

Event Collaboration

So, you no longer store current state; instead you have a bunch of events. In a complex system, making these events available to others is probably something that you will benefit from. Maybe your Fraud Detection Service wants to know when the shipping information for an order changes, and your Tracking Service and your Billing Service want to know when an order is shipped. Et cetera.

One way of accomplishing this would be to have your Order Management Service synchronously make requests to all these services when something interesting happens. But that introduces tight coupling: as soon as a Service A wants information about something happening to an order, the Order Management Service has to be updated. Service A and the Order Management Service might be owned by different teams, and now you need to coordinate things. You also add complexity to the Order Management Service. What if Service A is unavailable when something interesting happens?

Another way could be to use a shared database, so any interested service can just keep reading from the Order database in a polling fashion. Again, you introduce tight coupling, only this time it is kind of hidden, which makes it even worse. What if the team owning the Order Management Service decides they want to start using MongoDB instead of MySQL? Good luck if there are 10 services relying on querying that MySQL instance.

Worry not, there is a better way, and it is called event collaboration. With event collaboration, the Order Management Service would publish events as they occur. You typically use a message bus (e.g. Kafka, RabbitMQ, Google Cloud Pub/Sub, AWS SNS) and publish the event to a topic. Anyone can subscribe to this topic and will be notified as soon as something is published. This way, when Service A wants to receive Order events, it can just start subscribing to the topic. The Order Management Service does not need to be changed in any way. Yay, loose coupling and no coordination needed!
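The publish/subscribe idea can be sketched with a tiny in-process stand-in for a message bus. This is an illustration only: `InMemoryBus` is a hypothetical class, not the API of Kafka, RabbitMQ or any other broker, and unlike a real bus it delivers synchronously and keeps nothing durable. The topic name "orders" and the dictionary shape of the events are also assumptions.

```python
from collections import defaultdict


class InMemoryBus:
    """Hypothetical stand-in for a real message bus."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The publisher only knows the topic, never who is listening.
        for handler in list(self._subscribers.get(topic, [])):
            handler(event)


bus = InMemoryBus()
fraud_checks, tracking_updates, billing_updates = [], [], []


def fraud_detection(event):
    if event["type"] == "ShippingInformationChangedEvent":
        fraud_checks.append(event["order_id"])


def tracking(event):
    if event["type"] == "OrderShippedEvent":
        tracking_updates.append(event["order_id"])


bus.subscribe("orders", fraud_detection)
bus.subscribe("orders", tracking)

# The order service publishes without knowing about either subscriber.
bus.publish("orders", {"type": "ShippingInformationChangedEvent", "order_id": "order-1"})
bus.publish("orders", {"type": "OrderShippedEvent", "order_id": "order-1"})

# A new service joins later; the publishing code above stays untouched.
bus.subscribe("orders", lambda event: billing_updates.append(event["type"]))
bus.publish("orders", {"type": "OrderShippedEvent", "order_id": "order-2"})
```

In this sketch the late subscriber only sees events published after it joined. Whether a new subscriber can also read older events depends on the bus you choose and how it is configured.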

What about reads?

Ok, all this sounds great but what about reads? Do I always need to replay all events to read something? What if I want to execute queries that would need to read all events for all orders to return what I want, say “Give me all the orders that have been shipped but not paid” or “Give me all the orders for customer X”? Enter CQRS.

Command Query Responsibility Segregation

An added benefit of event sourcing and event collaboration is that you can now use these events, which are the master data, and build any number of projections of the current truth. So instead of trying to shoehorn a query into an existing data model, why not just build a new model based on the existing events and continuously update it as you receive new events? That’s pretty much the essence of CQRS.

You have one component or service that owns an aggregate, say an order in the order management context (now I’m introducing a couple of terms from Domain-Driven Design; diving deeper into that is beyond the scope of this post). This component implements all the business logic for the aggregate and reacts to commands issued by someone or something. It decides whether or not a command is allowed, and if it is, it executes it and emits an event. These kinds of services are Command Services, handling only commands/writes.
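A command service along these lines might look roughly like the following sketch. Everything in it is assumed for the sake of the example: the class and method names, the in-memory event log, the two business rules and the `publish` callback that stands in for the message bus.

```python
class CommandRejected(Exception):
    """Raised when a command violates a business rule."""


class OrderCommandService:
    def __init__(self, publish):
        self._log = {}           # order_id -> ordered list of events (the master data)
        self._publish = publish  # callback that hands events to the message bus

    def _status(self, order_id):
        # Current state is derived by replaying the order's events.
        status = "new"
        for event in self._log.get(order_id, []):
            if event["type"] == "OrderPlacedEvent":
                status = "placed"
            elif event["type"] == "OrderShippedEvent":
                status = "shipped"
        return status

    def _record(self, event):
        # Store the event first, then announce it to subscribers.
        self._log.setdefault(event["order_id"], []).append(event)
        self._publish(event)

    def place_order(self, order_id, customer_id):
        if self._status(order_id) != "new":
            raise CommandRejected(f"order {order_id} was already placed")
        self._record({"type": "OrderPlacedEvent", "order_id": order_id,
                      "customer_id": customer_id})

    def ship_order(self, order_id):
        if self._status(order_id) != "placed":
            raise CommandRejected(f"order {order_id} is not ready to ship")
        self._record({"type": "OrderShippedEvent", "order_id": order_id})


published = []
service = OrderCommandService(publish=published.append)
service.place_order("order-1", customer_id="customer-x")
service.ship_order("order-1")

try:
    service.ship_order("order-1")  # shipping twice breaks the rule
    rejected = False
except CommandRejected:
    rejected = True
```

The rejected command leaves no trace in the log: only commands that pass the business rules turn into events.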

Another type of service is the Query Service. Such a service only handles queries, and its data model is a projection of the master data: the events published by a command service. Since a query service does not contain master data it can easily be replaced or thrown away, and the data model and storage method can be very specialized to the query it should support. You could use MongoDB and store an entire order in a single document, you could use MySQL and a single table mapping customer id to order id, you could use Elasticsearch to provide free text search. Anything you want, really. And you can have any number of query services subscribing to events for the same aggregate type.
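As a rough sketch, here are two tiny projections answering the example queries “all the orders for customer X” and “all the orders that have been shipped but not paid”. The class names, the event dictionaries, the `customer_id` field and `OrderPaidEvent` are all made up for this illustration, and plain in-memory structures stand in for whatever storage a real query service would use.

```python
from collections import defaultdict


class OrdersByCustomer:
    """Projection answering: which orders belong to customer X?"""

    def __init__(self):
        self._orders = defaultdict(list)  # customer_id -> order ids

    def apply(self, event):
        if event["type"] == "OrderPlacedEvent":
            self._orders[event["customer_id"]].append(event["order_id"])

    def orders_for(self, customer_id):
        return list(self._orders.get(customer_id, []))


class ShippedNotPaid:
    """Projection answering: which orders are shipped but not yet paid?"""

    def __init__(self):
        self._shipped, self._paid = set(), set()

    def apply(self, event):
        if event["type"] == "OrderShippedEvent":
            self._shipped.add(event["order_id"])
        elif event["type"] == "OrderPaidEvent":  # hypothetical event name
            self._paid.add(event["order_id"])

    def order_ids(self):
        return sorted(self._shipped - self._paid)


events = [
    {"type": "OrderPlacedEvent", "order_id": "order-1", "customer_id": "customer-x"},
    {"type": "OrderPlacedEvent", "order_id": "order-2", "customer_id": "customer-x"},
    {"type": "OrderShippedEvent", "order_id": "order-1"},
    {"type": "OrderShippedEvent", "order_id": "order-2"},
    {"type": "OrderPaidEvent", "order_id": "order-2"},
]

# Each projection consumes the same event stream and keeps only what it needs.
by_customer, unpaid = OrdersByCustomer(), ShippedNotPaid()
for event in events:
    by_customer.apply(event)
    unpaid.apply(event)
```

Both projections ignore events they do not care about, so either one can be dropped and rebuilt from the event stream without affecting the other.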

Events and CQRS at Klarity

This isn’t exactly how we built discussions but not far from it. As you can see there is an event sourced command service, a query service building and serving projections of those events and a couple of other services that subscribe to the events and take some sort of action. With event sourcing and event collaboration we can easily add more subscribers when we need to.

Ok, so we use all of the techniques I’ve described. At the time of writing we have around 20 services running. We have a bunch of query services for members, videos, discussions and cases (that’s what we internally call the stories I mentioned; we haven’t really nailed a ubiquitous language yet), and we have some services that just subscribe to events and emit new ones. And we have a few command services, each event sourced and handling a given business capability in the system.

This is how our home-built monitoring service displays the system at the moment:

To be honest, our system doesn’t do that much yet and all these services might seem like overkill. They do to us at times as well, that’s for sure. It can be annoying that adding a property to a case will pretty much always involve touching more than one service. It can be frustrating to set up a new service just for “this one little thing”. But given the premise that we must build for change, I think we are heading in the right direction. Our services are loosely coupled, our query services can easily be thrown away or replaced, and the services have one and only one reason to live and are easy to reason about. We can add new query services (I can see lots of them coming up, e.g. popular-cases and cases-recommended-to-you) without touching any of the command services.

I’m sure we’ll experiment a lot with how granular (and how many) our query services will be. I think the key here is to be pragmatic; you don’t necessarily need to go all in. Build something that works now, keeping in mind that it might be replaced or thrown away later.

As with any distributed system, you need to spend time on things like automated pipelines that let you go from pushing code to having it out in production, along with monitoring, logging etc. We have done a fair bit with that and are in a good enough state for what we need right now, but we still have a long way to go before we’re really happy. Who knows, we might write another blog post about that in the future.