Practical Implementation Traits of Service Choreography Based Microservices Integration

This is the second post in a three part series looking at the topic of microservice integration. In the first installment, I focused mainly on the theory side of event-driven service choreography. In this second part, I’ll dig into the practical traits that we’ll require of our technical implementations that enable us to satisfy the theory discussed in the first post. In the final installment of the series, I’ll look at different specific implementation techniques/technologies and how they map to the traits discussed in this post.

Implementation traits

I’d like to provide some coverage on what I believe to be the key traits we look for in service choreography based microservices integration implementation techniques/technologies. My goal is to set the scene to make it easier to test specific technologies against those traits (subject of the third and final post in this series).

Must-have traits

Decoupled in time

This one is pretty straightforward. In the first installment of this series, I discussed the asynchronous nature of service choreography based integration. Whatever implementation direction we go in, it needs to support us decoupling services in time. This means, for example, service A does not require service B to be online at a specific point in time (now) — we just need to ensure we have some mechanism in place for events from service B to eventually reach service A at some point in the future.

Guaranteed at-least-once-delivery

An absolute pre-requisite for ensuring eventual consistency is that we guarantee events eventually reach their interested consumers. But, why don’t we aim for exactly-once-delivery instead? I’m not going to repeat what many others have said before me, so suffice to say it’s simply not possible to achieve it in a distributed system. Google is your friend if you want to explore why :–)

doInDatabaseTransaction { statement => 
statement.insert("INSERT into ....")
}
messageQueue.publish(new SomeEvent(...))
  1. Our application is restarted immediately after the database transaction commits
  2. The message queue infrastructure is down for a few minutes, meaning we can’t send the event right now
doInDatabaseTransaction { statement => 
statement.insert("INSERT into ....")
messageQueue.publish(new SomeEvent(...))
}

Guaranteed message ordering

The need for a stream of events to be consumable in order really depends on the use cases of its consumers. Very broadly speaking, there are two categories of consumer use case:

  1. Consumers that are conceptually stateless in that they can treat every event they encounter as if it’s completely unrelated to any other event they’ve encountered in the past. Such consumers will typically trigger some kind of one off action, such as sending an email, sending a push notification, or triggering an external API call. An example of this might be where the reaction to an event requires charging a credit card via a third-party payment gateway.

Guaranteed at-least-once-processing

Well, I guess what we really want is exactly-once-processing! However, I thought it would be helpful to write a separate subsection on idempotency (see below). I find it useful to separate the general action of processing from the outcome of the processing — even if we handle a message/event idempotently (e.g. through some method of deduplication), I still like to consider that the message/event has been processed, despite the absence of any side effects. I find it simpler to think of processing as meaning a consumer has handled a message/event and is now ready to handle the next one in the stream.

Idempotency

Given the inability to achieve exactly-once-delivery, and instead falling back to at-least-once-delivery, we can’t just ignore the fact that consumers will, on occasion, encounter the same event more than once. Idempotency is a property of an event handler that allows the same event to be applied multiple times without any new side effects beyond the initial application. In some cases, it might be ok to live with repeated side effects, and in some cases it won’t be ok. For example, we might not mind if we send a duplicate email, but a customer won’t be too happy if we charge their credit card twice for the same order.

Nice-to-have traits

Consumer-side failure recovery

I was very close to including this trait within the must-haves group, but decided to be lenient. Now that we understand autonomy to be a key attribute for reactive microservices, it follows, in my opinion, that consumers must be responsible for recovering from their own failures without burdening upstream sources of events. I’ve worked with message oriented systems where a producer of events is relied upon to re-dispatch messages in the event a downstream consumer has got itself in a mess. It strikes me that such an approach is not compliant with the autonomy objective — if a consumer is dependent on a producer going beyond its operational responsibilities to help it recover from failure, the autonomy of that consumer is called in to question.

Decoupled in space

In the must-haves section, I covered the trait of integration being decoupled in time. Going a stage further, you can aim for services to be decoupled in space as well. Anyone who has worked with a service-oriented architecture, especially where synchronous integration between services is the norm, will be familiar with the challenge of service addressability. Dealing with the overhead of managing configuration for many service endpoints can be quite a burden.

Parallelism

In an ideal world, we’d want the ability to parallelise the processing capabilities of a specific consumer by starting multiple instances. A common pattern when using messaging middleware is to have a single queue with multiple consumers each being sent messages in a round-robin fashion, with no consumer receiving the same message. This approach works fine in scenarios where processing messages in order is not important. However, as discussed in the must-haves section, we’ll often encounter the need for a consumer to process a stream of events strictly in order, especially when applying service choreography based integration. As also discussed earlier, it’s rarely the case that a consumer cares to receive every event in order, more likely it’s important that events that are related in some way to each other are processed from a stream in the order they were generated (e.g. events emanating from a specific aggregate instance). With this in mind, it’s a nice-to-have to find a way to parallelise consumers, whilst still ensuring events related to each other are processed in order. By doing this we get these primary benefits:

  1. It’s easier to implement high availability of consumers rather than have single points of failure.
  2. We can avoid a consumer use case being completely blocked when encountering a repeated error in processing a single event. If we’re able to parallelise in some way, we can at least have that consumer use case continue processing some events (as long as they aren’t related to the stubborn one) rather than stopping processing altogether.

Wrapping up

Phew, that’s the end of a long post! I’ve covered both the must-have traits and the nice-to-have traits of microservices integration implementations that are supportive of service choreography. In the third and final post of this series, I’ll at last get round to looking at specific technologies and techniques that enable us to satisfy these traits. Stay tuned!

Software engineering nut. Cyclist. Musician. Dog lover

Software engineering nut. Cyclist. Musician. Dog lover